[Linux-HA] Re: memory leaks of crmd and tengine in 2.0.8
Andrew Beekhof
beekhof at gmail.com
Tue Feb 6 08:27:16 MST 2007
On 2/5/07, Andrew Beekhof <beekhof at gmail.com> wrote:
> Hi Pavol,
>
> Sorry for the delay; I'm not ignoring you, I've just been busy elsewhere.
>
> If I'm reading your data correctly, attrd and crmd seem to be the
> worst offenders with tengine a bit behind. I'd not realized the
> numbers were so extreme :-(
>
> If you look in lib/cl_plumbing/cl_malloc.c, there are a number of
> #defines that may help track this down. I will start tackling this
> tomorrow (starting with attrd, given its low complexity).
If you're inclined, you could rerun your tests to verify my attrd changes.
Patches are (in order):
* http://hg.linux-ha.org/dev/rev/30b947bd77e5
* http://hg.linux-ha.org/dev/rev/8ff8ca1f9294
* http://hg.linux-ha.org/dev/rev/5cc8305990e2
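One way to check whether these attrd changes help is to compare top's DATA column for each daemon between two snapshots, taken at the start and end of a test run. The sketch below is hypothetical. The helper names are made up, and it assumes top prints memory in KiB and uses an `m` suffix (MiB, rounded) for large values, as in the attrd and crmd rows of Pavol's Feb 2/3 snapshots quoted below. Because of that rounding the result is only approximate.

```python
# Hypothetical helpers for comparing two top(1) snapshots.
# Assumption: plain numbers are KiB; a trailing "m" or "g" means MiB or GiB,
# already rounded by top, so converted values are approximate.

UNITS = {"k": 1, "m": 1024, "g": 1024 * 1024}


def to_kib(field: str) -> int:
    """Convert a top memory field such as '1188' or '65m' to KiB."""
    suffix = field[-1].lower()
    if suffix in UNITS:
        return int(float(field[:-1]) * UNITS[suffix])
    return int(field)


def data_growth(before: str, after: str) -> int:
    """Growth of a memory column between two snapshots, in KiB."""
    return to_kib(after) - to_kib(before)


if __name__ == "__main__":
    # DATA column from the Fri Feb 2 18:26 and Sat Feb 3 01:54 snapshots.
    samples = {
        "attrd": ("1188", "65m"),
        "crmd": ("2192", "30m"),
        "tengine": ("788", "788"),
    }
    for name, (before, after) in samples.items():
        print(f"{name:8s} DATA grew by ~{data_growth(before, after)} KiB")
```

Rerunning the same script with the patches applied and comparing the attrd figure against the one above (about 65 MiB over roughly 7.5 hours) would show whether the growth has stopped.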
> On 2/4/07, Pavol Gono <palo.gono at gmail.com> wrote:
> > Hi
> >
> > I started another type of testing: simulating disconnected cables
> > with iptables. Failovers between nodes are triggered by blocking ICMP
> > responses from the ping nodes (see script.txt).
> >
> > There are two more leaking processes:
> > attrd grows by 396 KB per iteration of the while loop
> > ccm sometimes logs messages of the following type:
> > ccm: [27757]: WARN: leaking memory? previous arena=3108864 present arena=3244032
> > (a very small increase: 135168 bytes)
> >
> > The configuration is similar to my previous post; only the Dummy
> > resource is replaced by a custom one.
> >
> > In my tests it is annoying that heartbeat consumes hundreds of megabytes
> > after some hours or days. Can I help you get the fixes done sooner?
> > What are the best configure switches for memory leak detection
> > (--enable-dmalloc/--enable-crm-dev/--enable-crm-dmalloc/--enable-crm-force-malloc)?
> > Is it better to build simple test cases (fewer resources, fewer
> > operations) or one complex test case that contains all possible
> > memory leaks?
> > Should I use the latest dev sources or the latest stable sources?
> > (For now I would like to have fixes against 2.0.8.)
> >
> > The output of the script for node sk16251c:
> > PID VIRT RES DATA SHR %MEM TIME+ S COMMAND
> > Fri Feb 2 18:26:13 CET 2007 - sk16251c
> > 27708 2944 1056 396 744 0.2 0:00.88 S ha_logd: read process
> > 27713 2812 864 264 620 0.2 0:00.87 S ha_logd: write process
> > 27756 2976 1284 264 1084 0.3 0:00.01 S /usr/local/lib/heartbeat/pingd -m 10 -d 5s
> > 27757 3356 1368 704 1104 0.3 0:00.01 S /usr/local/lib/heartbeat/ccm
> > 27758 4452 2308 1356 1388 0.5 0:10.35 S /usr/local/lib/heartbeat/cib
> > 27759 3168 1488 396 1136 0.3 0:00.25 S /usr/local/lib/heartbeat/lrmd -r
> > 27760 3060 3060 392 2572 0.6 0:00.00 S /usr/local/lib/heartbeat/stonithd
> > 27761 3968 2316 1188 1164 0.5 0:00.19 S /usr/local/lib/heartbeat/attrd
> > 27762 5500 3416 2192 1680 0.7 0:01.50 S /usr/local/lib/heartbeat/crmd
> > 27769 3660 1896 788 1196 0.4 0:00.46 S /usr/local/lib/heartbeat/tengine
> > 27770 4404 2596 1132 1416 0.5 0:02.39 S /usr/local/lib/heartbeat/pengine
> > ...
> > Sat Feb 3 01:54:47 CET 2007 - sk16251c
> > 27708 2944 1076 396 744 0.2 0:52.57 S ha_logd: read process
> > 27713 2812 876 264 620 0.2 0:43.66 S ha_logd: write process
> > 27756 2976 1284 264 1084 0.3 0:00.16 S /usr/local/lib/heartbeat/pingd -m 10 -d 5s
> > 27757 4016 2056 1364 1104 0.4 0:00.26 S /usr/local/lib/heartbeat/ccm
> > 27758 4452 2352 1356 1404 0.5 9:34.45 S /usr/local/lib/heartbeat/cib
> > 27759 3168 1500 396 1140 0.3 0:08.75 S /usr/local/lib/heartbeat/lrmd -r
> > 27760 3060 3060 392 2572 0.6 0:00.29 S /usr/local/lib/heartbeat/stonithd
> > 27761 69440 66m 65m 1164 13.4 0:17.46 S /usr/local/lib/heartbeat/attrd
> > 27762 34540 31m 30m 1680 6.4 1:41.60 S /usr/local/lib/heartbeat/crmd
> > 27769 3660 1900 788 1200 0.4 0:18.97 S /usr/local/lib/heartbeat/tengine
> > 27770 4980 3148 1708 1416 0.6 2:31.45 S /usr/local/lib/heartbeat/pengine
> >
> >
> > Palo
> >
> >
> > On 1/29/07, Pavol Gono <palo.gono at gmail.com> wrote:
> > > Hi
> > >
> > > I found memory leaks in the processes named in the subject (crmd and tengine) when doing the following failovers:
> > > deboserver -> pgbook: with crm_standby
> > > pgbook -> deboserver: failing monitor operation of resource Dummy
> > > The frequency is 2 failovers per minute. Script and configuration are attached.
> > > The crmd leak is the most pronounced: 132 KB per failover.
> > > pengine logs "Potential memory leak detected" messages; that should
> > > already be fixed upstream.
> > >
> > > Output:
> > > PID USER VIRT RES DATA SHR %MEM TIME+ S COMMAND
> > > Mon Jan 29 15:30:47 CET 2007
> > > 3437 hacluste 6152 2844 1492 1816 0.6 0:00.18 S crmd
> > > 3443 hacluste 5020 2084 796 1340 0.4 0:00.08 S tengine
> > > 3444 hacluste 5560 2564 940 1548 0.5 0:00.10 S pengine
> > > Mon Jan 29 15:31:13 CET 2007
> > > 3437 hacluste 6304 2980 1644 1820 0.6 0:00.36 S crmd
> > > 3443 hacluste 5020 2104 796 1352 0.4 0:00.15 S tengine
> > > 3444 hacluste 5768 2724 1148 1552 0.5 0:00.35 S pengine
> > > ...
> > > Mon Jan 29 15:34:17 CET 2007
> > > 3437 hacluste 7360 4096 2700 1820 0.8 0:01.63 S crmd
> > > 3443 hacluste 5152 2272 928 1352 0.4 0:00.61 S tengine
> > > 3444 hacluste 5768 2760 1148 1552 0.5 0:02.31 S pengine
> > > ...
> > > Mon Jan 29 15:48:19 CET 2007
> > > 3437 hacluste 12376 9084 7716 1820 1.8 0:07.75 S crmd
> > > 3443 hacluste 6472 3604 2248 1352 0.7 0:02.76 S tengine
> > > 3444 hacluste 5768 2804 1148 1552 0.5 0:11.46 S pengine
> > > Mon Jan 29 15:48:46 CET 2007
> > > 3437 hacluste 12508 9240 7848 1820 1.8 0:07.92 S crmd
> > > 3443 hacluste 6472 3648 2248 1352 0.7 0:02.81 S tengine
> > > 3444 hacluste 5840 2808 1220 1552 0.5 0:11.73 S pengine
> > > ...
> > > Mon Jan 29 16:16:26 CET 2007
> > > 3437 hacluste 22276 18m 17m 1820 3.7 0:19.82 S crmd
> > > 3443 hacluste 9244 6324 5020 1352 1.2 0:07.04 S tengine
> > > 3444 hacluste 5912 2888 1292 1552 0.6 0:29.18 S pengine
> > >
> > >
> > > I used stable 2.0.8 sources with minor modifications from upstream
> > > (see attached patch).
> > >
> > > Palo
> >
> > _______________________________________________
> > Linux-HA mailing list
> > Linux-HA at lists.linux-ha.org
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems
> >
> >
>
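The per-failover figures in this thread are differences between adjacent top samples. The sketch below reproduces that arithmetic using numbers quoted above. It takes the two short crmd intervals from the Jan 29 output, each about 26 to 27 seconds long and so roughly one failover at two per minute, plus the two values in the ccm arena warning. The names `CRMD_PAIRS`, `pair_growth` and `arena_growth` are hypothetical, and the regular expression assumes exactly the warning format shown in the Feb 4 message.

```python
import re

# Assumes the ccm warning format quoted above, e.g.
# "WARN: leaking memory? previous arena=3108864 present arena=3244032"
ARENA_RE = re.compile(r"previous arena=(\d+) present arena=(\d+)")

# Pairs of adjacent crmd samples from the Jan 29 output: (time, DATA in KiB).
# Each pair spans about 26-27 s, i.e. roughly one failover at 2 per minute.
CRMD_PAIRS = [
    (("15:30:47", 1492), ("15:31:13", 1644)),
    (("15:48:19", 7716), ("15:48:46", 7848)),
]


def pair_growth(pairs):
    """Return (start, end, growth in KiB) for each pair of adjacent samples."""
    return [(t0, t1, d1 - d0) for (t0, d0), (t1, d1) in pairs]


def arena_growth(log_line):
    """Return present minus previous arena size in bytes, or None if no match."""
    match = ARENA_RE.search(log_line)
    if match is None:
        return None
    previous, present = (int(g) for g in match.groups())
    return present - previous


if __name__ == "__main__":
    for start, end, kib in pair_growth(CRMD_PAIRS):
        print(f"crmd {start} -> {end}: +{kib} KiB")
    line = ("ccm: [27757]: WARN: leaking memory? "
            "previous arena=3108864 present arena=3244032")
    delta = arena_growth(line)
    print(f"ccm arena: +{delta} bytes ({delta / 1024:.0f} KiB)")
```

The second crmd interval gives exactly the 132 KB per failover reported, while the first gives 152 KiB. A single interval is therefore only a rough estimate, and averaging over many intervals would be more robust.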