[Linux-HA] Re: memory leaks of crmd and tengine in 2.0.8
Andrew Beekhof
beekhof at gmail.com
Mon Feb 5 13:40:37 MST 2007
Hi Pavol,
Sorry for the delay. I'm not ignoring you; I've just been busy elsewhere.
If I'm reading your data correctly, attrd and crmd seem to be the
worst offenders with tengine a bit behind. I'd not realized the
numbers were so extreme :-(
If you look in lib/cl_plumbing/cl_malloc.c, there are a number of
#defines that may help track this down. I will start tackling this
tomorrow (starting with attrd, given its low complexity).
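
For anyone who wants to rank the offenders from the quoted top output below, here is a minimal sketch in Python. It is not part of heartbeat. It assumes the column order PID VIRT RES DATA SHR %MEM TIME+ S COMMAND from the Feb 2/3 listing, and that an "m" suffix means MiB while plain numbers are KB. The helper names (to_kb, parse_snapshot, growth) are made up for illustration.

```python
# Sketch only: the column layout and the "m" = MiB reading are assumptions.

def to_kb(field):
    """Convert a top memory field such as '2316' or '66m' to KB."""
    if field.endswith("m"):
        return float(field[:-1]) * 1024
    if field.endswith("g"):
        return float(field[:-1]) * 1024 * 1024
    return float(field)


def parse_snapshot(lines):
    """Map command name -> RES in KB for lines shaped like
    'PID VIRT RES DATA SHR %MEM TIME+ S COMMAND ...'."""
    res = {}
    for line in lines:
        parts = line.split()
        # Skip date lines, headers and anything without a leading PID.
        if len(parts) < 9 or not parts[0].isdigit():
            continue
        command = parts[8].rsplit("/", 1)[-1]  # strip the install path
        res[command] = to_kb(parts[2])
    return res


def growth(before, after):
    """RES growth in KB per command, largest first."""
    deltas = {name: after[name] - before[name]
              for name in before if name in after}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


# Three rows copied from the first and last snapshots for sk16251c.
first = [
    "27761 3968 2316 1188 1164 0.5 0:00.19 S /usr/local/lib/heartbeat/attrd",
    "27762 5500 3416 2192 1680 0.7 0:01.50 S /usr/local/lib/heartbeat/crmd",
    "27769 3660 1896 788 1196 0.4 0:00.46 S /usr/local/lib/heartbeat/tengine",
]
last = [
    "27761 69440 66m 65m 1164 13.4 0:17.46 S /usr/local/lib/heartbeat/attrd",
    "27762 34540 31m 30m 1680 6.4 1:41.60 S /usr/local/lib/heartbeat/crmd",
    "27769 3660 1900 788 1200 0.4 0:18.97 S /usr/local/lib/heartbeat/tengine",
]

ranked = growth(parse_snapshot(first), parse_snapshot(last))
for name, delta_kb in ranked:
    print(f"{name}: +{delta_kb:.0f} KB RES")
```

Since top rounds the larger values to whole megabytes, the deltas are only approximate, but they are good enough to order the processes by growth.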
On 2/4/07, Pavol Gono <palo.gono at gmail.com> wrote:
> Hi
>
> I started another type of testing - simulation of disconnecting cables
> with iptables. Failovers between nodes are triggered by blocking ICMP
> responses from ping nodes (see script.txt).
>
> Two more processes are leaking:
> attrd grows by 396 KB per iteration of the while loop
> ccm sometimes displays messages like the following
> ccm: [27757]: WARN: leaking memory? previous arena=3108864 present arena=3244032
> (very small memory increase)
>
> Configuration is similar to previous post, only Dummy resource is
> replaced by custom one.
>
> For my tests it is annoying that heartbeat eats hundreds of megabytes
> after some hours/days. Can I help you to make fixes sooner?
> What are the best configure switches for memory leak detection
> (--enable-dmalloc/--enable-crm-dev/--enable-crm-dmalloc/--enable-crm-force-malloc)?
> Is it better to make up simple testcases (fewer resources, fewer
> operations) or one complex testcase that exercises all possible
> memory leaks?
> Should I use latest dev sources or latest stable sources?
> (I would like to have fixes against 2.0.8 currently)
>
> The output of script for node sk16251c:
> PID VIRT RES DATA SHR %MEM TIME+ S COMMAND
> Fri Feb 2 18:26:13 CET 2007 - sk16251c
> 27708 2944 1056 396 744 0.2 0:00.88 S ha_logd: read process
> 27713 2812 864 264 620 0.2 0:00.87 S ha_logd: write process
> 27756 2976 1284 264 1084 0.3 0:00.01 S /usr/local/lib/heartbeat/pingd -m 10 -d 5s
> 27757 3356 1368 704 1104 0.3 0:00.01 S /usr/local/lib/heartbeat/ccm
> 27758 4452 2308 1356 1388 0.5 0:10.35 S /usr/local/lib/heartbeat/cib
> 27759 3168 1488 396 1136 0.3 0:00.25 S /usr/local/lib/heartbeat/lrmd -r
> 27760 3060 3060 392 2572 0.6 0:00.00 S /usr/local/lib/heartbeat/stonithd
> 27761 3968 2316 1188 1164 0.5 0:00.19 S /usr/local/lib/heartbeat/attrd
> 27762 5500 3416 2192 1680 0.7 0:01.50 S /usr/local/lib/heartbeat/crmd
> 27769 3660 1896 788 1196 0.4 0:00.46 S /usr/local/lib/heartbeat/tengine
> 27770 4404 2596 1132 1416 0.5 0:02.39 S /usr/local/lib/heartbeat/pengine
> ...
> Sat Feb 3 01:54:47 CET 2007 - sk16251c
> 27708 2944 1076 396 744 0.2 0:52.57 S ha_logd: read process
> 27713 2812 876 264 620 0.2 0:43.66 S ha_logd: write process
> 27756 2976 1284 264 1084 0.3 0:00.16 S /usr/local/lib/heartbeat/pingd -m 10 -d 5s
> 27757 4016 2056 1364 1104 0.4 0:00.26 S /usr/local/lib/heartbeat/ccm
> 27758 4452 2352 1356 1404 0.5 9:34.45 S /usr/local/lib/heartbeat/cib
> 27759 3168 1500 396 1140 0.3 0:08.75 S /usr/local/lib/heartbeat/lrmd -r
> 27760 3060 3060 392 2572 0.6 0:00.29 S /usr/local/lib/heartbeat/stonithd
> 27761 69440 66m 65m 1164 13.4 0:17.46 S /usr/local/lib/heartbeat/attrd
> 27762 34540 31m 30m 1680 6.4 1:41.60 S /usr/local/lib/heartbeat/crmd
> 27769 3660 1900 788 1200 0.4 0:18.97 S /usr/local/lib/heartbeat/tengine
> 27770 4980 3148 1708 1416 0.6 2:31.45 S /usr/local/lib/heartbeat/pengine
>
>
> Palo
>
>
> On 1/29/07, Pavol Gono <palo.gono at gmail.com> wrote:
> > Hi
> >
> > I found memory leaks in the processes named in the subject when doing the following failovers:
> > deboserver -> pgbook: with crm_standby
> > pgbook -> deboserver: failing monitor operation of resource Dummy
> > Frequency is 2 failovers per minute. Script and configuration attached.
> > The memory leak in crmd is the most pronounced: 132 KB per failover.
> > pengine displays the "Potential memory leak detected" messages, and
> > should already be fixed upstream.
> >
> > Output:
> > PID USER VIRT RES DATA SHR %MEM TIME+ S COMMAND
> > Mon Jan 29 15:30:47 CET 2007
> > 3437 hacluste 6152 2844 1492 1816 0.6 0:00.18 S crmd
> > 3443 hacluste 5020 2084 796 1340 0.4 0:00.08 S tengine
> > 3444 hacluste 5560 2564 940 1548 0.5 0:00.10 S pengine
> > Mon Jan 29 15:31:13 CET 2007
> > 3437 hacluste 6304 2980 1644 1820 0.6 0:00.36 S crmd
> > 3443 hacluste 5020 2104 796 1352 0.4 0:00.15 S tengine
> > 3444 hacluste 5768 2724 1148 1552 0.5 0:00.35 S pengine
> > ...
> > Mon Jan 29 15:34:17 CET 2007
> > 3437 hacluste 7360 4096 2700 1820 0.8 0:01.63 S crmd
> > 3443 hacluste 5152 2272 928 1352 0.4 0:00.61 S tengine
> > 3444 hacluste 5768 2760 1148 1552 0.5 0:02.31 S pengine
> > ...
> > Mon Jan 29 15:48:19 CET 2007
> > 3437 hacluste 12376 9084 7716 1820 1.8 0:07.75 S crmd
> > 3443 hacluste 6472 3604 2248 1352 0.7 0:02.76 S tengine
> > 3444 hacluste 5768 2804 1148 1552 0.5 0:11.46 S pengine
> > Mon Jan 29 15:48:46 CET 2007
> > 3437 hacluste 12508 9240 7848 1820 1.8 0:07.92 S crmd
> > 3443 hacluste 6472 3648 2248 1352 0.7 0:02.81 S tengine
> > 3444 hacluste 5840 2808 1220 1552 0.5 0:11.73 S pengine
> > ...
> > Mon Jan 29 16:16:26 CET 2007
> > 3437 hacluste 22276 18m 17m 1820 3.7 0:19.82 S crmd
> > 3443 hacluste 9244 6324 5020 1352 1.2 0:07.04 S tengine
> > 3444 hacluste 5912 2888 1292 1552 0.6 0:29.18 S pengine
> >
> >
> > I used stable 2.0.8 sources with minor modifications from upstream
> > (see attached patch).
> >
> > Palo
>
> _______________________________________________
> Linux-HA mailing list
> Linux-HA at lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
>