[Linux-HA] Re: memory leaks of crmd and tengine in 2.0.8

Andrew Beekhof beekhof at gmail.com
Mon Feb 5 13:40:37 MST 2007


Hi Pavol,

Sorry for the delay. I'm not ignoring you; I've just been busy elsewhere.

If I'm reading your data correctly, attrd and crmd seem to be the
worst offenders with tengine a bit behind.  I'd not realized the
numbers were so extreme :-(

If you look in lib/clplumbing/cl_malloc.c, there are a number of
#defines that may help track this down.  I will start tackling this
tomorrow (starting with attrd, given its low complexity).
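
To put rough numbers on the growth, a small sketch like the following
could turn two of the top samples quoted below into a rate.  It is only
an illustration: the helper names (parse_kb, parse_stamp,
growth_kb_per_hour) are made up here, it assumes the memory columns are
in KiB with an "m" suffix meaning MiB (as top usually prints them), and
it assumes both timestamps are in the same timezone.

```python
from datetime import datetime


def parse_kb(field: str) -> int:
    """Convert a top memory field to KiB (assumes 'm' = MiB, 'g' = GiB)."""
    field = field.strip().lower()
    if field.endswith("m"):
        return int(float(field[:-1]) * 1024)
    if field.endswith("g"):
        return int(float(field[:-1]) * 1024 * 1024)
    return int(field)


def parse_stamp(stamp: str) -> datetime:
    """Parse a `date`-style line such as 'Fri Feb  2 18:26:13 CET 2007'.

    The weekday and timezone tokens are dropped, so both samples are
    assumed to share one timezone.
    """
    parts = stamp.split()
    month, day, clock, year = parts[1], parts[2], parts[3], parts[5]
    return datetime.strptime(f"{month} {day} {clock} {year}",
                             "%b %d %H:%M:%S %Y")


def growth_kb_per_hour(first, last) -> float:
    """Average growth in KiB/hour between two (timestamp, field) samples."""
    t0, kb0 = parse_stamp(first[0]), parse_kb(first[1])
    t1, kb1 = parse_stamp(last[0]), parse_kb(last[1])
    hours = (t1 - t0).total_seconds() / 3600.0
    return (kb1 - kb0) / hours


if __name__ == "__main__":
    # RES column of attrd on sk16251c, first and last samples quoted below.
    rate = growth_kb_per_hour(
        ("Fri Feb  2 18:26:13 CET 2007", "2316"),
        ("Sat Feb  3 01:54:47 CET 2007", "66m"),
    )
    print(f"attrd RES grows by roughly {rate:.0f} KiB/hour")
```

The same function can be pointed at the crmd or tengine rows to compare
processes on equal terms.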

On 2/4/07, Pavol Gono <palo.gono at gmail.com> wrote:
> Hi
>
> I started another type of test: simulating disconnected cables
> with iptables. Failovers between nodes are triggered by blocking ICMP
> responses from ping nodes (see script.txt).
>
> There are two more leaking processes:
> attrd eats 396 KB per iteration of the while loop
> ccm sometimes displays messages of the following type:
> ccm: [27757]: WARN: leaking memory? previous arena=3108864 present arena=3244032
> (very small memory increase)
>
> The configuration is similar to the previous post, only the Dummy
> resource is replaced by a custom one.
>
> For my tests it is annoying that heartbeat eats hundreds of megabytes
> after some hours/days. Can I help you to make fixes sooner?
> What are the best configure switches for memory leak detection
> (--enable-dmalloc/--enable-crm-dev/--enable-crm-dmalloc/--enable-crm-force-malloc)?
> Is it better to make up simple testcases (fewer resources, fewer
> operations) or one complex testcase which contains all possible
> memory leaks?
> Should I use the latest dev sources or the latest stable sources?
> (I would currently like to have fixes against 2.0.8)
>
> The output of script for node sk16251c:
>   PID  VIRT  RES DATA  SHR %MEM    TIME+  S COMMAND
> Fri Feb  2 18:26:13 CET 2007 - sk16251c
> 27708  2944 1056  396  744  0.2   0:00.88 S ha_logd: read process
> 27713  2812  864  264  620  0.2   0:00.87 S ha_logd: write process
> 27756  2976 1284  264 1084  0.3   0:00.01 S /usr/local/lib/heartbeat/pingd -m 10 -d 5s
> 27757  3356 1368  704 1104  0.3   0:00.01 S /usr/local/lib/heartbeat/ccm
> 27758  4452 2308 1356 1388  0.5   0:10.35 S /usr/local/lib/heartbeat/cib
> 27759  3168 1488  396 1136  0.3   0:00.25 S /usr/local/lib/heartbeat/lrmd -r
> 27760  3060 3060  392 2572  0.6   0:00.00 S /usr/local/lib/heartbeat/stonithd
> 27761  3968 2316 1188 1164  0.5   0:00.19 S /usr/local/lib/heartbeat/attrd
> 27762  5500 3416 2192 1680  0.7   0:01.50 S /usr/local/lib/heartbeat/crmd
> 27769  3660 1896  788 1196  0.4   0:00.46 S /usr/local/lib/heartbeat/tengine
> 27770  4404 2596 1132 1416  0.5   0:02.39 S /usr/local/lib/heartbeat/pengine
> ...
> Sat Feb  3 01:54:47 CET 2007 - sk16251c
> 27708  2944 1076  396  744  0.2   0:52.57 S ha_logd: read process
> 27713  2812  876  264  620  0.2   0:43.66 S ha_logd: write process
> 27756  2976 1284  264 1084  0.3   0:00.16 S /usr/local/lib/heartbeat/pingd -m 10 -d 5s
> 27757  4016 2056 1364 1104  0.4   0:00.26 S /usr/local/lib/heartbeat/ccm
> 27758  4452 2352 1356 1404  0.5   9:34.45 S /usr/local/lib/heartbeat/cib
> 27759  3168 1500  396 1140  0.3   0:08.75 S /usr/local/lib/heartbeat/lrmd -r
> 27760  3060 3060  392 2572  0.6   0:00.29 S /usr/local/lib/heartbeat/stonithd
> 27761 69440  66m  65m 1164 13.4   0:17.46 S /usr/local/lib/heartbeat/attrd
> 27762 34540  31m  30m 1680  6.4   1:41.60 S /usr/local/lib/heartbeat/crmd
> 27769  3660 1900  788 1200  0.4   0:18.97 S /usr/local/lib/heartbeat/tengine
> 27770  4980 3148 1708 1416  0.6   2:31.45 S /usr/local/lib/heartbeat/pengine
>
>
> Palo
>
>
> On 1/29/07, Pavol Gono <palo.gono at gmail.com> wrote:
> > Hi
> >
> > I found memory leaks in the processes named in the subject when doing
> > the following failovers:
> > deboserver -> pgbook: with crm_standby
> > pgbook -> deboserver: failing monitor operation of resource Dummy
> > Frequency is 2 failovers per minute. Script and configuration attached.
> > The memory leak of crmd is the most pronounced: 132 KB per failover.
> > pengine displays the "Potential memory leak detected" messages, which
> > should already be fixed upstream.
> >
> > Output:
> >   PID USER      VIRT  RES DATA  SHR %MEM    TIME+  S COMMAND
> > Mon Jan 29 15:30:47 CET 2007
> >  3437 hacluste  6152 2844 1492 1816  0.6   0:00.18 S crmd
> >  3443 hacluste  5020 2084  796 1340  0.4   0:00.08 S tengine
> >  3444 hacluste  5560 2564  940 1548  0.5   0:00.10 S pengine
> > Mon Jan 29 15:31:13 CET 2007
> >  3437 hacluste  6304 2980 1644 1820  0.6   0:00.36 S crmd
> >  3443 hacluste  5020 2104  796 1352  0.4   0:00.15 S tengine
> >  3444 hacluste  5768 2724 1148 1552  0.5   0:00.35 S pengine
> > ...
> > Mon Jan 29 15:34:17 CET 2007
> >  3437 hacluste  7360 4096 2700 1820  0.8   0:01.63 S crmd
> >  3443 hacluste  5152 2272  928 1352  0.4   0:00.61 S tengine
> >  3444 hacluste  5768 2760 1148 1552  0.5   0:02.31 S pengine
> > ...
> > Mon Jan 29 15:48:19 CET 2007
> >  3437 hacluste 12376 9084 7716 1820  1.8   0:07.75 S crmd
> >  3443 hacluste  6472 3604 2248 1352  0.7   0:02.76 S tengine
> >  3444 hacluste  5768 2804 1148 1552  0.5   0:11.46 S pengine
> > Mon Jan 29 15:48:46 CET 2007
> >  3437 hacluste 12508 9240 7848 1820  1.8   0:07.92 S crmd
> >  3443 hacluste  6472 3648 2248 1352  0.7   0:02.81 S tengine
> >  3444 hacluste  5840 2808 1220 1552  0.5   0:11.73 S pengine
> > ...
> > Mon Jan 29 16:16:26 CET 2007
> >  3437 hacluste 22276  18m  17m 1820  3.7   0:19.82 S crmd
> >  3443 hacluste  9244 6324 5020 1352  1.2   0:07.04 S tengine
> >  3444 hacluste  5912 2888 1292 1552  0.6   0:29.18 S pengine
> >
> >
> > I used stable 2.0.8 sources with minor modifications from upstream
> > (see attached patch).
> >
> > Palo
>

