[Linux-HA] Re: memory leaks of crmd and tengine in 2.0.8

Andrew Beekhof beekhof at gmail.com
Tue Feb 6 08:27:16 MST 2007


On 2/5/07, Andrew Beekhof <beekhof at gmail.com> wrote:
> Hi Pavol,
>
> Sorry for the delay, I'm not ignoring you, I've just been busy elsewhere.
>
> If I'm reading your data correctly, attrd and crmd seem to be the
> worst offenders with tengine a bit behind.  I'd not realized the
> numbers were so extreme :-(
>
> If you look in lib/cl_plumbing/cl_malloc.c there are a number of
> #defines that may help in tracking this down.  I will start tackling
> this tomorrow (starting with attrd, given its low complexity).

If you're inclined, you could rerun your tests to verify my attrd changes.

Patches are (in order):
* http://hg.linux-ha.org/dev/rev/30b947bd77e5
* http://hg.linux-ha.org/dev/rev/8ff8ca1f9294
* http://hg.linux-ha.org/dev/rev/5cc8305990e2
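
For comparing a rerun against the earlier numbers, here is a minimal
sketch (Python, not part of heartbeat) that turns two top samples into a
growth rate. The helper names to_kb and growth_kb_per_hour are made up
for illustration; the sample values are the attrd and crmd DATA columns
from the sk16251c output quoted below. top rounds the "65m"/"30m"
fields, so the rates are only approximate.

```python
from datetime import datetime


def to_kb(field):
    """Convert a top memory field to KB: '1188' is KB, '65m' is MB."""
    if field.endswith("m"):
        return float(field[:-1]) * 1024
    return float(field)


def growth_kb_per_hour(t0, kb0, t1, kb1):
    """Average growth in KB per hour between two samples."""
    hours = (t1 - t0).total_seconds() / 3600
    return (kb1 - kb0) / hours


# Timestamps of the first and last sample in the quoted output.
start = datetime(2007, 2, 2, 18, 26, 13)
end = datetime(2007, 2, 3, 1, 54, 47)

# DATA column (first sample, last sample) for the two worst offenders.
samples = {
    "attrd": ("1188", "65m"),
    "crmd": ("2192", "30m"),
}

rates = {
    name: growth_kb_per_hour(start, to_kb(first), end, to_kb(last))
    for name, (first, last) in samples.items()
}

for name, rate in rates.items():
    print(f"{name}: about {rate:.0f} KB/hour")
```

Running the same calculation on a test with the patches applied should
show whether the attrd rate drops.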

> On 2/4/07, Pavol Gono <palo.gono at gmail.com> wrote:
> > Hi
> >
> > I started another type of testing - simulation of disconnecting cables
> > with iptables. Failovers between nodes are triggered by blocking ICMP
> > responses from ping nodes (see script.txt).
> >
> > There are two more leaking processes:
> > attrd grows by 396 KB per iteration of the while loop
> > ccm sometimes logs messages like the following:
> > ccm: [27757]: WARN: leaking memory? previous arena=3108864 present arena=3244032
> > (very small memory increase)
> >
> > Configuration is similar to previous post, only Dummy resource is
> > replaced by custom one.
> >
> > For my tests it is annoying that heartbeat eats hundreds of megabytes
> > after some hours/days. Can I help you to make fixes sooner?
> > What are the best configure switches for memory leak detection
> > (--enable-dmalloc/--enable-crm-dev/--enable-crm-dmalloc/--enable-crm-force-malloc)?
> > Is it better to make up simple testcases (less resources, less
> > operations) or the complex testcase, which contains all possible
> > memory leaks?
> > Should I use the latest dev sources or the latest stable sources?
> > (For now I would like to have fixes against 2.0.8.)
> >
> > The output of script for node sk16251c:
> >   PID  VIRT  RES DATA  SHR %MEM    TIME+  S COMMAND
> > Fri Feb  2 18:26:13 CET 2007 - sk16251c
> > 27708  2944 1056  396  744  0.2   0:00.88 S ha_logd: read process
> > 27713  2812  864  264  620  0.2   0:00.87 S ha_logd: write process
> > 27756  2976 1284  264 1084  0.3   0:00.01 S /usr/local/lib/heartbeat/pingd -m 10 -d 5s
> > 27757  3356 1368  704 1104  0.3   0:00.01 S /usr/local/lib/heartbeat/ccm
> > 27758  4452 2308 1356 1388  0.5   0:10.35 S /usr/local/lib/heartbeat/cib
> > 27759  3168 1488  396 1136  0.3   0:00.25 S /usr/local/lib/heartbeat/lrmd -r
> > 27760  3060 3060  392 2572  0.6   0:00.00 S /usr/local/lib/heartbeat/stonithd
> > 27761  3968 2316 1188 1164  0.5   0:00.19 S /usr/local/lib/heartbeat/attrd
> > 27762  5500 3416 2192 1680  0.7   0:01.50 S /usr/local/lib/heartbeat/crmd
> > 27769  3660 1896  788 1196  0.4   0:00.46 S /usr/local/lib/heartbeat/tengine
> > 27770  4404 2596 1132 1416  0.5   0:02.39 S /usr/local/lib/heartbeat/pengine
> > ...
> > Sat Feb  3 01:54:47 CET 2007 - sk16251c
> > 27708  2944 1076  396  744  0.2   0:52.57 S ha_logd: read process
> > 27713  2812  876  264  620  0.2   0:43.66 S ha_logd: write process
> > 27756  2976 1284  264 1084  0.3   0:00.16 S /usr/local/lib/heartbeat/pingd -m 10 -d 5s
> > 27757  4016 2056 1364 1104  0.4   0:00.26 S /usr/local/lib/heartbeat/ccm
> > 27758  4452 2352 1356 1404  0.5   9:34.45 S /usr/local/lib/heartbeat/cib
> > 27759  3168 1500  396 1140  0.3   0:08.75 S /usr/local/lib/heartbeat/lrmd -r
> > 27760  3060 3060  392 2572  0.6   0:00.29 S /usr/local/lib/heartbeat/stonithd
> > 27761 69440  66m  65m 1164 13.4   0:17.46 S /usr/local/lib/heartbeat/attrd
> > 27762 34540  31m  30m 1680  6.4   1:41.60 S /usr/local/lib/heartbeat/crmd
> > 27769  3660 1900  788 1200  0.4   0:18.97 S /usr/local/lib/heartbeat/tengine
> > 27770  4980 3148 1708 1416  0.6   2:31.45 S /usr/local/lib/heartbeat/pengine
> >
> >
> > Palo
> >
> >
> > On 1/29/07, Pavol Gono <palo.gono at gmail.com> wrote:
> > > Hi
> > >
> > > I found memory leaks in the processes named in the subject while
> > > doing the following failovers:
> > > deboserver -> pgbook: with crm_standby
> > > pgbook -> deboserver: failing monitor operation of resource Dummy
> > > The frequency is 2 failovers per minute. Script and configuration attached.
> > > The crmd leak is the most pronounced: 132 KB per failover.
> > > pengine displays the "Potential memory leak detected" messages, which
> > > should already be fixed upstream.
> > >
> > > Output:
> > >   PID USER      VIRT  RES DATA  SHR %MEM    TIME+  S COMMAND
> > > Mon Jan 29 15:30:47 CET 2007
> > >  3437 hacluste  6152 2844 1492 1816  0.6   0:00.18 S crmd
> > >  3443 hacluste  5020 2084  796 1340  0.4   0:00.08 S tengine
> > >  3444 hacluste  5560 2564  940 1548  0.5   0:00.10 S pengine
> > > Mon Jan 29 15:31:13 CET 2007
> > >  3437 hacluste  6304 2980 1644 1820  0.6   0:00.36 S crmd
> > >  3443 hacluste  5020 2104  796 1352  0.4   0:00.15 S tengine
> > >  3444 hacluste  5768 2724 1148 1552  0.5   0:00.35 S pengine
> > > ...
> > > Mon Jan 29 15:34:17 CET 2007
> > >  3437 hacluste  7360 4096 2700 1820  0.8   0:01.63 S crmd
> > >  3443 hacluste  5152 2272  928 1352  0.4   0:00.61 S tengine
> > >  3444 hacluste  5768 2760 1148 1552  0.5   0:02.31 S pengine
> > > ...
> > > Mon Jan 29 15:48:19 CET 2007
> > >  3437 hacluste 12376 9084 7716 1820  1.8   0:07.75 S crmd
> > >  3443 hacluste  6472 3604 2248 1352  0.7   0:02.76 S tengine
> > >  3444 hacluste  5768 2804 1148 1552  0.5   0:11.46 S pengine
> > > Mon Jan 29 15:48:46 CET 2007
> > >  3437 hacluste 12508 9240 7848 1820  1.8   0:07.92 S crmd
> > >  3443 hacluste  6472 3648 2248 1352  0.7   0:02.81 S tengine
> > >  3444 hacluste  5840 2808 1220 1552  0.5   0:11.73 S pengine
> > > ...
> > > Mon Jan 29 16:16:26 CET 2007
> > >  3437 hacluste 22276  18m  17m 1820  3.7   0:19.82 S crmd
> > >  3443 hacluste  9244 6324 5020 1352  1.2   0:07.04 S tengine
> > >  3444 hacluste  5912 2888 1292 1552  0.6   0:29.18 S pengine
> > >
> > >
> > > I used stable 2.0.8 sources with minor modifications from upstream
> > > (see attached patch).
> > >
> > > Palo
> >
>
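
The ccm warning quoted above can also be read numerically. A small
sketch (Python; arena_delta is a hypothetical helper, and the regular
expression assumes the exact message wording shown in the quoted log
line) that extracts the two arena sizes and reports the difference:

```python
import re


def arena_delta(line):
    """Return present arena minus previous arena in bytes, or None."""
    match = re.search(r"previous arena=(\d+) present arena=(\d+)", line)
    if match is None:
        return None
    previous, present = (int(value) for value in match.groups())
    return present - previous


line = ("ccm: [27757]: WARN: leaking memory? "
        "previous arena=3108864 present arena=3244032")

delta = arena_delta(line)
print(f"arena grew by {delta} bytes ({delta / 1024:.0f} KB)")
```

For this message the difference is 135168 bytes, i.e. 132 KB. That is
the same step size reported for crmd per failover, which may only
reflect how the allocator grows the heap, so treat it as a hint rather
than proof of a single leaked object.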

