[Linux-HA] crmd taking up 99% CPU on 1st machine to join cluster

Matt Wilder grewaru at gmail.com
Mon Dec 18 10:12:06 MST 2006


Andrew,

From the oprofile homepage:

"OProfile is a system-wide profiler for Linux systems, capable of
profiling all running code at low overhead. OProfile is released under
the GNU GPL."

This is a Linux application that includes a kernel driver.  I
seriously doubt I will be able to get it running under FreeBSD.

Are there any alternative applications that are supported under
FreeBSD that do the same thing?
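In the meantime, the symptom itself can at least be logged with plain
ps, which is not a profiler but does show when crmd's CPU use jumps.
A rough sketch, assuming the process shows up under the command name
"crmd"; the function names crmd_cpu and watch_crmd are made up for
illustration:

```shell
#!/bin/sh
# Sketch only: sample crmd's CPU percentage with ps (no profiling).

# crmd_cpu: read "pcpu comm" lines on stdin and print the CPU figure
# for every process whose command name is exactly "crmd".
crmd_cpu() {
    awk '$2 == "crmd" { print $1 }'
}

# watch_crmd COUNT INTERVAL: take COUNT samples, INTERVAL seconds apart,
# printing a timestamp followed by the CPU figure(s) found.
watch_crmd() {
    i=0
    while [ "$i" -lt "$1" ]; do
        echo "$(date '+%H:%M:%S') $(ps -axo pcpu,comm | crmd_cpu)"
        i=$((i + 1))
        sleep "$2"
    done
}
```

Something like "watch_crmd 60 1 > crmd-cpu.log", started on the first
node just before the second node comes up, should record the point at
which the usage climbs.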


The same page goes on:

"It consists of a kernel driver and a daemon for collecting sample
data, and several post-profiling tools for turning data into
information."
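For anyone following along on Linux, the steps Andrew lists below could
be wrapped in a small script. This is only a sketch: the function names
and the DRY_RUN, CRMD and OUT variables are invented for illustration,
and only the opcontrol and opreport invocations are taken from his mail.
By default it just prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: replay the quoted profiling steps around a test command.
# DRY_RUN=1 (the default here) only prints what would be run, so the
# script can be tried on a machine without oprofile installed.

DRY_RUN="${DRY_RUN:-1}"
CRMD="${CRMD:-/usr/lib/heartbeat/crmd}"   # path from the quoted config
OUT="${OUT:-99percent}"                   # output prefix from the quoted mail

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# report FLAGS FILE: write an opreport listing for crmd to FILE.
report() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ opreport $CRMD $1 > $2"
    else
        opreport "$CRMD" "$1" > "$2"
    fi
}

# profile_crmd TEST_COMMAND...: profile while TEST_COMMAND runs.
profile_crmd() {
    run opcontrol --init
    run opcontrol --start
    run "$@"                  # e.g. whatever starts the second node
    run opcontrol --stop
    report -lg  "$OUT.list"
    report -clg "$OUT.graph"
}
```

Calling "profile_crmd sleep 120" with DRY_RUN=0 would profile for two
minutes while the second node is started by hand, then write
99percent.list and 99percent.graph.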

On 12/18/06, Andrew Beekhof <beekhof at gmail.com> wrote:
> On 12/12/06, Matt Wilder <grewaru at gmail.com> wrote:
> > Greetings,
> >
> > I am having a problem with heartbeat 2.0.7.  If I have more than 9
> > primitives in my resource group, the first node to join the cluster
> > will have its crmd take up 99% (or more) of the CPU once the second
> > node joins the cluster.
>
> I've had some good results with oprofile in the last day or so...
> If possible, can you install it on the node that's going to be at 99%
> and start profiling just before you start the second node?
>
> This is the config I've been running with:
> root at c001n03 ~ # cat .oprofile/daemonrc
> CHOSEN_EVENTS[0]=GLOBAL_POWER_EVENTS:100000:1:0:1
> NR_CHOSEN=1
> SEPARATE_LIB=1
> SEPARATE_KERNEL=0
> SEPARATE_THREAD=0
> SEPARATE_CPU=0
> VMLINUX=/boot/vmlinux
> IMAGE_FILTER=/usr/lib/heartbeat/crmd,/usr/lib/heartbeat/cib,/usr/lib/heartbeat/tengine,/usr/lib/heartbeat/pengine
> CPU_BUF_SIZE=0
> CALLGRAPH=1
>
> After that, you just run:
> opcontrol --init
> opcontrol --start
>  <run your test here>
> opcontrol --stop
> opreport /usr/lib/heartbeat/crmd -lg > 99percent.list
> opreport /usr/lib/heartbeat/crmd -clg > 99percent.graph
>
> If you could then send me 99percent.*, I'm pretty sure the problem
> will be obvious.
>
> > I have 2 nodes in this cluster. If I stop heartbeat on both and start
> > heartbeat on the 1st node, then the second node, crmd will go to 99%
> > on the first node once the second has fully joined.  If I start them
> > in reverse order (2 and then 1) the secondary node's crmd will go to
> > 99%.  It does not matter which node is serving the services.  I set my
> > services to not be managed to verify this.
> >
> > Furthermore, this problem only presents itself once I have more than 9
> > primitives in my resource group.  All of my resources on this cluster
> > are part of the same group, as they all have to be run on the same
> > node.  I have tried numerous combinations of primitives, with and
> > without monitoring.  No combination seems to matter whatsoever.
> > Provided I have 9 or fewer primitives in the group, crmd runs fine.  If
> > I have 10 or more, crmd maxes the CPU as described above.
> >
> > This is a FreeBSD 64-bit system:
> >
> > FreeBSD glider1.domainit.com 6.1-RELEASE-p3 FreeBSD 6.1-RELEASE-p3 #0:
> > Tue Jul 11 15:40:30 EDT 2006
> > root at gliderweb1.domainit.com:/usr/obj/usr/src/sys/GLIDERWEB1  amd64
> >
> > I would appreciate any thoughts on this.  I can provide more details
> > regarding configuration if necessary.
> >
> > Thanks.
> > _______________________________________________
> > Linux-HA mailing list
> > Linux-HA at lists.linux-ha.org
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems
> >

