[Linux-HA] MCP died (signal 24: XCPU)
Alan Robertson
alanr at unix.sh
Wed Jun 21 19:02:35 MDT 2006
Dejan Muhamedagic wrote:
> Hi,
>
> This is the backtrace (SLES9/x86_64):
>
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "x86_64-suse-linux"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".
>
> Core was generated by `heartbeat: ma'.
> Program terminated with signal 24, CPU time limit exceeded.
> ...
> (gdb) bt
> #0 0x0000002a9647911f in poll () from /lib64/tls/libc.so.6
> #1 0x0000002a9588d205 in socket_resume_io_read (ch=0x607cb8,
> nbytes=0x7fbfffd548, read1anyway=0) at ipcsocket.c:1384
> #2 0x0000002a9588d659 in socket_resume_io (ch=0x607cb8) at ipcsocket.c:1581
> #3 0x0000002a9588cdad in socket_is_message_pending (ch=0x607cb8)
> at ipcsocket.c:1192
> #4 0x0000002a95887c96 in G_CH_prepare_int (source=0x606f88,
> timeout=0x7fbfffd604) at GSource.c:516
> #5 0x0000002a95bd9e05 in g_main_context_prepare ()
> from /opt/gnome/lib64/libglib-2.0.so.0
> #6 0x0000002a95bda749 in g_main_context_iterate ()
> from /opt/gnome/lib64/libglib-2.0.so.0
> #7 0x0000002a95bdac8d in g_main_loop_run ()
> from /opt/gnome/lib64/libglib-2.0.so.0
> #8 0x000000000040ab15 in master_control_process () at heartbeat.c:1535
> #9 0x0000000000409c47 in initialize_heartbeat () at heartbeat.c:990
> #10 0x00000000004107b8 in main (argc=2, argv=0x7fbfffdb78, envp=0x7fbfffdb90)
> at heartbeat.c:4842
>
> Strange thing: CPU time limit exceeded, because the ulimit -t for
> root says that it's unlimited. The cib which I got from the
> remaining node with cibadmin -Q and the log are attached. BTW,
> there was nothing else happening at this point in time.
You must be running with debug enabled. Heartbeat limits its own CPU
consumption and then periodically extends the limit to keep from hitting
it. See cl_cpu_limit_setpercent() and friends.
The master control process is limited to consuming 15 CPU seconds per 30
elapsed seconds (that is, 50%).
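For anyone unfamiliar with the technique, here is a rough sketch of how a
self-imposed, periodically extended CPU limit can be built on RLIMIT_CPU.
This is not the Heartbeat code: cpu_seconds_used(), cpu_budget() and
extend_cpu_limit() are hypothetical names made up for illustration, and the
real cl_cpu_limit_*() functions may well work differently. The idea is to
move the soft limit to "CPU used so far + budget" once per interval. If the
process burns through its whole budget before the next extension, the kernel
sends SIGXCPU (signal 24 on Linux/x86), which by default kills the process
and dumps core.

```c
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Hypothetical helper: whole CPU seconds (user + system) used so far. */
static long cpu_seconds_used(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0) {
        return -1;
    }
    return (long)(ru.ru_utime.tv_sec + ru.ru_stime.tv_sec);
}

/* CPU-second budget for one window: percent of interval_secs. */
static long cpu_budget(int percent, long interval_secs)
{
    assert(percent > 0 && percent <= 100 && interval_secs > 0);
    return (percent * interval_secs) / 100;
}

/*
 * Hypothetical: call once per interval (e.g. from a timer) to move the
 * soft RLIMIT_CPU to "used so far + budget". Returns 0 on success.
 */
static int extend_cpu_limit(int percent, long interval_secs)
{
    struct rlimit rl;
    long used = cpu_seconds_used();

    if (used < 0 || getrlimit(RLIMIT_CPU, &rl) != 0) {
        return -1;
    }
    rl.rlim_cur = (rlim_t)(used + cpu_budget(percent, interval_secs));
    /* The soft limit may never exceed the hard limit. */
    if (rl.rlim_max != RLIM_INFINITY && rl.rlim_cur > rl.rlim_max) {
        rl.rlim_cur = rl.rlim_max;
    }
    return setrlimit(RLIMIT_CPU, &rl);
}
```

With these assumptions, extend_cpu_limit(50, 30) called every 30 seconds
gives the 15-in-30 behaviour described above. Because setrlimit() changes the
limit from inside the process, "ulimit -t" in root's shell can still report
unlimited.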
We limit CPU consumption when debug is enabled because it's SOOO awful
to deal with bugs involving infinite loops when your process runs with
realtime priority. NOTHING else on the system ever runs again :-(.
I've just raised the limit to 70% in CVS. But _something_ was making
it consume 15 CPU seconds within a 30-second window.
Would you have any idea what?
--
Alan Robertson <alanr at unix.sh>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce