[Linux-HA] tengine dies unexpectedly
Matthias Dahl (The Design Assembly GmbH)
mdmlha at designassembly.de
Fri Sep 8 07:37:06 MDT 2006
Hello everyone.
Once again I need a bit of help. Heartbeat (v2, CRM on) is going to be
deployed on a 3-node cluster in our company very soon. I am working through
this step by step: I finish one node (which includes virtualization via
OpenVZ, ...) before I move on to the next one.
Right now I am stuck on the first node, which is far from finished. Basically,
Heartbeat should become the master (no one else is around), start up a
resource (an OpenVZ virtual machine for which I've written my own OCF
resource agent) and monitor it.
That's the theory. Reality looks a bit different: tengine keeps dying on me
and I cannot figure out why. I have attached all relevant configs and logs.
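
A small filter can help narrow down which subsystem complains first in a log
like the one attached. The sketch below assumes the line layout seen in that
log ("daemon[pid]: timestamp LEVEL: message"); the names LINE_RE and problems
are made up for illustration and are not part of Heartbeat.

```python
import re

# Assumed layout, taken from lines such as:
#   crmd[6988]: 2006/09/08_14:04:23 ERROR: Exiting pengine process 6995 dumped core
LINE_RE = re.compile(
    r"^(?P<daemon>\w+)\[(?P<pid>\d+)\]: (?P<stamp>\S+) (?P<level>\w+): (?P<msg>.*)$"
)


def problems(lines, levels=("WARN", "ERROR")):
    """Return (daemon, stamp, level, message) for every line at one of the given levels."""
    found = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("level") in levels:
            found.append(
                (m.group("daemon"), m.group("stamp"), m.group("level"), m.group("msg"))
            )
    return found
```

Passing an open file works as well as a list, for example
problems(open("/var/log/ha-log"), levels=("ERROR",)), where the path is the
logfile set in the attached ha.cf.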
By the way, I had some trouble with the permissions on some directories. I
have yet to investigate whether this is a bug in the Gentoo ebuild or in
Heartbeat itself, so I am including the permissions here too in case they
are of interest.
~ # ls -l /var/lib/heartbeat/
total 32
drwxr-xr-x 5 root root 4096 Sep 6 23:22 cores
drwxr-x--- 2 cluster root 4096 Sep 8 14:04 crm
drwxr-x--- 2 root cluster 4096 Sep 6 23:22 fencing
prw------- 1 root root 0 Sep 8 12:14 fifo
-rw-r--r-- 1 root root 16 Sep 8 14:02 hb_generation
-rw-r--r-- 1 root root 16 Sep 8 14:02 hb_uuid
-rw-r--r-- 1 root root 145 Sep 8 14:02 hostcache
drwxrwx--- 2 root cluster 4096 Sep 6 23:22 lrm
drwxr-x--- 2 root root 4096 Sep 6 23:22 pengine
~ # ls -l /var/lib/heartbeat/cores
drwx------ 2 cluster root 4096 Sep 8 14:04 cluster
drwx------ 2 nobody root 4096 Sep 6 23:22 nobody
drwx------ 2 root root 4096 Sep 6 23:22 root
Initially, crm and cores/cluster were root:root. A quick look into Heartbeat's
Makefile showed that cores/cluster should have been set up correctly, but I
couldn't find anything that sets the ownership of crm itself.
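
To see at a glance which of these directories a given uid/gid could write to,
something like the following might do. It is only a sketch: it applies the
plain owner/group/other mode bits and ignores ACLs, supplementary groups and
root's override. The uid/gid 65 in the usage note comes from the attached log
("as uid 65 gid 65"); that it corresponds to the "cluster" user and group in
the listing above is an assumption. The names writable_by and report are
hypothetical helpers, not anything shipped with Heartbeat.

```python
import os
import stat


def writable_by(st, uid, gid):
    """Owner/group/other write check on a stat result (no ACLs, no extra groups)."""
    if st.st_uid == uid:
        return bool(st.st_mode & stat.S_IWUSR)
    if st.st_gid == gid:
        return bool(st.st_mode & stat.S_IWGRP)
    return bool(st.st_mode & stat.S_IWOTH)


def report(base, uid, gid):
    """Return [(path, mode string, writable)] for each directory directly under base."""
    rows = []
    for name in sorted(os.listdir(base)):
        path = os.path.join(base, name)
        st = os.stat(path)
        if stat.S_ISDIR(st.st_mode):
            rows.append((path, stat.filemode(st.st_mode), writable_by(st, uid, gid)))
    return rows
```

For example, "for row in report('/var/lib/heartbeat', 65, 65): print(*row)"
lists each subdirectory with its mode and whether uid/gid 65 could write to
it. Which daemon actually needs write access to which directory is not
something this sketch knows.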
I'd be more than thankful for any help or hints I get.
Best regards,
Matthias Dahl
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cib.xml
Type: text/xml
Size: 2390 bytes
Desc: not available
Url : http://lists.community.tummy.com/pipermail/linux-ha/attachments/20060908/b4fcc0cd/cib-0001.bin
-------------- next part --------------
heartbeat[6973]: 2006/09/08_14:02:43 WARN: Logging daemon is disabled --enabling logging daemon is recommended
heartbeat[6973]: 2006/09/08_14:02:43 info: **************************
heartbeat[6973]: 2006/09/08_14:02:43 info: Configuration validated. Starting heartbeat 2.0.7
heartbeat[6974]: 2006/09/08_14:02:43 info: heartbeat: version 2.0.7
heartbeat[6974]: 2006/09/08_14:02:43 WARN: No Previous generation - starting at 1
heartbeat[6974]: 2006/09/08_14:02:43 info: Heartbeat generation: 1
heartbeat[6974]: 2006/09/08_14:02:43 info: No uuid found for current node - generating a new uuid.
heartbeat[6974]: 2006/09/08_14:02:43 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[6974]: 2006/09/08_14:02:43 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[6974]: 2006/09/08_14:02:43 info: Removing /var/run/heartbeat/rsctmp failed, recreating.
heartbeat[6974]: 2006/09/08_14:02:43 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
heartbeat[6974]: 2006/09/08_14:02:43 info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 - Status: 1
heartbeat[6974]: 2006/09/08_14:02:43 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[6974]: 2006/09/08_14:02:43 info: Local status now set to: 'up'
heartbeat[6974]: 2006/09/08_14:03:44 WARN: node mrbig: is dead
heartbeat[6974]: 2006/09/08_14:03:44 WARN: node mrsmall: is dead
heartbeat[6974]: 2006/09/08_14:03:44 info: Comm_now_up(): updating status to active
heartbeat[6974]: 2006/09/08_14:03:44 info: Local status now set to: 'active'
heartbeat[6974]: 2006/09/08_14:03:44 info: Starting child client "/usr/lib/heartbeat/ccm" (65,65)
heartbeat[6974]: 2006/09/08_14:03:44 info: Starting child client "/usr/lib/heartbeat/cib" (65,65)
heartbeat[6974]: 2006/09/08_14:03:44 info: Starting child client "/usr/lib/heartbeat/lrmd" (0,0)
heartbeat[6983]: 2006/09/08_14:03:44 info: Starting "/usr/lib/heartbeat/ccm" as uid 65 gid 65 (pid 6983)
heartbeat[6974]: 2006/09/08_14:03:44 info: Starting child client "/usr/lib/heartbeat/stonithd" (0,0)
heartbeat[6985]: 2006/09/08_14:03:44 info: Starting "/usr/lib/heartbeat/lrmd" as uid 0 gid 0 (pid 6985)
heartbeat[6984]: 2006/09/08_14:03:44 info: Starting "/usr/lib/heartbeat/cib" as uid 65 gid 65 (pid 6984)
heartbeat[6974]: 2006/09/08_14:03:44 info: Starting child client "/usr/lib/heartbeat/attrd" (65,65)
heartbeat[6974]: 2006/09/08_14:03:44 info: Starting child client "/usr/lib/heartbeat/crmd" (65,65)
heartbeat[6986]: 2006/09/08_14:03:44 info: Starting "/usr/lib/heartbeat/stonithd" as uid 0 gid 0 (pid 6986)
lrmd[6985]: 2006/09/08_14:03:44 info: G_main_add_SignalHandler: Added signal handler for signal 15
heartbeat[6987]: 2006/09/08_14:03:44 info: Starting "/usr/lib/heartbeat/attrd" as uid 65 gid 65 (pid 6987)
heartbeat[6988]: 2006/09/08_14:03:44 info: Starting "/usr/lib/heartbeat/crmd" as uid 65 gid 65 (pid 6988)
cib[6984]: 2006/09/08_14:03:44 info: G_main_add_SignalHandler: Added signal handler for signal 15
lrmd[6985]: 2006/09/08_14:03:44 info: G_main_add_SignalHandler: Added signal handler for signal 17
cib[6984]: 2006/09/08_14:03:44 info: G_main_add_TriggerHandler: Added signal manual handler
cib[6984]: 2006/09/08_14:03:44 info: G_main_add_SignalHandler: Added signal handler for signal 17
lrmd[6985]: 2006/09/08_14:03:44 info: G_main_add_SignalHandler: Added signal handler for signal 10
lrmd[6985]: 2006/09/08_14:03:44 info: G_main_add_SignalHandler: Added signal handler for signal 12
cib[6984]: 2006/09/08_14:03:44 info: main:main.c Retrieval of a per-action CIB: disabled
lrmd[6985]: 2006/09/08_14:03:44 info: Started.
cib[6984]: 2006/09/08_14:03:44 info: cib_register_ha:main.c Signing in with Heartbeat
cib[6984]: 2006/09/08_14:03:44 info: cib_register_ha:main.c FSA Hostname: odin
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile:io.c Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml
ccm[6983]: 2006/09/08_14:03:44 info: Hostname: odin
stonithd[6986]: 2006/09/08_14:03:44 info: G_main_add_SignalHandler: Added signal handler for signal 10
stonithd[6986]: 2006/09/08_14:03:44 info: G_main_add_SignalHandler: Added signal handler for signal 12
cib[6984]: 2006/09/08_14:03:44 WARN: validate_cib_digest:io.c No on-disk digest present
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <cib num_updates="0" epoch="0" admin_epoch="0" cib_feature_revision="1.3" have_quorum="false">
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <configuration>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <crm_config>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <cluster_property_set id="cbg_cluster_config">
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <attributes>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <nvpair id="transition_idle_timeout" name="transition_idle_timeout" value="60s"/>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <nvpair id="default_action_timeout" name="default_action_timeout" value="3m"/>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <nvpair id="symmetric_cluster" name="symmetric_cluster" value="false"/>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <nvpair id="stonith_enabled" name="stonith_enabled" value="false"/>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <nvpair id="stonith_action" name="stonith_action" value="reboot"/>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <nvpair id="no_quorum_policy" name="no_quorum_policy" value="stop"/>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <nvpair id="default_resource_stickiness" name="default_resource_stickiness" value="0"/>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <nvpair id="default_resource_failure_stickiness" name="default_resource_failure_stickiness" value="0"/>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <nvpair id="stop_orphan_resources" name="stop_orphan_resources" value="true"/>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <nvpair id="stop_orphan_actions" name="stop_orphan_actions" value="true"/>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <nvpair id="remove_after_stop" name="remove_after_stop" value="false"/>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <nvpair id="is_managed_default" name="is_managed_default" value="true"/>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] </attributes>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] </cluster_property_set>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] </crm_config>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <nodes/>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <resources>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <primitive id="res_VE_web1" class="ocf" type="ManageVE" provider="designassembly">
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <operations>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <op id="1" name="monitor" interval="5s" timeout="5s"/>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] </operations>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <instance_attributes id="res_VE_web1_attributes">
stonithd[6986]: 2006/09/08_14:03:44 info: Signing in with heartbeat.
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <attributes>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <nvpair id="res_VE_web1_veid" name="veid" value="100"/>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] </attributes>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] </instance_attributes>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] </primitive>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] </resources>
attrd[6987]: 2006/09/08_14:03:44 info: G_main_add_SignalHandler: Added signal handler for signal 15
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <constraints>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <rsc_location id="con_rsc_loc_res_VE_web1" rsc="res_VE_web1">
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <rule id="con_rsc_local_res_VE_web1_only_odin" score="INFINITY" boolean_op="and">
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <expression id="exp_con_rsc_local_res_VE_web1_only_odin" attribute="#uname" operation="eq" value="odin"/>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] </rule>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] </rsc_location>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] </constraints>
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] </configuration>
stonithd[6986]: 2006/09/08_14:03:44 notice: /usr/lib/heartbeat/stonithd start up successfully.
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] <status/>
stonithd[6986]: 2006/09/08_14:03:44 info: G_main_add_SignalHandler: Added signal handler for signal 17
cib[6984]: 2006/09/08_14:03:44 info: readCibXmlFile: [on-disk] </cib>
crmd[6988]: 2006/09/08_14:03:44 info: init_start:main.c Starting crmd
attrd[6987]: 2006/09/08_14:03:44 info: register_with_ha:attrd.c Hostname: odin
crmd[6988]: 2006/09/08_14:03:44 info: G_main_add_SignalHandler: Added signal handler for signal 15
crmd[6988]: 2006/09/08_14:03:44 info: G_main_add_TriggerHandler: Added signal manual handler
crmd[6988]: 2006/09/08_14:03:44 info: G_main_add_SignalHandler: Added signal handler for signal 17
crmd[6988]: 2006/09/08_14:03:44 WARN: cib_native_signon:cib_native.c Connection to CIB failed: connection failed
cib[6984]: 2006/09/08_14:03:44 info: activateCibXml:io.c CIB size is 58268 bytes (was 0)
cib[6984]: 2006/09/08_14:03:44 info: startCib:main.c CIB Initialization completed successfully
cib[6984]: 2006/09/08_14:03:44 WARN: init_start:main.c CCM Activation failed
cib[6984]: 2006/09/08_14:03:44 WARN: init_start:main.c CCM Connection failed 1 times (30 max)
attrd[6987]: 2006/09/08_14:03:44 info: register_with_ha:attrd.c UUID: d19c5306-d1dd-4c44-8808-5afc25056aaf
cib[6984]: 2006/09/08_14:03:45 WARN: init_start:main.c CCM Activation failed
cib[6984]: 2006/09/08_14:03:45 WARN: init_start:main.c CCM Connection failed 2 times (30 max)
cib[6984]: 2006/09/08_14:03:46 WARN: init_start:main.c CCM Activation failed
cib[6984]: 2006/09/08_14:03:46 WARN: init_start:main.c CCM Connection failed 3 times (30 max)
ccm[6983]: 2006/09/08_14:03:46 info: G_main_add_SignalHandler: Added signal handler for signal 15
cib[6984]: 2006/09/08_14:03:47 info: init_start:main.c Starting cib mainloop
cib[6984]: 2006/09/08_14:03:47 info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
cib[6984]: 2006/09/08_14:03:47 info: mem_handle_event: instance=1, nodes=1, new=1, lost=0, n_idx=0, new_idx=0, old_idx=4
cib[6984]: 2006/09/08_14:03:47 info: cib_ccm_msg_callback:callbacks.c PEER: odin
cib[6990]: 2006/09/08_14:03:47 WARN: validate_cib_digest:io.c No on-disk digest present
cib[6990]: 2006/09/08_14:03:47 info: write_cib_contents:io.c Wrote version 0.0.0 of the CIB to disk (digest: 617f2a69668320a3d74b6e39b030b882)
crmd[6988]: 2006/09/08_14:03:47 info: do_cib_control:cib.c CIB connection established
cib[6984]: 2006/09/08_14:03:47 info: cib_null_callback:callbacks.c Setting cib_refresh_notify callbacks for crmd: on
crmd[6988]: 2006/09/08_14:03:47 info: register_with_ha:control.c Hostname: odin
cib[6984]: 2006/09/08_14:03:47 info: cib_client_status_callback:callbacks.c Status update: Client odin/cib now has status [join]
cib[6984]: 2006/09/08_14:03:47 info: cib_client_status_callback:callbacks.c Status update: Client odin/cib now has status [online]
crmd[6988]: 2006/09/08_14:03:48 info: register_with_ha:control.c UUID: d19c5306-d1dd-4c44-8808-5afc25056aaf
crmd[6988]: 2006/09/08_14:03:48 info: populate_cib_nodes:control.c Requesting the list of configured nodes
crmd[6988]: 2006/09/08_14:03:49 WARN: get_uuid:utils.c Could not calculate UUID for mrsmall
crmd[6988]: 2006/09/08_14:03:49 WARN: populate_cib_nodes:control.c Node mrsmall: no uuid found
crmd[6988]: 2006/09/08_14:03:50 WARN: get_uuid:utils.c Could not calculate UUID for mrbig
crmd[6988]: 2006/09/08_14:03:50 WARN: populate_cib_nodes:control.c Node mrbig: no uuid found
crmd[6988]: 2006/09/08_14:03:51 notice: populate_cib_nodes:control.c Node: odin (uuid: d19c5306-d1dd-4c44-8808-5afc25056aaf)
crmd[6988]: 2006/09/08_14:03:51 info: do_ha_control:control.c Connected to Heartbeat
crmd[6988]: 2006/09/08_14:03:51 info: do_ccm_control:ccm.c CCM connection established... waiting for first callback
crmd[6988]: 2006/09/08_14:03:51 info: do_started:control.c Delaying start, CCM (0000000000100000) not connected
crmd[6988]: 2006/09/08_14:03:51 info: init_start:main.c Starting crmd's mainloop
crmd[6988]: 2006/09/08_14:03:51 notice: crmd_client_status_callback:callbacks.c Status update: Client odin/crmd now has status [online]
crmd[6988]: 2006/09/08_14:03:51 info: crmd_client_status_callback:callbacks.c Uncaching UUID for odin
cib[6984]: 2006/09/08_14:03:51 info: activateCibXml:io.c CIB size is 60584 bytes (was 58508)
cib[6984]: 2006/09/08_14:03:51 info: cib_diff_notify:notify.c Local-only Change (client:6988, call: 3): 0.0.0 (ok)
cib[6991]: 2006/09/08_14:03:51 info: write_cib_contents:io.c Wrote version 0.0.0 of the CIB to disk (digest: 343b30b40b0169a8aec20e53e21ca211)
crmd[6988]: 2006/09/08_14:03:51 notice: crmd_client_status_callback:callbacks.c Status update: Client odin/crmd now has status [online]
crmd[6988]: 2006/09/08_14:03:51 info: crmd_client_status_callback:callbacks.c Uncaching UUID for odin
cib[6984]: 2006/09/08_14:03:51 info: activateCibXml:io.c CIB size is 62900 bytes (was 60584)
cib[6984]: 2006/09/08_14:03:51 info: cib_diff_notify:notify.c Local-only Change (client:6988, call: 5): 0.0.0 (ok)
cib[6992]: 2006/09/08_14:03:51 info: write_cib_contents:io.c Wrote version 0.0.0 of the CIB to disk (digest: 343b30b40b0169a8aec20e53e21ca211)
crmd[6988]: 2006/09/08_14:03:51 info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
crmd[6988]: 2006/09/08_14:03:51 info: mem_handle_event: instance=1, nodes=1, new=1, lost=0, n_idx=0, new_idx=0, old_idx=4
crmd[6988]: 2006/09/08_14:03:51 info: crmd_ccm_msg_callback:callbacks.c Quorum lost after event=INVALID (id=1)
crmd[6988]: 2006/09/08_14:03:51 info: ccm_event_detail:ccm.c INVALID: trans=1, nodes=1, new=1, lost=0 n_idx=0, new_idx=0, old_idx=4
crmd[6988]: 2006/09/08_14:03:51 info: ccm_event_detail:ccm.c CURRENT: odin [nodeid=2, born=1]
crmd[6988]: 2006/09/08_14:03:51 info: ccm_event_detail:ccm.c NEW: odin [nodeid=2, born=1]
crmd[6988]: 2006/09/08_14:03:51 info: do_started:control.c Delaying start, Config not read (0000000000000040)
crmd[6988]: 2006/09/08_14:03:51 info: do_started:control.c The local CRM is operational
crmd[6988]: 2006/09/08_14:03:51 info: do_state_transition:fsa.c odin: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
crmd[6988]: 2006/09/08_14:03:51 info: update_dc:utils.c Set DC to <null> (<null>)
cib[6984]: 2006/09/08_14:03:51 info: activateCibXml:io.c CIB size is 63020 bytes (was 62900)
cib[6984]: 2006/09/08_14:03:51 info: cib_diff_notify:notify.c Local-only Change (client:6988, call: 7): 0.0.0 (ok)
cib[6993]: 2006/09/08_14:03:51 info: write_cib_contents:io.c Wrote version 0.0.0 of the CIB to disk (digest: 343b30b40b0169a8aec20e53e21ca211)
attrd[6987]: 2006/09/08_14:03:52 info: main:attrd.c Starting mainloop...
crmd[6988]: 2006/09/08_14:04:22 info: crm_timer_popped:utils.c Election Trigger (I_DC_TIMEOUT) just popped!
crmd[6988]: 2006/09/08_14:04:22 WARN: do_log:misc.c [[FSA]] Input I_DC_TIMEOUT from crm_timer_popped() received in state (S_PENDING)
crmd[6988]: 2006/09/08_14:04:22 info: do_state_transition:fsa.c odin: State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]
crmd[6988]: 2006/09/08_14:04:22 info: update_dc:utils.c Set DC to <null> (<null>)
crmd[6988]: 2006/09/08_14:04:22 info: do_election_count_vote:election.c Updated voted hash for odin to vote
crmd[6988]: 2006/09/08_14:04:22 info: do_election_count_vote:election.c Election ignore: our vote (odin)
crmd[6988]: 2006/09/08_14:04:22 info: do_state_transition:fsa.c odin: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
crmd[6988]: 2006/09/08_14:04:22 info: start_subsystem:subsystems.c Starting sub-system "tengine"
crmd[6988]: 2006/09/08_14:04:22 info: start_subsystem:subsystems.c Starting sub-system "pengine"
crmd[6988]: 2006/09/08_14:04:22 info: do_dc_takeover:election.c Taking over DC status for this partition
crmd[6988]: 2006/09/08_14:04:22 info: update_dc:utils.c Set DC to <null> (<null>)
crmd[6988]: 2006/09/08_14:04:22 info: do_dc_join_offer_all:join_dc.c join-1: Waiting on 1 outstanding join acks
cib[6984]: 2006/09/08_14:04:22 info: cib_process_readwrite:messages.c We are now in R/W mode
tengine[6994]: 2006/09/08_14:04:22 info: G_main_add_SignalHandler: Added signal handler for signal 15
tengine[6994]: 2006/09/08_14:04:22 info: G_main_add_TriggerHandler: Added signal manual handler
pengine[6995]: 2006/09/08_14:04:22 info: G_main_add_SignalHandler: Added signal handler for signal 15
pengine[6995]: 2006/09/08_14:04:22 info: init_start:main.c Starting pengine
cib[6984]: 2006/09/08_14:04:22 info: cib_diff_notify:notify.c Update (client: 6988, call:11): 0.0.0 -> 0.0.1 (ok)
cib[6996]: 2006/09/08_14:04:22 info: write_cib_contents:io.c Wrote version 0.0.1 of the CIB to disk (digest: cb2cb328404a2d378d805f28c5892a60)
cib[6984]: 2006/09/08_14:04:22 info: cib_null_callback:callbacks.c Setting cib_diff_notify callbacks for tengine: on
tengine[6994]: 2006/09/08_14:04:22 info: init_start:main.c Registering TE UUID: b7c3c315-0b4f-41e3-aaf5-622887eb9dd2
tengine[6994]: 2006/09/08_14:04:22 info: set_graph_functions:utils.c Setting custom graph functions
tengine[6994]: 2006/09/08_14:04:22 info: unpack_graph:unpack.c Unpacked transition -1: 0 actions in 0 synapses
tengine[6994]: 2006/09/08_14:04:22 info: init_start:main.c Starting tengine
crmd[6988]: 2006/09/08_14:04:23 info: update_dc:utils.c Set DC to odin (1.0.6)
crmd[6988]: 2006/09/08_14:04:23 info: do_state_transition:fsa.c odin: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
crmd[6988]: 2006/09/08_14:04:23 info: do_state_transition:fsa.c All 1 cluster nodes responded to the join offer.
crmd[6988]: 2006/09/08_14:04:23 info: update_attrd:join_dc.c Connecting to attrd...
cib[6984]: 2006/09/08_14:04:23 info: sync_our_cib:messages.c Syncing CIB to all peers
attrd[6987]: 2006/09/08_14:04:23 info: attrd_local_callback:attrd.c Sending full refresh
cib[6984]: 2006/09/08_14:04:23 info: activateCibXml:io.c CIB size is 63172 bytes (was 63020)
cib[6984]: 2006/09/08_14:04:23 info: cib_diff_notify:notify.c Update (client: 6988, call:14): 0.0.1 -> 0.0.2 (ok)
tengine[6994]: 2006/09/08_14:04:23 info: te_update_diff:callbacks.c Processing diff (cib_update): 0.0.1 -> 0.0.2
cib[6984]: 2006/09/08_14:04:23 info: cib_diff_notify:notify.c Update (client: 6988, call:15): 0.0.2 -> 0.1.3 (ok)
tengine[6994]: 2006/09/08_14:04:23 info: te_update_diff:callbacks.c Processing diff (cib_bump): 0.0.2 -> 0.1.3
cib[6984]: 2006/09/08_14:04:23 info: cib_diff_notify:notify.c Update (client: 6988, call:16): 0.1.3 -> 0.1.4 (ok)
tengine[6994]: 2006/09/08_14:04:23 info: te_update_diff:callbacks.c Processing diff (cib_update): 0.1.3 -> 0.1.4
cib[6997]: 2006/09/08_14:04:23 info: write_cib_contents:io.c Wrote version 0.1.4 of the CIB to disk (digest: 5c38cf532d8065efc4d9d535f1d1fa7e)
crmd[6988]: 2006/09/08_14:04:23 info: update_dc:utils.c Set DC to odin (1.0.6)
crmd[6988]: 2006/09/08_14:04:23 info: do_dc_join_ack:join_dc.c join-1: Updating node state to member for odin)
cib[6984]: 2006/09/08_14:04:23 info: activateCibXml:io.c CIB size is 67052 bytes (was 63172)
cib[6984]: 2006/09/08_14:04:23 info: cib_diff_notify:notify.c Update (client: 6988, call:17): 0.1.4 -> 0.1.5 (ok)
crmd[6988]: 2006/09/08_14:04:23 info: do_state_transition:fsa.c odin: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
crmd[6988]: 2006/09/08_14:04:23 info: do_state_transition:fsa.c All 1 cluster nodes are eligable to run resources.
tengine[6994]: 2006/09/08_14:04:23 info: te_update_diff:callbacks.c Processing diff (cib_update): 0.1.4 -> 0.1.5
tengine[6994]: 2006/09/08_14:04:23 info: update_abort_priority:utils.c Abort priority upgraded to 1000000
cib[6998]: 2006/09/08_14:04:23 info: write_cib_contents:io.c Wrote version 0.1.5 of the CIB to disk (digest: 414923ff81f0a3a4a7939fc032a65fbf)
pengine[6995]: 2006/09/08_14:04:23 info: process_pe_message: [generation] <cib num_updates="5" epoch="1" admin_epoch="0" cib_feature_revision="1.3" have_quorum="false" generated="true" ccm_transition="1" num_peers="1" dc_uuid="d19c5306-d1dd-4c44-8808-5afc25056aaf"/>
pengine[6995]: 2006/09/08_14:04:23 info: unpack_config:unpack.c Default stickiness: 0
pengine[6995]: 2006/09/08_14:04:23 info: unpack_config:unpack.c Default failure stickiness: 0
pengine[6995]: 2006/09/08_14:04:23 info: unpack_config:unpack.c STONITH of failed nodes is disabled
pengine[6995]: 2006/09/08_14:04:23 info: unpack_config:unpack.c STONITH will reboot nodes
pengine[6995]: 2006/09/08_14:04:23 info: unpack_config:unpack.c On loss of CCM Quorum: Stop ALL resources
pengine[6995]: 2006/09/08_14:04:23 info: unpack_config:unpack.c Orphan resources are stopped
pengine[6995]: 2006/09/08_14:04:23 info: unpack_config:unpack.c Orphan resource actions are stopped
pengine[6995]: 2006/09/08_14:04:23 info: unpack_config:unpack.c Stopped resources are removed from the status section: false
pengine[6995]: 2006/09/08_14:04:23 info: unpack_config:unpack.c By default resources are managed
pengine[6995]: 2006/09/08_14:04:23 WARN: cluster_status:status.c We do not have quorum - fencing and resource management disabled
pengine[6995]: 2006/09/08_14:04:23 info: determine_online_status:unpack.c Node odin is online
pengine[6995]: 2006/09/08_14:04:23 info: res_VE_web1 (designassembly::ocf:ManageVE): Stopped
pengine[6995]: 2006/09/08_14:04:23 notice: native_create_probe:native.c odin: Created probe for res_VE_web1
pengine[6995]: 2006/09/08_14:04:23 notice: stage8:allocate.c Created transition graph 0.
pengine[6995]: 2006/09/08_14:04:23 WARN: process_pe_message:pengine.c No value specified for cluster preference: pe-input-series-max
crmd[6988]: 2006/09/08_14:04:23 info: do_state_transition:fsa.c odin: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
crmd[6988]: 2006/09/08_14:04:23 WARN: Exiting pengine process 6995 killed by signal 11.
crmd[6988]: 2006/09/08_14:04:23 ERROR: Exiting pengine process 6995 dumped core
crmd[6988]: 2006/09/08_14:04:23 info: crmdManagedChildDied:subsystems.c Process pengine:[6995] exited (signal=11, exitcode=0)
crmd[6988]: 2006/09/08_14:04:23 ERROR: crmdManagedChildDied:subsystems.c The pengine subsystem terminated unexpectedly
crmd[6988]: 2006/09/08_14:04:23 ERROR: do_log:misc.c [[FSA]] Input I_ERROR from crmdManagedChildDied() received in state (S_TRANSITION_ENGINE)
crmd[6988]: 2006/09/08_14:04:23 info: do_state_transition:fsa.c odin: State transition S_TRANSITION_ENGINE -> S_RECOVERY [ input=I_ERROR cause=C_IPC_MESSAGE origin=crmdManagedChildDied ]
tengine[6994]: 2006/09/08_14:04:23 info: unpack_graph:unpack.c Unpacked transition 0: 3 actions in 3 synapses
crmd[6988]: 2006/09/08_14:04:23 ERROR: do_recover:control.c Action A_RECOVER (0000000001000000) not supported
crmd[6988]: 2006/09/08_14:04:23 WARN: do_election_vote:election.c Not voting in election, we're in state S_RECOVERY
tengine[6994]: 2006/09/08_14:04:23 info: send_rsc_command:actions.c Initiating action 3: res_VE_web1_monitor_0 on odin
crmd[6988]: 2006/09/08_14:04:23 info: do_dc_release:election.c DC role released
crmd[6988]: 2006/09/08_14:04:23 info: stop_subsystem:subsystems.c Sent -TERM to tengine: [6994]
tengine[6994]: 2006/09/08_14:04:23 info: update_abort_priority:utils.c Abort priority upgraded to 1000000
crmd[6988]: 2006/09/08_14:04:23 ERROR: do_log:misc.c [[FSA]] Input I_STOP from do_recover() received in state (S_RECOVERY)
tengine[6994]: 2006/09/08_14:04:23 info: update_abort_priority:utils.c Abort action 0 superceeded by 3
crmd[6988]: 2006/09/08_14:04:23 info: do_state_transition:fsa.c odin: State transition S_RECOVERY -> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=do_recover ]
crmd[6988]: 2006/09/08_14:04:23 info: do_dc_release:election.c DC role released
crmd[6988]: 2006/09/08_14:04:23 info: stop_subsystem:subsystems.c Sent -TERM to tengine: [6994]
crmd[6988]: 2006/09/08_14:04:23 info: do_shutdown:control.c Terminating the tengine
crmd[6988]: 2006/09/08_14:04:23 info: stop_subsystem:subsystems.c Sent -TERM to tengine: [6994]
crmd[6988]: 2006/09/08_14:04:23 info: do_shutdown:control.c Waiting for subsystems to exit
crmd[6988]: 2006/09/08_14:04:23 WARN: register_fsa_input_adv:messages.c do_shutdown stalled the FSA with pending inputs
crmd[6988]: 2006/09/08_14:04:23 WARN: do_log:misc.c [[FSA]] Input I_PENDING from do_election_vote() received in state (S_STOPPING)
crmd[6988]: 2006/09/08_14:04:23 info: do_shutdown:control.c Terminating the tengine
crmd[6988]: 2006/09/08_14:04:23 info: stop_subsystem:subsystems.c Sent -TERM to tengine: [6994]
crmd[6988]: 2006/09/08_14:04:23 info: do_shutdown:control.c Waiting for subsystems to exit
crmd[6988]: 2006/09/08_14:04:23 WARN: register_fsa_input_adv:messages.c do_shutdown stalled the FSA with pending inputs
crmd[6988]: 2006/09/08_14:04:23 info: do_lrm_rsc_op:lrm.c Performing op monitor on res_VE_web1 (interval=0ms, key=0:b7c3c315-0b4f-41e3-aaf5-622887eb9dd2)
crmd[6988]: 2006/09/08_14:04:23 info: do_lrm_rsc_op:lrm.c Discarding attempt to perform action monitor on res_VE_web1 in state S_STOPPING
crmd[6988]: 2006/09/08_14:04:23 info: send_direct_ack:lrm.c ACK'ing resource op: monitor for res_VE_web1
crmd[6988]: 2006/09/08_14:04:23 info: process_client_disconnect:utils.c Received HUP from pengine:[-1]
crmd[6988]: 2006/09/08_14:04:23 WARN: do_log:misc.c [[FSA]] Input I_RELEASE_SUCCESS from do_dc_release() received in state (S_STOPPING)
crmd[6988]: 2006/09/08_14:04:23 info: do_shutdown:control.c Terminating the tengine
crmd[6988]: 2006/09/08_14:04:23 info: stop_subsystem:subsystems.c Sent -TERM to tengine: [6994]
crmd[6988]: 2006/09/08_14:04:23 info: do_shutdown:control.c Waiting for subsystems to exit
crmd[6988]: 2006/09/08_14:04:23 WARN: register_fsa_input_adv:messages.c do_shutdown stalled the FSA with pending inputs
crmd[6988]: 2006/09/08_14:04:24 WARN: do_log:misc.c [[FSA]] Input I_RELEASE_SUCCESS from do_dc_release() received in state (S_STOPPING)
crmd[6988]: 2006/09/08_14:04:24 info: do_shutdown:control.c Terminating the tengine
crmd[6988]: 2006/09/08_14:04:24 info: stop_subsystem:subsystems.c Sent -TERM to tengine: [6994]
crmd[6988]: 2006/09/08_14:04:24 info: do_shutdown:control.c Waiting for subsystems to exit
heartbeat[6974]: 2006/09/08_14:05:32 info: killing /usr/lib/heartbeat/crmd process group 6988 with signal 15
crmd[6988]: 2006/09/08_14:05:32 ERROR: crm_shutdown:control.c Escalating the shutdown
crmd[6988]: 2006/09/08_14:05:32 ERROR: do_log:misc.c [[FSA]] Input I_ERROR from crm_shutdown() received in state (S_STOPPING)
crmd[6988]: 2006/09/08_14:05:32 info: do_state_transition:fsa.c odin: State transition S_STOPPING -> S_TERMINATE [ input=I_ERROR cause=C_SHUTDOWN origin=crm_shutdown ]
crmd[6988]: 2006/09/08_14:05:32 info: verify_stopped:lrm.c Checking for active resources before exit
crmd[6988]: 2006/09/08_14:05:32 ERROR: do_exit:control.c Performing A_EXIT_1 - forcefully exiting the CRMd
crmd[6988]: 2006/09/08_14:05:32 ERROR: do_exit:control.c Could not recover from internal error
crmd[6988]: 2006/09/08_14:05:32 info: do_exit:control.c [crmd] stopped (2)
ccm[6983]: 2006/09/08_14:05:32 info: client (pid=6988) removed from ccm
tengine[6994]: 2006/09/08_14:05:32 ERROR: subsystem_msg_dispatch:ipc.c The server 6988 has left us: Shutting down...NOW
heartbeat[6974]: 2006/09/08_14:05:32 info: killing /usr/lib/heartbeat/attrd process group 6987 with signal 15
attrd[6987]: 2006/09/08_14:05:32 info: attrd_shutdown:attrd.c Exiting
attrd[6987]: 2006/09/08_14:05:32 info: main:attrd.c Exiting...
heartbeat[6974]: 2006/09/08_14:05:32 info: killing /usr/lib/heartbeat/stonithd process group 6986 with signal 15
stonithd[6986]: 2006/09/08_14:05:32 notice: /usr/lib/heartbeat/stonithd normally quit.
heartbeat[6974]: 2006/09/08_14:05:32 info: killing /usr/lib/heartbeat/lrmd process group 6985 with signal 15
lrmd[6985]: 2006/09/08_14:05:32 info: lrmd is shutting down
heartbeat[6974]: 2006/09/08_14:05:32 info: killing /usr/lib/heartbeat/cib process group 6984 with signal 15
cib[6984]: 2006/09/08_14:05:32 info: cib_shutdown:main.c Disconnected 0 clients
cib[6984]: 2006/09/08_14:05:32 info: cib_process_disconnect:callbacks.c All clients disconnected...
cib[6984]: 2006/09/08_14:05:32 info: terminate_ha_connection:callbacks.c initiate_exit: Disconnecting heartbeat
cib[6984]: 2006/09/08_14:05:32 info: cib_ha_connection_destroy:main.c Heartbeat disconnection complete... exiting
cib[6984]: 2006/09/08_14:05:32 info: uninitializeCib:io.c The CIB has been deallocated.
ccm[6983]: 2006/09/08_14:05:32 info: client (pid=6984) removed from ccm
heartbeat[6974]: 2006/09/08_14:05:32 info: killing /usr/lib/heartbeat/ccm process group 6983 with signal 15
ccm[6983]: 2006/09/08_14:05:32 info: received SIGTERM, going to shut down
heartbeat[6974]: 2006/09/08_14:05:33 info: killing HBFIFO process 6979 with signal 15
heartbeat[6974]: 2006/09/08_14:05:33 info: killing HBWRITE process 6980 with signal 15
heartbeat[6974]: 2006/09/08_14:05:33 info: killing HBREAD process 6981 with signal 15
heartbeat[6974]: 2006/09/08_14:05:33 info: Core process 6979 exited. 3 remaining
heartbeat[6974]: 2006/09/08_14:05:33 info: Core process 6981 exited. 2 remaining
heartbeat[6974]: 2006/09/08_14:05:33 info: Core process 6980 exited. 1 remaining
heartbeat[6974]: 2006/09/08_14:05:33 info: odin Heartbeat shutdown complete.
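
To narrow a log like the one above down to the entries that matter, the ERROR lines can be grouped by daemon. The following is only a rough sketch in Python and is not part of Heartbeat. The regular expression is an assumption about the line layout (daemon[pid]: timestamp level: message) inferred from the lines shown here, and the function name errors_by_daemon is made up for illustration.

```python
import re

# Assumed layout, inferred from the log above:
#   daemon[pid]: timestamp level: message
LOG_LINE = re.compile(
    r"^(?P<daemon>[\w-]+)\[(?P<pid>\d+)\]: "
    r"(?P<stamp>\S+) (?P<level>\w+): (?P<message>.*)$"
)


def errors_by_daemon(lines):
    """Group the messages of all ERROR lines by the daemon that logged them."""
    found = {}
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            found.setdefault(match.group("daemon"), []).append(match.group("message"))
    return found


# Three lines copied from the log above.
SAMPLE = [
    "crmd[6988]: 2006/09/08_14:05:32 ERROR: do_exit:control.c Could not recover from internal error",
    "ccm[6983]: 2006/09/08_14:05:32 info: client (pid=6988) removed from ccm",
    "tengine[6994]: 2006/09/08_14:05:32 ERROR: subsystem_msg_dispatch:ipc.c The server 6988 has left us: Shutting down...NOW",
]

errors = errors_by_daemon(SAMPLE)
for daemon, messages in errors.items():
    for message in messages:
        print(daemon, "->", message)
```

Fed the complete /var/log/ha-log (for example via open("/var/log/ha-log")), this lists the errors in the order they were logged for each daemon, which makes it easier to see which process failed first.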
-------------- next part --------------
#
# ATTENTION: As the configuration file is read line by line,
# THE ORDER OF DIRECTIVES MATTERS!
#
# In particular, make sure that the udpport, serial baud rate
# etc. are set before the heartbeat media are defined!
# debug and log file directives go into effect when they
# are encountered.
#
# All will be fine if you keep them ordered as in this example.
#
#
# first of all, we want a Heartbeat v2 CRM style cluster
# this renders the auto_failback option useless
#
# (md)
#
crm on
# File to write debug messages to
#
debugfile /var/log/ha-debug
# File to write other messages to
#
logfile /var/log/ha-log
# Facility to use for syslog()/logger
#
logfacility syslog
# keepalive: how long between heartbeats? (in seconds, ms for milliseconds)
#
keepalive 2
# deadtime: how long-to-declare-host-dead?
# (setting too low can cause split-brain problem)
#
deadtime 20
# warntime: how long before issuing "late heartbeat" warning?
#
warntime 10
# initdead: very first dead time
# (initial dead time for node discovery when heartbeat is started,
# should be at least twice the normal dead time)
#
initdead 60
# udpport: for bcast/ucast communication
#
udpport 694
# baud: baud rate for serial ports
#
#baud 19200
# serial: serial serialportname
#
#serial /dev/ttyS0 # Linux
# bcast: interfaces to broadcast heartbeats over
#
bcast eth1
# mcast: multicast heartbeat medium
#
# mcast [dev] [mcast group] [port] [ttl] [loop]
#
# [dev] device to send/rcv heartbeats on
# [mcast group] multicast group to join (class D multicast address
# 224.0.0.0 - 239.255.255.255)
# [port] udp port to sendto/rcvfrom (set this value to the
# same value as "udpport" above)
# [ttl] the ttl value for outbound heartbeats. This affects
# how far the multicast packet will propagate. (0-255)
# Must be greater than zero.
# [loop] toggles loopback for outbound multicast heartbeats.
# if enabled, an outbound packet will be looped back and
# received by the interface it was sent on. (0 or 1)
# Set this value to zero.
#
#mcast eth0 225.0.0.1 694 1 0
# ucast: unicast / udp heartbeat medium
#
# ucast [dev] [peer-ip-addr]
#
# [dev] device to send/rcv heartbeats on
# [peer-ip-addr] IP address of peer to send packets to
#
#ucast eth0 192.168.1.2
# auto_failback: determines whether a resource will
# automatically fail back to its "primary" node, or remain
# on whatever node is serving it until that node fails, or
# an administrator intervenes.
#
# on - enable automatic failbacks
# off - disable automatic failbacks
# legacy - enable automatic failbacks in systems
# where all nodes do not yet support
# the auto_failback option.
#
# Activating Heartbeat v2 CRM clusters renders this useless...
#
#auto_failback on
# stonith: Basic STONITH support
#
# stonith <stonith_type> <configfile>
#
#stonith baytech /etc/ha.d/conf/stonith.baytech
# stonith_host: you can configure multiple stonith devices using this directive.
#
# stonith_host <hostfrom> <stonith_type> <params...>
# <hostfrom> is the machine the stonith device is attached
# to or * to mean it is accessible from any host.
# <stonith_type> is the type of stonith device (a list of
# supported drivers is in /usr/lib/stonith.)
# <params...> are driver specific parameters. To see the
# format for a particular device, run:
# stonith -l -t <stonith_type>
#
#stonith_host * baytech 10.0.0.3 mylogin mysecretpassword
#stonith_host ken3 rps10 /dev/ttyS1 kathy 0
#stonith_host kathy rps10 /dev/ttyS1 ken3 0
# watchdog: watchdog timer
#
# If our own heart doesn't beat for a minute, then our machine will
# reboot.
#
# NOTE: If you are using the software watchdog, you very likely
# wish to load the module with the parameter "nowayout=0" or
# compile it without CONFIG_WATCHDOG_NOWAYOUT set. Otherwise even
# an orderly shutdown of heartbeat will trigger a reboot, which is
# very likely NOT what you want.
#
#watchdog /dev/watchdog
# node: node uname (nodes in our cluster)
#
node odin
node mrbig
node mrsmall
# ping: ping ipaddr
#
# Treats ipaddr as a pseudo-cluster-member
# Used together with ipfail below...
# note: don't use a cluster node as ping node
#
#ping 10.10.10.254
# Treats 10.10.10.254 and 10.10.10.253 as a pseudo-cluster-member
# called group1. If either 10.10.10.254 or 10.10.10.253 is up,
# then group1 is up
# Used together with ipfail below...
#
#ping_group group1 10.10.10.254 10.10.10.253
# hbaping: HBA ping directive for Fibre Channel
#
#hbaping fc-card-name
# Processes started and stopped with heartbeat. Restarted unless
# they exit with rc=100
#
#respawn userid /path/name/to/run
#respawn hacluster /usr/lib/heartbeat/ipfail
# Access control for client api. default is no access
#
#apiauth client-name gid=gidlist uid=uidlist
#apiauth ipfail gid=haclient uid=hacluster
###########################
#
# Unusual options.
#
###########################
#
# hopfudge maximum hop count minus number of nodes in config
#hopfudge 1
#
# deadping - dead time for ping nodes
#deadping 30
#
# hbgenmethod - Heartbeat generation number creation method
# Normally these are stored on disk and incremented as needed.
#hbgenmethod time
#
# realtime - enable/disable realtime execution (high priority, etc.)
# defaults to on
#realtime off
#
# debug - set debug level
# defaults to zero
#debug 1
#
# API Authentication - replaces the fifo-permissions-based system of the past
#
#
# You can put a uid list and/or a gid list.
# If you put both, then a process is authorized if it qualifies under either
# the uid list, or under the gid list.
#
# The groupname "default" has special meaning. If it is specified, then
# this will be used for authorizing groupless clients, and any client groups
# not otherwise specified.
#
# There is a subtle exception to this. "default" will never be used in the
# following cases (actual default auth directives noted in brackets)
# ipfail (uid=HA_CCMUSER)
# ccm (uid=HA_CCMUSER)
# ping (gid=HA_APIGROUP)
# cl_status (gid=HA_APIGROUP)
#
# This is done to avoid creating a gaping security hole and matches the most
# likely desired configuration.
#
#apiauth ipfail uid=hacluster
#apiauth ccm uid=hacluster
#apiauth cms uid=hacluster
#apiauth ping gid=haclient uid=alanr,root
#apiauth default gid=haclient
# message format on the wire; it can be classic or netstring,
# default: classic
#msgfmt classic/netstring
# Do we use the logging daemon?
# If the logging daemon is used, logfile/debugfile/logfacility in this file
# are no longer meaningful. You should check the config file for the logging
# daemon (the default is /etc/logd.cf).
# More information can be found at http://www.linux-ha.org/ha_2ecf_2fUseLogdDirective
# Setting use_logd to "yes" is recommended
#
# use_logd yes/no
#
# the interval at which we reconnect to the logging daemon if the previous connection failed
# default: 60 seconds
#conn_logd_time 60
#
#
# Configure compression module
# It could be zlib or bz2, depending on whether you have the corresponding
# library in the system.
#compression bz2
#
# Configure compression threshold
# This value determines the threshold to compress a message,
# e.g. if the threshold is 1, then any message with size greater than 1 KB
# will be compressed, the default is 2 (KB)
#compression_threshold 2
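
The comments in the ha.cf above say that initdead should be at least twice deadtime, and that time values may carry an "ms" suffix for milliseconds. A quick sanity check of those timing directives could look like the sketch below. It is written in Python and is not a Heartbeat tool. The helper names are made up, and the second rule (warntime below deadtime) is an assumption of the sketch rather than something the file states.

```python
import re


def parse_seconds(value):
    """Convert an ha.cf time value such as '2' or '500ms' to seconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(ms)?", value.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    number = float(match.group(1))
    return number / 1000.0 if match.group(2) else number


def read_timings(text):
    """Collect the timing directives from ha.cf text, ignoring comments."""
    wanted = {"keepalive", "warntime", "deadtime", "initdead"}
    timings = {}
    for line in text.splitlines():
        parts = line.split("#", 1)[0].split()
        if len(parts) == 2 and parts[0] in wanted:
            timings[parts[0]] = parse_seconds(parts[1])
    return timings


def check_timings(timings):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    if {"initdead", "deadtime"} <= timings.keys():
        # Rule taken from the ha.cf comment on initdead.
        if timings["initdead"] < 2 * timings["deadtime"]:
            problems.append("initdead is less than twice deadtime")
    if {"warntime", "deadtime"} <= timings.keys():
        # Assumption of this sketch: the warning should fire before deadtime.
        if timings["warntime"] >= timings["deadtime"]:
            problems.append("warntime is not below deadtime")
    return problems


# Values copied from the ha.cf above.
SAMPLE = """\
keepalive 2
deadtime 20   # how long before a host is declared dead
warntime 10
initdead 60
"""

timings = read_timings(SAMPLE)
problems = check_timings(timings)
print(problems or "timings look consistent")
```

With the values from this configuration (deadtime 20, warntime 10, initdead 60) neither rule is violated, so the timing directives are unlikely to be what makes tengine exit.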