[Linux-HA] pingd, quorum, split-brain... should I give up?

Riccardo Perni riccardo.perni at aslromab.it
Tue Oct 23 09:59:10 MDT 2007



Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:

> Hi,
>
> On Mon, Oct 22, 2007 at 11:30:03PM +0200, Riccardo Perni wrote:
>> Hi Dejan,
>> thank you for your reply
>>
>>
>> Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
>>
>> >Hi,
>> >
>> >On Mon, Oct 22, 2007 at 02:45:42PM +0200, Riccardo Perni wrote:
>> >>Hello to all,
>> >>For several days now I have been trying to set up a split-site cluster,
>> >>but with scarce results.
>> >>
>> >>Since the two cluster nodes will be several km apart, I cannot set up a
>> >>reliable communication medium between them, so I have to run heartbeat on
>> >>the main Ethernet; I hoped that using pingd and an external ping site could
>> >>help me resolve the potential conflicts that will show up, but probably
>> >>I'm not smart enough to solve this problem... Can someone help me?
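
For reference, the heartbeat side of this kind of setup usually amounts to a
ping directive and a pingd respawn line in ha.cf. A minimal sketch follows; the
ping addresses match the ping nodes that appear in the logs below, while the
pingd options shown are only the commonly documented ones, not necessarily the
values used in this cluster:

  # /etc/ha.d/ha.cf (fragment) -- options are illustrative
  ping 10.44.4.1 10.44.4.4                         # external ping nodes used as connectivity reference
  respawn hacluster /usr/lib/heartbeat/pingd -m 100 -d 5s
  crm yes                                          # CRM mode, as in the logs below

pingd then publishes a "pingd" node attribute that CRM location constraints
can test.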
>> >>
>> >>Actually I've set up a test using virtual machines.
>> >>Only one resource is running (a virtual IP using ocf:IPaddr) and only one
>> >>constraint, copied from the linux-ha.org pingd FAQ.
>> >>Everything seems to work right: if I break the network connectivity of one
>> >>of the nodes, the resource is run by the node with the working network; but
>> >>both nodes get the "dc" status, and when connectivity is restored I have a
>> >>split-brain condition with both nodes running the resource.
>> >
>> >Right, because you can't prevent split-brain this way. What you
>> >may prevent though is running a resource on the node which lost
>> >connectivity. What exactly happens once the connectivity is
>> >restored? Can you post the logs for that?
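
The pingd FAQ constraint mentioned above is the usual way to do this: a
location rule keyed on the pingd attribute. A sketch of its common form, using
the resource name from the logs below, illustrative rule ids, and the default
attribute name "pingd":

  <rsc_location id="resource_Virtual_IP_connected" rsc="resource_Virtual_IP">
    <rule id="resource_Virtual_IP_connected_rule" score="-INFINITY" boolean_op="or">
      <!-- keep the IP off any node where pingd is unset or has dropped to 0 -->
      <expression id="resource_Virtual_IP_connected_undefined"
                  attribute="pingd" operation="not_defined"/>
      <expression id="resource_Virtual_IP_connected_zero"
                  attribute="pingd" operation="lte" value="0"/>
    </rule>
  </rsc_location>

Note that this only keeps the resource off the node that lost its uplink; it
does nothing about each partition claiming quorum on its own ("Break tie for 2
nodes cluster" in the second log) and starting the resource.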
>>
>> Yes, I understand this; I'd just like the cluster to resync
>> itself upon reconnecting. BTW, I have attached the logs of both nodes.
>
> Unfortunately, they end before the interesting part.

Ouch!! Well, I'm resending just the reconnection part with a few more
lines... no other entries were added to the logs after 10 minutes; I hope
that is enough.

>
> Thanks,

thanks to you!!

>
> Dejan
>
>> >
>> >>Is it possible
>> >>to handle this situation?
>> >
>> >You may try quorumd. See
>> >
>> >http://www.linux-ha.org/QuorumServerGuide
>>
>> I'm going to look at it, but isn't it another SPOF?
>>
>> >
>> >Thanks,
>> >
>> >Dejan
>> >
>> >>Thank you
>> >>Riccardo
>> >>
>> >>
>> >>
>> >>
>>
>> --
>> Riccardo Perni
>> Unità Operativa Informatica Aziendale
>> ASL Roma-B
>>
>>
>>




-------------- next part --------------
************************ network back up - log from clusterpaghe01

Oct 23 18:00:20 clusterpaghe01 heartbeat: [3089]: WARN: Late heartbeat: Node 10.44.4.4: interval 111070 ms
Oct 23 18:00:20 clusterpaghe01 heartbeat: [3089]: info: Status update for node 10.44.4.4: status ping
Oct 23 18:00:20 clusterpaghe01 heartbeat: [3089]: info: Link 10.44.4.4:10.44.4.4 up.
Oct 23 18:00:20 clusterpaghe01 pingd: [3369]: notice: pingd_nstatus_callback: Status update: Ping node 10.44.4.4 now has status [ping]
Oct 23 18:00:20 clusterpaghe01 crmd: [3366]: notice: crmd_ha_status_callback: Status update: Node 10.44.4.4 now has status [ping]
Oct 23 18:00:20 clusterpaghe01 pingd: [3369]: info: send_update: 1 active ping nodes
Oct 23 18:00:20 clusterpaghe01 crmd: [3366]: info: crmd_ha_status_callback: Ping node 10.44.4.4 is ping
Oct 23 18:00:20 clusterpaghe01 pingd: [3369]: notice: pingd_lstatus_callback: Status update: Ping node 10.44.4.4 now has status [up]
Oct 23 18:00:20 clusterpaghe01 pingd: [3369]: notice: pingd_nstatus_callback: Status update: Ping node 10.44.4.4 now has status [up]
Oct 23 18:00:20 clusterpaghe01 pingd: [3369]: info: send_update: 1 active ping nodes
Oct 23 18:00:20 clusterpaghe01 heartbeat: [3089]: CRIT: Cluster node clusterpaghe02 returning after partition.
Oct 23 18:00:20 clusterpaghe01 heartbeat: [3089]: info: For information on cluster partitions, See URL: http://linux-ha.org/SplitBrain
Oct 23 18:00:20 clusterpaghe01 heartbeat: [3089]: WARN: Deadtime value may be too small.
Oct 23 18:00:20 clusterpaghe01 heartbeat: [3089]: info: See FAQ for information on tuning deadtime.
Oct 23 18:00:20 clusterpaghe01 heartbeat: [3089]: info: URL: http://linux-ha.org/FAQ#heavy_load
Oct 23 18:00:20 clusterpaghe01 heartbeat: [3089]: info: Link clusterpaghe02:eth0 up.
Oct 23 18:00:20 clusterpaghe01 heartbeat: [3089]: WARN: Late heartbeat: Node clusterpaghe02: interval 111070 ms
Oct 23 18:00:20 clusterpaghe01 heartbeat: [3089]: info: Status update for node clusterpaghe02: status active
Oct 23 18:00:20 clusterpaghe01 pingd: [3369]: notice: pingd_lstatus_callback: Status update: Ping node clusterpaghe02 now has status [up]
Oct 23 18:00:20 clusterpaghe01 crmd: [3366]: notice: crmd_ha_status_callback: Status update: Node clusterpaghe02 now has status [active]
Oct 23 18:00:20 clusterpaghe01 pingd: [3369]: notice: pingd_nstatus_callback: Status update: Ping node clusterpaghe02 now has status [up]
Oct 23 18:00:20 clusterpaghe01 pingd: [3369]: notice: pingd_nstatus_callback: Status update: Ping node clusterpaghe02 now has status [active]
Oct 23 18:00:20 clusterpaghe01 cib: [3362]: info: cib_diff_notify: Local-only Change (client:3366, call: 45): 0.81.1604 (ok)
Oct 23 18:00:20 clusterpaghe01 haclient: on_event:evt:cib_changed
Oct 23 18:00:20 clusterpaghe01 tengine: [3539]: info: te_update_diff: Processing diff (cib_update): 0.81.1604 -> 0.81.1604
Oct 23 18:00:20 clusterpaghe01 cib: [3799]: info: write_cib_contents: Wrote version 0.81.1604 of the CIB to disk (digest: b2ff6b7bdf55cf3645f721d39bfa1578)
Oct 23 18:00:21 clusterpaghe01 heartbeat: [3089]: info: Link 10.44.4.1:10.44.4.1 up.
Oct 23 18:00:21 clusterpaghe01 heartbeat: [3089]: WARN: Late heartbeat: Node 10.44.4.1: interval 112070 ms
Oct 23 18:00:21 clusterpaghe01 heartbeat: [3089]: info: Status update for node 10.44.4.1: status ping
Oct 23 18:00:21 clusterpaghe01 pingd: [3369]: notice: pingd_lstatus_callback: Status update: Ping node 10.44.4.1 now has status [up]
Oct 23 18:00:21 clusterpaghe01 crmd: [3366]: notice: crmd_ha_status_callback: Status update: Node 10.44.4.1 now has status [ping]
Oct 23 18:00:21 clusterpaghe01 pingd: [3369]: notice: pingd_nstatus_callback: Status update: Ping node 10.44.4.1 now has status [up]
Oct 23 18:00:21 clusterpaghe01 crmd: [3366]: info: crmd_ha_status_callback: Ping node 10.44.4.1 is ping
Oct 23 18:00:21 clusterpaghe01 pingd: [3369]: info: send_update: 2 active ping nodes
Oct 23 18:00:21 clusterpaghe01 pingd: [3369]: notice: pingd_nstatus_callback: Status update: Ping node 10.44.4.1 now has status [ping]
Oct 23 18:00:21 clusterpaghe01 pingd: [3369]: info: send_update: 2 active ping nodes
Oct 23 18:00:22 clusterpaghe01 attrd: [3365]: info: attrd_timer_callback: Sending flush op to all hosts for: pingd
Oct 23 18:00:22 clusterpaghe01 attrd: [3365]: info: attrd_ha_callback: flush message from clusterpaghe01
Oct 23 18:00:22 clusterpaghe01 cib: [3362]: info: cib_diff_notify: Update (client: 3365, call:13): 0.81.1604 -> 0.81.1605 (ok)
Oct 23 18:00:22 clusterpaghe01 tengine: [3539]: info: te_update_diff: Processing diff (cib_modify): 0.81.1604 -> 0.81.1605
Oct 23 18:00:22 clusterpaghe01 tengine: [3539]: info: extract_event: Aborting on transient_attributes changes for 8b658843-7f87-4a86-a398-e996f92fa12b
Oct 23 18:00:22 clusterpaghe01 tengine: [3539]: info: update_abort_priority: Abort priority upgraded to 1000000
Oct 23 18:00:22 clusterpaghe01 tengine: [3539]: info: te_update_diff: Aborting on transient_attributes deletions
Oct 23 18:00:22 clusterpaghe01 crmd: [3366]: info: do_state_transition: clusterpaghe01: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
Oct 23 18:00:22 clusterpaghe01 crmd: [3366]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
Oct 23 18:00:22 clusterpaghe01 haclient: on_event:evt:cib_changed
Oct 23 18:00:22 clusterpaghe01 crmd: [3366]: info: do_state_transition: clusterpaghe01: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
Oct 23 18:00:22 clusterpaghe01 attrd: [3365]: info: attrd_ha_callback: Sent update 13: pingd=200000
Oct 23 18:00:22 clusterpaghe01 pengine: [3540]: info: log_data_element: process_pe_message: [generation] <cib admin_epoch="0" have_quorum="true" ignore_dtd="false" num_peers="2" cib_feature_revision="1.3" generated="true" epoch="81" num_updates="1605" cib-last-written="Mon Oct 22 23:50:24 2007" ccm_transition="3" dc_uuid="8b658843-7f87-4a86-a398-e996f92fa12b"/>
Oct 23 18:00:22 clusterpaghe01 pengine: [3540]: notice: cluster_option: Using default value 'stop' for cluster option 'no-quorum-policy'
Oct 23 18:00:22 clusterpaghe01 pengine: [3540]: notice: cluster_option: Using default value 'true' for cluster option 'symmetric-cluster'
Oct 23 18:00:22 clusterpaghe01 pengine: [3540]: notice: cluster_option: Using default value 'reboot' for cluster option 'stonith-action'
Oct 23 18:00:22 clusterpaghe01 pengine: [3540]: notice: cluster_option: Using default value '0' for cluster option 'default-resource-stickiness'
Oct 23 18:00:22 clusterpaghe01 pengine: [3540]: notice: cluster_option: Using default value '0' for cluster option 'default-resource-failure-stickiness'
Oct 23 18:00:22 clusterpaghe01 pengine: [3540]: notice: cluster_option: Using default value 'true' for cluster option 'is-managed-default'
Oct 23 18:00:22 clusterpaghe01 pengine: [3540]: notice: cluster_option: Using default value '60s' for cluster option 'cluster-delay'
Oct 23 18:00:22 clusterpaghe01 pengine: [3540]: notice: cluster_option: Using default value '20s' for cluster option 'default-action-timeout'
Oct 23 18:00:22 clusterpaghe01 tengine: [3539]: info: unpack_graph: Unpacked transition 8: 1 actions in 1 synapses
Oct 23 18:00:22 clusterpaghe01 pengine: [3540]: notice: cluster_option: Using default value 'true' for cluster option 'stop-orphan-resources'
Oct 23 18:00:22 clusterpaghe01 tengine: [3539]: info: send_rsc_command: Initiating action 3: resource_Virtual_IP_start_0 on clusterpaghe01
Oct 23 18:00:22 clusterpaghe01 pengine: [3540]: notice: cluster_option: Using default value 'true' for cluster option 'stop-orphan-actions'
Oct 23 18:00:22 clusterpaghe01 pengine: [3540]: notice: cluster_option: Using default value 'false' for cluster option 'remove-after-stop'
Oct 23 18:00:22 clusterpaghe01 pengine: [3540]: notice: cluster_option: Using default value '-1' for cluster option 'pe-error-series-max'
Oct 23 18:00:22 clusterpaghe01 pengine: [3540]: notice: cluster_option: Using default value '-1' for cluster option 'pe-warn-series-max'
Oct 23 18:00:22 clusterpaghe01 pengine: [3540]: notice: cluster_option: Using default value '-1' for cluster option 'pe-input-series-max'
Oct 23 18:00:22 clusterpaghe01 pengine: [3540]: notice: cluster_option: Using default value 'true' for cluster option 'startup-fencing'
Oct 23 18:00:22 clusterpaghe01 pengine: [3540]: info: determine_online_status: Node clusterpaghe01 is online
Oct 23 18:00:22 clusterpaghe01 pengine: [3540]: info: native_print: resource_Virtual_IP (heartbeat::ocf:IPaddr):        Stopped
Oct 23 18:00:22 clusterpaghe01 pengine: [3540]: notice: StartRsc:  clusterpaghe01       Start resource_Virtual_IP
Oct 23 18:00:22 clusterpaghe01 pengine: [3540]: info: process_pe_message: Transition 8: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-124.bz2
Oct 23 18:00:22 clusterpaghe01 crmd: [3366]: info: do_lrm_rsc_op: Performing op=resource_Virtual_IP_start_0 key=3:8:ae710204-955b-4d43-8a89-4a56c94db550)
Oct 23 18:00:22 clusterpaghe01 lrmd: [3363]: info: RA output: (resource_Virtual_IP:start:stderr) Rewrote octal netmask as: 24
Oct 23 18:00:22 clusterpaghe01 IPaddr[3801]: [3809]: INFO: Using calculated nic for 10.44.4.28: eth0
Oct 23 18:00:22 clusterpaghe01 cib: [3800]: info: write_cib_contents: Wrote version 0.81.1605 of the CIB to disk (digest: 6a8d7f01a68b9b6eb2731897ccdbd0af)
Oct 23 18:00:22 clusterpaghe01 IPaddr[3801]: [3814]: INFO: Using calculated netmask for 10.44.4.28: 255.255.255.0
Oct 23 18:00:22 clusterpaghe01 IPaddr[3801]: [3835]: INFO: eval /sbin/ifconfig eth0:0 10.44.4.28 netmask 255.255.252.0 broadcast 10.44.7.255
Oct 23 18:00:22 clusterpaghe01 IPaddr[3801]: [3840]: DEBUG: Sending Gratuitous Arp for 10.44.4.28 on eth0:0 [eth0]
Oct 23 18:00:22 clusterpaghe01 crmd: [3366]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
Oct 23 18:00:22 clusterpaghe01 cib: [3362]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
Oct 23 18:00:22 clusterpaghe01 crmd: [3366]: info: mem_handle_event: no mbr_track info
Oct 23 18:00:22 clusterpaghe01 crmd: [3366]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
Oct 23 18:00:22 clusterpaghe01 crmd: [3366]: info: mem_handle_event: instance=2, nodes=2, new=1, lost=0, n_idx=0, new_idx=2, old_idx=4
Oct 23 18:00:22 clusterpaghe01 crmd: [3366]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=2)
Oct 23 18:00:22 clusterpaghe01 crmd: [3366]: info: ccm_event_detail: NEW MEMBERSHIP: trans=2, nodes=2, new=1, lost=0 n_idx=0, new_idx=2, old_idx=4
Oct 23 18:00:22 clusterpaghe01 crmd: [3366]: info: ccm_event_detail:    CURRENT: clusterpaghe02 [nodeid=1, born=1]
Oct 23 18:00:22 clusterpaghe01 crmd: [3366]: info: ccm_event_detail:    CURRENT: clusterpaghe01 [nodeid=0, born=2]
Oct 23 18:00:22 clusterpaghe01 crmd: [3366]: info: ccm_event_detail:    NEW:     clusterpaghe02 [nodeid=1, born=1]
Oct 23 18:00:22 clusterpaghe01 cib: [3362]: info: mem_handle_event: no mbr_track info
Oct 23 18:00:22 clusterpaghe01 cib: [3362]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
Oct 23 18:00:22 clusterpaghe01 cib: [3362]: info: mem_handle_event: instance=2, nodes=2, new=1, lost=0, n_idx=0, new_idx=2, old_idx=4
Oct 23 18:00:22 clusterpaghe01 cib: [3362]: info: cib_ccm_msg_callback: PEER: clusterpaghe02
Oct 23 18:00:22 clusterpaghe01 cib: [3362]: info: cib_ccm_msg_callback: PEER: clusterpaghe01
Oct 23 18:00:22 clusterpaghe01 crmd: [3366]: info: process_lrm_event: LRM operation resource_Virtual_IP_start_0 (call=5, rc=0) complete
Oct 23 18:00:22 clusterpaghe01 cib: [3362]: info: cib_diff_notify: Update (client: 3366, call:48): 0.81.1605 -> 0.81.1606 (ok)
Oct 23 18:00:22 clusterpaghe01 haclient: on_event:evt:cib_changed
Oct 23 18:00:22 clusterpaghe01 crmd: [3366]: info: do_state_transition: clusterpaghe01: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
Oct 23 18:00:22 clusterpaghe01 tengine: [3539]: info: te_update_diff: Processing diff (cib_update): 0.81.1605 -> 0.81.1606
Oct 23 18:00:22 clusterpaghe01 tengine: [3539]: info: match_graph_event: Action resource_Virtual_IP_start_0 (3) confirmed on 8b658843-7f87-4a86-a398-e996f92fa12b
Oct 23 18:00:22 clusterpaghe01 tengine: [3539]: info: run_graph: Transition 8: (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0)
Oct 23 18:00:22 clusterpaghe01 tengine: [3539]: info: notify_crmd: Transition 8 status: te_complete - <null>
Oct 23 18:00:22 clusterpaghe01 cib: [3854]: info: write_cib_contents: Wrote version 0.81.1606 of the CIB to disk (digest: fbba7e0699d923c77ba8fae65cdcae0d)
Oct 23 18:01:01 clusterpaghe01 cib: [3362]: info: cib_stats: Processed 68 operations (1911.00us average, 0% utilization) in the last 10min


-------------- next part --------------
************************** network back up - log from clusterpaghe02

Oct 23 17:23:56 clusterpaghe02 heartbeat: [3188]: CRIT: Cluster node clusterpaghe01 returning after partition.
Oct 23 17:23:56 clusterpaghe02 heartbeat: [3188]: info: For information on cluster partitions, See URL: http://linux-ha.org/SplitBrain
Oct 23 17:23:56 clusterpaghe02 pingd: [3405]: notice: pingd_lstatus_callback: Status update: Ping node clusterpaghe01 now has status [up]
Oct 23 17:23:56 clusterpaghe02 crmd: [3402]: notice: crmd_ha_status_callback: Status update: Node clusterpaghe01 now has status [active]
Oct 23 17:23:56 clusterpaghe02 heartbeat: [3188]: WARN: Deadtime value may be too small.
Oct 23 17:23:56 clusterpaghe02 pingd: [3405]: notice: pingd_nstatus_callback: Status update: Ping node clusterpaghe01 now has status [up]
Oct 23 17:23:56 clusterpaghe02 cib: [3398]: info: cib_diff_notify: Local-only Change (client:3402, call: 30): 0.82.1606 (ok)
Oct 23 17:23:56 clusterpaghe02 heartbeat: [3188]: info: See FAQ for information on tuning deadtime.
Oct 23 17:23:56 clusterpaghe02 pingd: [3405]: notice: pingd_nstatus_callback: Status update: Ping node clusterpaghe01 now has status [active]
Oct 23 17:23:56 clusterpaghe02 heartbeat: [3188]: info: URL: http://linux-ha.org/FAQ#heavy_load
Oct 23 17:23:56 clusterpaghe02 heartbeat: [3188]: info: Link clusterpaghe01:eth0 up.
Oct 23 17:23:56 clusterpaghe02 heartbeat: [3188]: WARN: Late heartbeat: Node clusterpaghe01: interval 111040 ms
Oct 23 17:23:56 clusterpaghe02 heartbeat: [3188]: info: Status update for node clusterpaghe01: status active
Oct 23 17:23:56 clusterpaghe02 tengine: [3840]: info: te_update_diff: Processing diff (cib_update): 0.82.1606 -> 0.82.1606
Oct 23 17:23:56 clusterpaghe02 haclient: on_event:evt:cib_changed
Oct 23 17:23:56 clusterpaghe02 cib: [3912]: info: write_cib_contents: Wrote version 0.82.1606 of the CIB to disk (digest: 4bd49f2e30ab61a2cbc494b1403513a5)
Oct 23 17:23:56 clusterpaghe02 heartbeat: [3188]: info: all clients are now paused
Oct 23 17:23:56 clusterpaghe02 heartbeat: [3188]: debug: hist->ackseq =252
Oct 23 17:23:56 clusterpaghe02 heartbeat: [3188]: debug: hist->lowseq =251, hist->hiseq=353
Oct 23 17:23:56 clusterpaghe02 heartbeat: [3188]: debug:
Oct 23 17:23:57 clusterpaghe02 heartbeat: [3188]: info: all clients are now resumed
Oct 23 17:23:57 clusterpaghe02 ccm: [3397]: debug: quorum plugin: majority
Oct 23 17:23:57 clusterpaghe02 ccm: [3397]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100
Oct 23 17:23:57 clusterpaghe02 cib: [3398]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
Oct 23 17:23:57 clusterpaghe02 ccm: [3397]: debug: total_node_count=2, total_quorum_votes=200
Oct 23 17:23:57 clusterpaghe02 cib: [3398]: info: mem_handle_event: no mbr_track info
Oct 23 17:23:57 clusterpaghe02 ccm: [3397]: debug: quorum plugin: twonodes
Oct 23 17:23:57 clusterpaghe02 ccm: [3397]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100
Oct 23 17:23:57 clusterpaghe02 ccm: [3397]: debug: total_node_count=2, total_quorum_votes=200
Oct 23 17:23:57 clusterpaghe02 ccm: [3397]: info: Break tie for 2 nodes cluster
Oct 23 17:23:57 clusterpaghe02 crmd: [3402]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
Oct 23 17:23:57 clusterpaghe02 cib: [3398]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
Oct 23 17:23:57 clusterpaghe02 crmd: [3402]: info: mem_handle_event: no mbr_track info
Oct 23 17:23:57 clusterpaghe02 cib: [3398]: info: mem_handle_event: instance=1, nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
Oct 23 17:23:57 clusterpaghe02 crmd: [3402]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
Oct 23 17:23:57 clusterpaghe02 cib: [3398]: info: cib_ccm_msg_callback: PEER: clusterpaghe02
Oct 23 17:23:57 clusterpaghe02 crmd: [3402]: info: mem_handle_event: instance=1, nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
Oct 23 17:23:57 clusterpaghe02 crmd: [3402]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=1)
Oct 23 17:23:57 clusterpaghe02 crmd: [3402]: info: ccm_event_detail: NEW MEMBERSHIP: trans=1, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3
Oct 23 17:23:57 clusterpaghe02 crmd: [3402]: info: ccm_event_detail:    CURRENT: clusterpaghe02 [nodeid=1, born=1]
Oct 23 17:23:58 clusterpaghe02 ccm: [3397]: debug: quorum plugin: majority
Oct 23 17:23:58 clusterpaghe02 ccm: [3397]: debug: cluster:linux-ha, member_count=2, member_quorum_votes=200
Oct 23 17:23:58 clusterpaghe02 ccm: [3397]: debug: total_node_count=2, total_quorum_votes=200
Oct 23 17:23:58 clusterpaghe02 attrd: [3401]: info: attrd_ha_callback: flush message from clusterpaghe01
Oct 23 17:23:58 clusterpaghe02 crmd: [3402]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
Oct 23 17:23:58 clusterpaghe02 cib: [3398]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
Oct 23 17:23:58 clusterpaghe02 crmd: [3402]: info: mem_handle_event: no mbr_track info
Oct 23 17:23:58 clusterpaghe02 cib: [3398]: info: mem_handle_event: no mbr_track info
Oct 23 17:23:58 clusterpaghe02 crmd: [3402]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
Oct 23 17:23:58 clusterpaghe02 cib: [3398]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
Oct 23 17:23:58 clusterpaghe02 crmd: [3402]: info: mem_handle_event: instance=2, nodes=2, new=1, lost=0, n_idx=0, new_idx=2, old_idx=4
Oct 23 17:23:58 clusterpaghe02 cib: [3398]: info: mem_handle_event: instance=2, nodes=2, new=1, lost=0, n_idx=0, new_idx=2, old_idx=4
Oct 23 17:23:58 clusterpaghe02 crmd: [3402]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=2)
Oct 23 17:23:58 clusterpaghe02 cib: [3398]: info: cib_ccm_msg_callback: PEER: clusterpaghe02
Oct 23 17:23:58 clusterpaghe02 crmd: [3402]: info: ccm_event_detail: NEW MEMBERSHIP: trans=2, nodes=2, new=1, lost=0 n_idx=0, new_idx=2, old_idx=4
Oct 23 17:23:58 clusterpaghe02 cib: [3398]: info: cib_ccm_msg_callback: PEER: clusterpaghe01
Oct 23 17:23:58 clusterpaghe02 crmd: [3402]: info: ccm_event_detail:    CURRENT: clusterpaghe02 [nodeid=1, born=1]
Oct 23 17:23:58 clusterpaghe02 crmd: [3402]: info: ccm_event_detail:    CURRENT: clusterpaghe01 [nodeid=0, born=2]
Oct 23 17:23:58 clusterpaghe02 crmd: [3402]: info: ccm_event_detail:    NEW:     clusterpaghe01 [nodeid=0, born=2]
Oct 23 17:23:58 clusterpaghe02 attrd: [3401]: info: attrd_ha_callback: Sent update 8: pingd=200000
Oct 23 17:23:58 clusterpaghe02 cib: [3913]: info: write_cib_contents: Wrote version 0.82.1606 of the CIB to disk (digest: aea97883950214d08caaf981c55c1392)
Oct 23 17:23:59 clusterpaghe02 cib: [3398]: WARN: cib_process_diff: Diff 0.81.1604 -> 0.81.1605 not applied to 0.82.1606: current "epoch" is greater than required
Oct 23 17:23:59 clusterpaghe02 cib: [3398]: WARN: do_cib_notify: cib_apply_diff of <diff > FAILED: Application of an update diff failed
Oct 23 17:23:59 clusterpaghe02 cib: [3398]: WARN: cib_process_request: cib_apply_diff operation failed: Application of an update diff failed
Oct 23 17:23:59 clusterpaghe02 cib: [3398]: WARN: cib_process_diff: Diff 0.81.1605 -> 0.81.1606 not applied to 0.82.1606: current "epoch" is greater than required
Oct 23 17:23:59 clusterpaghe02 cib: [3398]: WARN: do_cib_notify: cib_apply_diff of <diff > FAILED: Application of an update diff failed
Oct 23 17:23:59 clusterpaghe02 cib: [3398]: WARN: cib_process_request: cib_apply_diff operation failed: Application of an update diff failed

Oct 23 17:30:01 clusterpaghe02 cib: [3398]: info: cib_stats: Processed 51 operations (5882.00us average, 0% utilization) in the last 10min


