Fwd: [Linux-HA] Migrate to cib.xml - can't get resources to start

Andrew Beekhof beekhof at gmail.com
Wed Feb 14 02:11:56 MST 2007


forgot to CC the list for posterity

---------- Forwarded message ----------
From: Andrew Beekhof <beekhof at gmail.com>
Date: Feb 14, 2007 10:10 AM
Subject: Re: [Linux-HA] Migrate to cib.xml - can't get resources to start
To: Darren Hoch <darren.hoch at litemail.org>


On 2/13/07, Darren Hoch <darren.hoch at litemail.org> wrote:
> Andrew Beekhof wrote:
>
> > On 2/9/07, Darren Hoch <darren.hoch at litemail.org> wrote:
> >
> >> Hello All,
> >>
> >> Heartbeat Version: 2.0.6 (aware of security vulnerability) :)
> >> RHEL 4 U4 SMP
> >>
> >> I am attempting to move from haresources to cib.xml. My configuration is
> >> moving from haresources-based Active/Passive to Active/Active/Passive. I
> >> have ha1, ha2, and ha3 nodes. Both ha1 and ha3 run a proprietary app
> >> (started by LSB script). The proprietary app runs on DRBD filesystem on
> >> both ha1 and ha3 in primary (ha2 secondary DRBD device). The ha2 node is
> >> Passive and acquires resources (IPAddr, DRBD, App) from ha1 or ha3 upon
> >> failure.
> >>
> debugfile /var/log/ha-debug
> logfile /var/log/ha-log
> logfacility     local0
> keepalive 2
> deadtime 30
> warntime 10
> initdead 120
> udpport 694
> bcast   eth0 eth1       # Linux
> auto_failback off
> crm yes
> node    ha1 ha2 ha3
> ping 192.168.1.1
> respawn hacluster /usr/lib/heartbeat/ipfail
>
> >> I created the cib.xml using haresources2cib.py. I want resources to start
> >> without (ignore).
> >
> >
> > for that you'll need quorum - are at least 2 of the 3 nodes active?
> >
> No

ok, that pretty much explains why no resources are started
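To make the 2-of-3 question above concrete, here is a minimal sketch of a majority check. It assumes a plain "strictly more than half of the configured nodes" rule, which matches the 2-of-3 figure but is not taken from the Heartbeat source; the function name has_quorum is made up for illustration.

```python
def has_quorum(active_nodes: int, total_nodes: int) -> bool:
    """Return True when strictly more than half of the configured nodes are active.

    Assumption: simple-majority quorum. This is an illustration only,
    not Heartbeat's actual implementation.
    """
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    if not 0 <= active_nodes <= total_nodes:
        raise ValueError("active_nodes must be between 0 and total_nodes")
    # Integer comparison avoids any floating-point rounding.
    return 2 * active_nodes > total_nodes


if __name__ == "__main__":
    # Three configured nodes, as in this thread (ha1, ha2, ha3).
    for active in range(4):
        print(active, "active of 3 ->", has_quorum(active, 3))
```

With three configured nodes, a single active node does not satisfy this rule, which is consistent with no resources being started.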

> and there is no DC election

that would be a problem :(

>
> ha2 dies with:
>
> crmd[1560]: 2007/02/13_13:33:50 ERROR: do_ccm_control:ccm.c CCM Activation failed 30 (max) times
> crmd[1560]: 2007/02/13_13:33:50 WARN: do_log:misc.c [[FSA]] Input I_FAIL from do_ccm_control() received in state (S_STARTING)
> crmd[1560]: 2007/02/13_13:33:50 info: do_state_transition:fsa.c ha2: State transition S_STARTING -> S_STOPPING [ input=I_FAIL cause=C_FSA_INTERNAL origin=do_ccm_control ]
> cib[1542]: 2007/02/13_13:35:46 WARN: cib_peer_callback:callbacks.c Discarding cib_slave_all message (b1) from ha3: not in our membership
>
> ha3 never comes online and I am assuming from my limited knowledge, the
> last line of the above output has something to do with it...:)

actually the first line is more telling - we're not getting any membership
information :-/

looking in the logs i see:
ccm[1541]: 2007/02/13_13:33:34 ERROR: Node count from node ha3 does not agree: local count=3, count in message=4
ccm[1541]: 2007/02/13_13:33:34 ERROR: Please make sure ha.cf files on all nodes have same nodes list or add "autojoin any" to ha.cf

you might want to remove the /var/lib/heartbeat/hostcache file on all
machines (stop heartbeat on all the nodes first), re-check that ha.cf
is the same on all nodes, and then re-test.
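
For the "re-check that ha.cf is the same on all nodes" step, a rough sketch of comparing the node lists is below. It assumes you have already copied each machine's ha.cf contents somewhere you can read them as text; the helper names node_names and compare_node_lists are made up, and the extra node "ha4" in the demo is purely hypothetical (the logs only say one side counts 4 nodes, not which name is extra).

```python
from typing import Dict, Set


def node_names(ha_cf_text: str) -> Set[str]:
    """Collect every name listed on 'node' directives in ha.cf text."""
    names: Set[str] = set()
    for raw_line in ha_cf_text.splitlines():
        # Drop trailing comments such as "# Linux", then split on whitespace.
        fields = raw_line.split("#", 1)[0].split()
        if fields and fields[0] == "node":
            names.update(fields[1:])
    return names


def compare_node_lists(configs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each host to the node names its ha.cf lacks relative to the union.

    Hosts whose node list is already complete are left out of the result.
    """
    per_host = {host: node_names(text) for host, text in configs.items()}
    all_names: Set[str] = set().union(*per_host.values())
    return {
        host: all_names - names
        for host, names in per_host.items()
        if all_names - names
    }


if __name__ == "__main__":
    # Hypothetical contents; "ha4" is invented to mirror the 3-vs-4 mismatch.
    configs = {
        "ha2": "keepalive 2\nnode    ha1 ha2 ha3\nping 192.168.1.1\n",
        "ha3": "keepalive 2\nnode    ha1 ha2 ha3 ha4\nping 192.168.1.1\n",
    }
    for host, missing in compare_node_lists(configs).items():
        print(host, "is missing:", sorted(missing))
```

This only checks the node directives; a full diff of the files is still the safer test that ha.cf really is identical everywhere.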

