[Linux-HA] Heartbeat doesn't start resources on proper node after DRBD syncing.

Michael misha at onet.ru
Sun Oct 7 01:30:48 MDT 2007


On 10/5/07, Andrew Beekhof <beekhof at gmail.com> wrote:
> either this is becoming a common theme or i was speaking to you
> yesterday on irc...
Yeah, that was me you were talking to, thanks for your kindness ;)

> the OCF agent that comes with heartbeat does not understand how to use
> drbd8 (apparently the strings it looks for have changed)
>
> so you'd either need to fix the RA (and hopefully send us a patch) or
> use the LSB script that comes drbd.  not being a drbd user myself
> thats about as much info as I can offer.

Well, I did some investigation on this topic, since drbd is a vital
part of my system. The existing RA works pretty well with drbd8 (the only
real issue I found is that it is not aware of the Unknown/TOO_LARGE
and SyncTarget states).
The failure to promote a master only happens while drbd is syncing a large
amount of data (for example, about 1Gb).
I tried to reproduce this without heartbeat at all, and what I
found is that sometimes drbd refuses to become primary and
answers something like this:

tolstoy(7):/home/misha# drbdsetup /dev/drbd0 primary
No response from the DRBD driver! Is the module loaded?
State change failed: (0)unknown error.

This happens even though syncing is in progress, and the command succeeds
only after a few tries. Sometimes the node gets promoted, but the other
side still shows it in the old state.
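
For what it's worth, the "succeeds only after a few tries" workaround could
be sketched as a small retry wrapper. This is only a sketch: retry is a
hypothetical helper, not part of drbd or heartbeat, and the attempt count
and delay are arbitrary assumptions.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between failed attempts. Returns 0 on the first success, 1 if all fail.
retry() {
    max_tries=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_tries" ]; do
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
        # Only wait if another attempt is still going to happen.
        if [ "$attempt" -le "$max_tries" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example with the same command as above (5 tries, 2 seconds: arbitrary):
# retry 5 2 drbdsetup /dev/drbd0 primary
```

This only hides the symptom; it does not explain why the first attempts
fail.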

So it's strange to expect correct behaviour from the RA if drbd itself
doesn't work correctly.

Anyway, I have a question about how heartbeat behaves: imagine a
situation where Node1 is coming up from shutdown, and while it was off
Node2 has changed data on its drbd resource, so they need to sync.
Moreover, in cib.xml Node1 has +INFINITY for being master.
While the nodes are in sync mode, the RA on Node1 will do crm_master -v 5 -l
reboot, and on Node2 it will do crm_master -v 100 -l reboot.
As I understand it, Node2 gets promoted, since 100>5. But the sync
will finish at some point, so the question is: will heartbeat then redo
the promotion on Node1?
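
To make the scenario concrete, here is a rough sketch of the scoring
described above. master_score_for_state is a hypothetical function, not
code from the actual RA; only the values 5 and 100 come from this message,
and mapping them to the SyncTarget and SyncSource states is my assumption
about which side is which.

```shell
# Hypothetical mapping from drbd connection state to a master preference.
# Only the numbers 5 and 100 are taken from the message; the rest is assumed.
master_score_for_state() {
    case "$1" in
        SyncSource) echo 100 ;;  # side with the up-to-date data (Node2 here)
        SyncTarget) echo 5 ;;    # side receiving the data (Node1 here)
        *) return 1 ;;           # other states are left undefined in this sketch
    esac
}

# Usage sketch (not run here): pass the score to crm_master as described.
# score=$(master_score_for_state SyncTarget) && crm_master -v "$score" -l reboot
```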

Btw, yes, it's possible to set Primary on Node1 while it's still syncing
with Node2 (as I understand it, blocks which are marked dirty and requested
during the sync will be retrieved over the net), but it's hard because of the
problem described above: drbd doesn't always want to become Primary, and
things start to break :(
