[Linux-HA] Heartbeat doesn't start resources on proper node after DRBD syncing.

Andrew Beekhof beekhof at gmail.com
Fri Oct 5 04:32:16 MDT 2007


On 10/3/07, Michael <misha at onet.ru> wrote:
> Hello, colleagues!
>
> My system is an up-to-date Gentoo world, with a manually installed
> heartbeat 2.1.2 and unmasked drbd 8.0.6, apache 2.2.6 and mysql 5.0.44
> from portage.
>
> I can't use the 0.7.x version of DRBD because it contains a bug that I
> hit: the nodes enter the WFBitMapS state and freeze. Searching the drbd
> mailing list turned up a developer post saying that this problem is
> fixed in the 8 series.

either this is becoming a common theme or i was speaking to you
yesterday on irc...

the OCF agent that comes with heartbeat does not understand how to use
drbd8 (apparently the strings it looks for have changed)

so you'd either need to fix the RA (and hopefully send us a patch) or
use the LSB script that comes with drbd.  not being a drbd user myself,
that's about as much info as I can offer.
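
for anyone picking up the RA fix, here is a rough sketch of what
"understanding both sets of strings" might look like.  it is Python
rather than the shell the real agent is written in, and the sample
status lines and field names (cs, st, ld, ds) are assumptions written
from memory of /proc/drbd output, not taken from this thread, so check
them against your installed drbd before relying on any of it.

```python
import re

# Illustrative /proc/drbd-style status lines. These formats are ASSUMED,
# not copied from the thread: 0.7 is taken to report local data state as
# "ld:", 8.0 as "ds:<local>/<peer>".
DRBD07_LINE = " 0: cs:Connected st:Primary/Secondary ld:Consistent"
DRBD8_LINE = " 0: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---"

# Matches "key:value" tokens such as "cs:Connected" or "st:Primary/Secondary".
FIELD_RE = re.compile(r"(\w+):(\S+)")


def parse_status(line):
    """Parse one status line into connection state, local role and data health.

    parse_status is a hypothetical helper, not part of heartbeat or drbd.
    """
    fields = dict(FIELD_RE.findall(line))
    # The role field is "<local>/<peer>"; only the local half is kept.
    role = fields.get("st", "Unknown/Unknown").split("/")[0]
    if "ds" in fields:
        # 8.x-style line: local disk state must be UpToDate.
        data_ok = fields["ds"].split("/")[0] == "UpToDate"
    else:
        # 0.7-style line: local data must be Consistent.
        data_ok = fields.get("ld") == "Consistent"
    return {
        "connection": fields.get("cs", "Unknown"),
        "role": role,
        "data_ok": data_ok,
    }


if __name__ == "__main__":
    for sample in (DRBD07_LINE, DRBD8_LINE):
        print(parse_status(sample))
```

the point is only that the agent has to key off whichever field the
running version emits instead of assuming one fixed string.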

> I'm trying to set up an HA cluster for a web server with two nodes. To
> accomplish this I'm using drbd for real-time data replication,
> together with mysql, apache and a shared IP address.
>
> Following the documentation, I created a cluster where I have:
> - a Master\Slave DRBD resource
> - a Filesystem which is started on the node where DRBD is promoted to
> Master
> - a Web group with IP, Apache and MySQL, which is started after the
> Filesystem and on the same node.
>
> When the primary node fails, the second one takes over its IP and
> starts all resources. When the primary system comes back, I want the
> secondary node to give all resources back and sync all filesystem
> changes through drbd (this is because the primary node is twice as
> powerful).
>
> When I start heartbeat on both nodes everything is fine: the primary
> one gets Master status in the Master\Slave drbd resource. Then I shut
> down the primary node and all resources are started on the Secondary
> node (all is fine).
>
> Then I copy 300 MB of files to the drbd filesystem (to simulate
> changes in the web environment while the primary node is down) and
> start the primary node again. All resources on the Secondary node get
> shut down (which I expect).
>
> This is where the problem happens. drbd needs some time to
> synchronize the filesystem (the state becomes SyncSource). After the
> sync is done (we have a Gb link between the servers, so it's rather
> fast) I expect my Primary node to take the Master status of the drbd
> resource, but that doesn't happen. What I see is that the Secondary
> node is still Master, and after timeouts all resources get started on
> the Secondary again.
>
> By the way, if I don't copy large files into the drbd FS, all
> resources successfully migrate to the Primary.
>
> The uname of my primary node is "tolstoy", and the secondary is
> "lermontov".
>
> Below are my cib.xml and the logs from the secondary (lermontov) node
> after the Primary (tolstoy) comes back online following the shutdown.
>
> Thanks a lot for any kind of help, and sorry for my language; I'll
> gladly explain anything unclear in my mail.
>
> This is my cib.xml:
>

