[Linux-HA] DRBD - split brain - and HA is happily migrating

Dominik Klein dk at in-telegence.net
Wed Jan 2 01:58:28 MST 2008


> Thanks for your help. It looks like everything works as desired:
> 
> (postgres-02) [~] ifconfig eth1 down
> (postgres-02) [~] cat /proc/drbd
> version: 8.2.1 (api:86/proto:86-87)
> GIT-hash: 318925802fc2638479ad090b73d7af45503dd184 build by tg at apache-01, 2007-12-29 17:37:25
>  0: cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown C r---
>     ns:60499852 nr:713732 dw:713732 dr:60499852 al:0 bm:3693 lo:0 pe:0 ua:0 ap:0
>         resync: used:0/31 hits:3777806 misses:3724 starving:0 dirty:0 changed:3724
>         act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0
> (postgres-02) [~] ifconfig eth1 up
> (postgres-02) [~] cat /proc/drbd
> version: 8.2.1 (api:86/proto:86-87)
> GIT-hash: 318925802fc2638479ad090b73d7af45503dd184 build by tg at apache-01, 2007-12-29 17:37:25
>  0: cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown C r---
>     ns:60499852 nr:713732 dw:713732 dr:60499852 al:0 bm:3693 lo:0 pe:0 ua:0 ap:0
>         resync: used:0/31 hits:3777806 misses:3724 starving:0 dirty:0 changed:3724
>         act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0
> (postgres-02) [~] cat /proc/drbd
> version: 8.2.1 (api:86/proto:86-87)
> GIT-hash: 318925802fc2638479ad090b73d7af45503dd184 build by tg at apache-01, 2007-12-29 17:37:25
>  0: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
>     ns:60499852 nr:715292 dw:715292 dr:60499852 al:0 bm:3705 lo:0 pe:0 ua:0 ap:0
>         resync: used:0/31 hits:3777942 misses:3736 starving:0 dirty:0 changed:3736
>         act_log: used:0/127 hits:0 misses:0 starving:0 dirty:0 changed:0
> 
> When I put the crosslink link down, the disk on postgres-02 gets outdated, if I
> put it back on, it syncs in no-time.

You should be aware of one thing though:

If you have a DRBD split brain now and your primary crashes while the 
nodes are still split, heartbeat will never be able to start your 
resource on the secondary node, because the DRBD resource there is 
outdated. Read: your resource will not run at all, not even with old data.

You will have to manually run something like
"drbdadm -- --overwrite-data-of-peer primary $resource" on the outdated, 
disconnected secondary to force the device into primary state.

When the crashed primary comes back, you need to run
"drbdadm -- --discard-my-data connect $resource" on the crashed primary 
to get it in sync again. Heartbeat is not able to do that on its own, 
which is good: it shouldn't know a way to force a device into primary 
state.
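
To spot the situation described above before it bites, something like the 
following could be run on the secondary. It is only a rough sketch: the 
helper names drbd_field and is_outdated_disconnected are made up for 
illustration, and it assumes the DRBD 8.x /proc/drbd status line format 
shown in the quoted output (cs:, st: and ds: fields on one line). It only 
reads and classifies a status line; it does not run drbdadm.

```shell
#!/bin/sh
# Sketch only: classify a DRBD 8.x /proc/drbd status line.
# drbd_field and is_outdated_disconnected are hypothetical helpers,
# not part of DRBD or heartbeat.

# Print the value of one "key:value" field (cs, st or ds) of a status line.
# $1 = field name, $2 = status line
drbd_field() {
    printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1://p" | head -n 1
}

# Succeed (exit status 0) if the line shows a node that is not connected
# to its peer and whose local disk state is Outdated.
# $1 = status line
is_outdated_disconnected() {
    cs=$(drbd_field cs "$1")
    ds=$(drbd_field ds "$1")
    # ds is "local/peer"; strip everything from the first slash on.
    [ "$cs" != "Connected" ] && [ "${ds%%/*}" = "Outdated" ]
}
```

A possible use, assuming a single resource with minor number 0:

    line=$(grep ' cs:' /proc/drbd | head -n 1)
    is_outdated_disconnected "$line" && echo "heartbeat cannot promote this node"
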

Regards
Dominik

