[Linux-HA] drbddisk error

Dejan Muhamedagic dejanmm at fastmail.fm
Mon Jun 9 06:06:18 MDT 2008


Hi,

On Mon, Jun 09, 2008 at 09:56:33AM +0200, Mario Vittorio Guenzi wrote:
> Hi All,
> I prepared a small cluster with drbd and heartbeat, with the following
> characteristics and configuration:
> Distribution: Debian Etch with as few backports packages as possible
> kernel 2.6.25
> drbd 8.0.11 built into the kernel (no module)
> heartbeat 2.1.3
> the two machines are alessandra (master) and lalla (slave)
> 
[snip]
> But if I unplug the network cable to simulate a fault, I find this in
> the slave's syslog:
> 
> May 30 09:32:16 lalla heartbeat: [3018]: WARN: node
> alessandra: is dead
> May 30 09:32:16 lalla heartbeat: [3018]: WARN: No STONITH device
> configured.
> May 30 09:32:16 lalla heartbeat: [3018]: WARN: Shared disks are not
> protected.
> May 30 09:32:16 lalla heartbeat: [3018]: info: Resources being
> acquired from alessandra.
> May 30 09:32:16 lalla heartbeat: [3018]: info: Link alessandra:eth1 dead.
> May 30 09:32:16 lalla heartbeat: [8063]: debug: notify_world: setting
> SIGCHLD Handler to SIG_DFL
> May 30 09:32:16 lalla harc[8063]: info: Running /etc/ha.d/rc.d/status
> status
> May 30 09:32:16 lalla heartbeat: [8064]: info: No local resources
> [/usr/share/heartbeat/ResourceManager listkeys lalla] to acquire.
> May 30 09:32:16 lalla heartbeat: [3018]: debug:
> StartNextRemoteRscReq(): child count 1
> May 30 09:32:16 lalla mach_down[8092]: info: Taking over resource
> group IPaddr::192.168.2.247/24/eth1
> May 30 09:32:16 lalla ResourceManager[8118]: info: Acquiring resource
> group: alessandra IPaddr::192.168.2.247/24/eth1 drbddisk::r0
> Filesystem::/dev/drbd0::/cache::ext3::defaults drbddisk::r1
> Filesystem::/dev/drbd1::/jumper::ext3::defaults bind9
> MailTo::mario.guenzi at oceano.lan::Transizione_lalla_alessandra
> May 30 09:32:16 lalla IPaddr[8145]: INFO: Resource is stopped
> May 30 09:32:16 lalla ResourceManager[8118]: info: Running
> /etc/ha.d/resource.d/IPaddr 192.168.2.247/24/eth1 start
> May 30 09:32:16 lalla ResourceManager[8118]: debug: Starting
> /etc/ha.d/resource.d/IPaddr 192.168.2.247/24/eth1 start
> May 30 09:32:17 lalla IPaddr[8243]: INFO: Using calculated netmask for
> 192.168.2.247: 255.255.255.0
> May 30 09:32:17 lalla IPaddr[8243]: DEBUG: Using calculated broadcast
> for 192.168.2.247: 192.168.2.255
> May 30 09:32:17 lalla IPaddr[8243]: INFO: eval ifconfig eth1:0
> 192.168.2.247 netmask 255.255.255.0 broadcast 192.168.2.255
> May 30 09:32:17 lalla IPaddr[8243]: DEBUG: Sending Gratuitous Arp for
> 192.168.2.247 on eth1:0 [eth1]
> May 30 09:32:17 lalla IPaddr[8214]: INFO: Success
> May 30 09:32:17 lalla ResourceManager[8118]: debug:
> /etc/ha.d/resource.d/IPaddr 192.168.2.247/24/eth1 start done. RC=0
> May 30 09:32:17 lalla ResourceManager[8118]: info: Running
> /etc/ha.d/resource.d/drbddisk r0 start
> May 30 09:32:17 lalla ResourceManager[8118]: debug: Starting
> /etc/ha.d/resource.d/drbddisk r0 start
> May 30 09:32:29 lalla ResourceManager[8118]: debug:
> /etc/ha.d/resource.d/drbddisk r0 start done. RC=1
> May 30 09:32:29 lalla ResourceManager[8118]: ERROR: Return code 1 from
> /etc/ha.d/resource.d/drbddisk
> May 30 09:32:29 lalla ResourceManager[8118]: CRIT: Giving up resources
> due to failure of drbddisk::r0
> May 30 09:32:29 lalla ResourceManager[8118]: info: Releasing resource
> group: alessandra IPaddr::192.168.2.247/24/eth1 drbddisk::r0
> Filesystem::/dev/drbd0::/cache::ext3::defaults drbddisk::r1
> Filesystem::/dev/drbd1::/jumper::ext3::defaults bind9
> MailTo::mario.guenzi at oceano.lan::Transizione_lalla_alessandra
> May 30 09:32:30 lalla ResourceManager[8118]: info: Running
> /etc/ha.d/resource.d/MailTo mario.guenzi at oceano.lan
> Transizione_lalla_alessandra stop
> May 30 09:32:30 lalla ResourceManager[8118]: debug: Starting
> /etc/ha.d/resource.d/MailTo mario.guenzi at oceano.lan
> Transizione_lalla_alessandra stop
> May 30 09:32:30 lalla MailTo[8431]: INFO: Success
> May 30 09:32:30 lalla ResourceManager[8118]: debug:
> /etc/ha.d/resource.d/MailTo mario.guenzi at oceano.lan
> Transizione_lalla_alessandra stop
> done. RC=0
> May 30 09:32:30 lalla ResourceManager[8118]: info: Running
> /etc/init.d/bind9 stop
> May 30 09:32:30 lalla ResourceManager[8118]: debug: Starting
> /etc/init.d/bind9 stop
> May 30 09:32:30 lalla ResourceManager[8118]: debug: /etc/init.d/bind9
> stop done. RC=0
> May 30 09:32:30 lalla ResourceManager[8118]: info: Running
> /etc/ha.d/resource.d/Filesystem /dev/drbd1 /jumper ext3 defaults stop
> May 30 09:32:30 lalla ResourceManager[8118]: debug: Starting
> /etc/ha.d/resource.d/Filesystem /dev/drbd1 /jumper ext3 defaults stop
> May 30 09:32:30 lalla Filesystem[8531]: INFO: Running stop for
> /dev/drbd1 on /jumper
> May 30 09:32:31 lalla Filesystem[8520]: INFO: Success
> May 30 09:32:31 lalla ResourceManager[8118]: debug:
> /etc/ha.d/resource.d/Filesystem /dev/drbd1 /jumper ext3 defaults stop
> done. RC=0
> May 30 09:32:31 lalla ResourceManager[8118]: info: Running
> /etc/ha.d/resource.d/drbddisk r1 stop
> May 30 09:32:31 lalla ResourceManager[8118]: debug: Starting
> /etc/ha.d/resource.d/drbddisk r1 stop
> May 30 09:32:31 lalla ResourceManager[8118]: debug:
> /etc/ha.d/resource.d/drbddisk r1 stop done. RC=0
> May 30 09:32:31 lalla ResourceManager[8118]: info: Running
> /etc/ha.d/resource.d/Filesystem /dev/drbd0 /cache ext3 defaults stop
> May 30 09:32:31 lalla ResourceManager[8118]: debug: Starting
> /etc/ha.d/resource.d/Filesystem /dev/drbd0 /cache ext3 defaults stop
> May 30 09:32:31 lalla Filesystem[8641]: INFO: Running stop for
> /dev/drbd0 on /cache
> May 30 09:32:31 lalla Filesystem[8630]: INFO: Success
> May 30 09:32:31 lalla ResourceManager[8118]: debug:
> /etc/ha.d/resource.d/Filesystem /dev/drbd0 /cache ext3 defaults stop
> done. RC=0
> May 30 09:32:31 lalla ResourceManager[8118]: info: Running
> /etc/ha.d/resource.d/drbddisk r0 stop
> May 30 09:32:31 lalla ResourceManager[8118]: debug: Starting
> /etc/ha.d/resource.d/drbddisk r0 stop
> May 30 09:32:31 lalla ResourceManager[8118]: debug:
> /etc/ha.d/resource.d/drbddisk r0 stop done. RC=0
> May 30 09:32:31 lalla ResourceManager[8118]: info: Running
> /etc/ha.d/resource.d/IPaddr 192.168.2.247/24/eth1 stop
> May 30 09:32:31 lalla ResourceManager[8118]: debug: Starting
> /etc/ha.d/resource.d/IPaddr 192.168.2.247/24/eth1 stop
> May 30 09:32:32 lalla IPaddr[8769]: INFO: ifconfig eth1:0 down
> May 30 09:32:32 lalla IPaddr[8740]: INFO: Success
> May 30 09:32:32 lalla ResourceManager[8118]: debug:
> /etc/ha.d/resource.d/IPaddr 192.168.2.247/24/eth1 stop done. RC=0
> May 30 09:32:32 lalla mach_down[8092]: info:
> /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
> May 30 09:32:32 lalla mach_down[8092]: info: mach_down takeover
> complete for node alessandra.
> May 30 09:32:32 lalla heartbeat: [3018]: info: mach_down takeover
> complete.
> May 30 09:33:02 lalla hb_standby[8823]: Going standby [foreign].
> May 30 09:33:02 lalla heartbeat: [3018]: info: lalla wants to go
> standby [foreign]
> May 30 09:33:13 lalla heartbeat: [3018]: WARN: No reply to standby
> request. Standby request cancelled.
> 
> and obviously nothing gets started.
> Since drbddisk returns error 1 (lalla ResourceManager[8118]: ERROR: Return
> code 1 from /etc/ha.d/resource.d/drbddisk), I searched Google and the
> drbd documentation but could not find anything. Can someone please
> explain what I am doing wrong?

You really need stonith. Your test created a split-brain
situation. There is no way to recover from that sanely without
stonith. Worse, you could easily lose data on the shared disk.
Furthermore, you need at least two heartbeat media.
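
As an illustration only: both a second heartbeat medium and a stonith
device are declared in /etc/ha.d/ha.cf. The fragment below is a sketch,
not a tested configuration. eth1 is the interface that appears in the
log above; eth0, the serial port, and the stonith placeholders are
assumptions and have to be replaced with whatever the hardware actually
provides.

```
# /etc/ha.d/ha.cf (fragment, identical on both nodes)

# Heartbeat over two independent links. eth1 is taken from the log;
# a second NIC named eth0 is an assumption.
bcast eth0 eth1

# Alternative second medium: a null-modem serial cable (assumed to exist).
#baud 19200
#serial /dev/ttyS0

# stonith device: the type and its parameters depend entirely on the
# fencing hardware, so they are left as placeholders here.
#stonith_host * <stonith_type> <parameters...>

node alessandra
node lalla
```

With two media, pulling a single cable no longer makes each node declare
the other dead. The stonith types available on a given installation can
be listed with "stonith -L".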

Thanks,

Dejan

> thanks in advance and best regards
> 
> --
> Mario Vittorio Guenzi
> 
> http://clark.tipistrani.it
> 
> Si vis pacem para bellum
> 
> 
> 


More information about the Linux-HA mailing list