[Linux-HA] First Time HA
Raoul Bhatia [IPAX]
r.bhatia at ipax.at
Mon Oct 8 14:14:52 MDT 2007
phew,
i just read through your email with all the information, only to conclude
that i cannot help much, as i'm using the new v2 style configuration
(which is cib.xml and ocf resource agents).
perhaps there's somebody else who can help you with your problem.
if you try to use the xml based configuration method, feel free to ask
me about my setup!
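to give you a rough idea, the first part of your haresources line might
look something like the sketch below in a v2 style cib.xml. this is an
untested illustration, not copied from my setup: all the ids are made up,
the monitor interval and timeout are placeholders, and the remaining
services (mysqld, sendmail, ...) are left out.

```xml
<!-- hypothetical sketch only: ids, interval and timeout are assumptions -->
<resources>
  <group id="group_1">
    <!-- ocf agent for the service ip (address masked as in your mail) -->
    <primitive class="ocf" id="IPaddr_1" provider="heartbeat" type="IPaddr">
      <operations>
        <op id="IPaddr_1_mon" name="monitor" interval="5s" timeout="5s"/>
      </operations>
      <instance_attributes id="IPaddr_1_inst_attr">
        <attributes>
          <nvpair id="IPaddr_1_attr_0" name="ip" value="xxx.xxx.xxx.xxx"/>
        </attributes>
      </instance_attributes>
    </primitive>
    <!-- drbddisk is a v1 (heartbeat class) script, so its argument is positional -->
    <primitive class="heartbeat" id="drbddisk_2" provider="heartbeat" type="drbddisk">
      <instance_attributes id="drbddisk_2_inst_attr">
        <attributes>
          <nvpair id="drbddisk_2_attr_1" name="1" value="drbd0"/>
        </attributes>
      </instance_attributes>
    </primitive>
    <!-- ocf agent that mounts the drbd device -->
    <primitive class="ocf" id="Filesystem_3" provider="heartbeat" type="Filesystem">
      <instance_attributes id="Filesystem_3_inst_attr">
        <attributes>
          <nvpair id="Filesystem_3_attr_0" name="device" value="/dev/drbd0"/>
          <nvpair id="Filesystem_3_attr_1" name="directory" value="/share"/>
          <nvpair id="Filesystem_3_attr_2" name="fstype" value="ext3"/>
        </attributes>
      </instance_attributes>
    </primitive>
  </group>
</resources>
```

resources in a group are started in the order listed and stopped in
reverse, which corresponds to the left-to-right order of a haresources line.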
cheers,
raoul bhatia
Peter Petroff wrote:
> Doing first ever install of drbd + heartbeat.
>
> Having an issue with DRBD. Any help would be great.
> Let's say I lose power to the primary; the secondary will not become primary,
> so then my haresources fail.
> This is the status before I pull the power on trixbox1.local
> SVN Revision: 2947 build by buildsvn at c5-i386-build, 2007-09-29 06:28:57
> 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
>
> Here is a power loss to trixbox1.local
>
> Oct 8 12:38:48 trixbox2 heartbeat: [2400]: info: Link
> trixbox1.local:eth1 dead.
> Oct 8 12:38:48 trixbox2 heartbeat: [2895]: info: No local resources
> [/usr/share/heartbeat/ResourceManager listkeys trixbox2.local] to
> acquire.
> Oct 8 12:38:48 trixbox2 harc[2894]: info: Running /etc/ha.d/rc.d/status
> status
> Oct 8 12:38:48 trixbox2 kernel: tg3: eth1: Link is up at 100 Mbps, full
> duplex.
> Oct 8 12:38:48 trixbox2 kernel: tg3: eth1: Flow control is off for TX
> and off for RX.
> Oct 8 12:38:48 trixbox2 mach_down[2923]: info: Taking over resource
> group xxx.xxx.xxx.xxx/27/eth0
> Oct 8 12:38:48 trixbox2 ResourceManager[2949]: info: Acquiring resource
> group: trixbox1.local xxx.xxx.xxx.xxx/27/eth0 drbddisk::drbd0
> Filesystem::/dev/drbd0::/share::ext3 mysqld sendmail asterisk httpd ircd
> xinetd
> Oct 8 12:38:48 trixbox2 IPaddr[2976]: INFO: Resource is stopped
> Oct 8 12:38:48 trixbox2 ResourceManager[2949]: info: Running
> /etc/ha.d/resource.d/IPaddr xxx.xxx.xxx.xxx/27/eth0 start
> Oct 8 12:38:48 trixbox2 IPaddr[3076]: INFO: Using calculated netmask
> for xxx.xxx.xxx.xxx: 255.255.255.224
> Oct 8 12:38:48 trixbox2 IPaddr[3076]: INFO: eval ifconfig eth0:0
> xxx.xxx.xxx.xxx netmask 255.255.255.224 broadcast 70.97.159.127
> Oct 8 12:38:48 trixbox2 IPaddr[3047]: INFO: Success
> Oct 8 12:38:48 trixbox2 ResourceManager[2949]: info: Running
> /etc/ha.d/resource.d/drbddisk drbd0 start
> Oct 8 12:38:49 trixbox2 kernel: drbd0: PingAck did not arrive in time.
> Oct 8 12:38:49 trixbox2 kernel: drbd0: peer( Secondary -> Unknown )
> conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> Oct 8 12:38:49 trixbox2 kernel: drbd0: asender terminated
> Oct 8 12:38:49 trixbox2 kernel: drbd0: disk( UpToDate -> Outdated )
> Oct 8 12:38:49 trixbox2 kernel: drbd0: outdate-peer helper broken,
> returned 0
> Oct 8 12:38:49 trixbox2 kernel: drbd0: State change failed: Refusing to
> be Primary without at least one UpToDate disk
> Oct 8 12:38:49 trixbox2 kernel: drbd0: state = { cs:NetworkFailure
> st:Secondary/Unknown ds:Outdated/DUnknown r--- }
> Oct 8 12:38:49 trixbox2 kernel: drbd0: wanted = { cs:NetworkFailure
> st:Primary/Unknown ds:Outdated/DUnknown r--- }
> Oct 8 12:38:49 trixbox2 kernel: drbd0: short read expecting header on
> sock: r=-512
> Oct 8 12:38:49 trixbox2 kernel: drbd0: tl_clear()
> Oct 8 12:38:49 trixbox2 kernel: drbd0: Connection closed
> Oct 8 12:38:49 trixbox2 kernel: drbd0: Writing meta data super block
> now.
> Oct 8 12:38:49 trixbox2 kernel: drbd0: conn( NetworkFailure ->
> Unconnected )
> Oct 8 12:38:49 trixbox2 kernel: drbd0: receiver terminated
> Oct 8 12:38:49 trixbox2 kernel: drbd0: receiver (re)started
> Oct 8 12:38:49 trixbox2 kernel: drbd0: conn( Unconnected ->
> WFConnection )
> Oct 8 12:38:50 trixbox2 kernel: drbd0: State change failed: Refusing to
> be Primary without at least one UpToDate disk
> Oct 8 12:38:50 trixbox2 kernel: drbd0: state = { cs:WFConnection
> st:Secondary/Unknown ds:Outdated/DUnknown r--- }
> Oct 8 12:38:50 trixbox2 kernel: drbd0: wanted = { cs:WFConnection
> st:Primary/Unknown ds:Outdated/DUnknown r--- }
> Oct 8 12:38:54 trixbox2 ResourceManager[2949]: CRIT: Giving up
> resources due to failure of drbddisk::drbd0
> Oct 8 12:38:54 trixbox2 ResourceManager[2949]: info: Releasing resource
> group: trixbox1.local xxx.xxx.xxx.xxx/27/eth0 drbddisk::drbd0
> Filesystem::/dev/drbd0::/share::ext3 mysqld sendmail asterisk httpd ircd
> xinetd
>
> Here is my setup. Step By step.
>
> 2 identical boxes
> Trixbox1.local eth0 publicip eth1 10.0.10.2 160gb
> Trixbox2.local eth0 publicip eth1 10.0.10.3 2x160gb hwRAID1
> eth1 via crossover gigabit nic
> Both are set up as
> /boot 101mb
> / 75000mb
> /drbd0 75000mb
> /swap 2048mb
>
> OS: CentOS 5 Final 2.6.18-8.1.14.el5 SMP
> Drbd 8.0.4-1.el5
> Kmod-Drbd 8.0.4-1.2.6.18_8.1.14.el5
>
>
> /etc/drbd.conf on both
> global {
> dialog-refresh 5; # 5 seconds
> usage-count yes;
> }
> common {
> syncer { rate 120M; }
> }
> resource drbd0 {
> protocol C;
>
> handlers {
> pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
> pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
> local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
> }
>
> startup {
> wfc-timeout 60;
> degr-wfc-timeout 120; # 2 minutes.
> }
>
> disk {
> on-io-error detach;
> fencing resource-and-stonith;
> }
>
> net {
> after-sb-0pri disconnect;
> after-sb-1pri disconnect;
> after-sb-2pri disconnect;
> rr-conflict call-pri-lost;
> }
>
> syncer {
> rate 120M;
> }
>
> on trixbox1.local {
> device /dev/drbd0;
> disk /dev/sda4;
> address 10.0.10.2:7788;
> flexible-meta-disk internal;
> }
>
> on trixbox2.local {
> device /dev/drbd0;
> disk /dev/mapper/hpt45x_dejfaehcfp4;
> address 10.0.10.3:7788;
> meta-disk internal;
> }
> }
> -------------------------------------------------------------
>
> Reboot
>
> On trixbox1.local drbdadm -- --overwrite-data-of-peer primary all
> mkfs.ext3 /dev/drbd0
> trixbox 1 & 2 /etc/fstab /dev/drbd0 /share ext3 noauto 0 0
> trixbox 1 & 2 mkdir /share
>
> RESULTS:
> [trixbox1.local ~]# cat /proc/drbd
> version: 8.0.4 (api:86/proto:86)
> SVN Revision: 2947 build by buildsvn at c5-i386-build, 2007-09-29 06:28:57
> 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
> ns:75189868 nr:0 dw:1340508 dr:73849428 al:1138 bm:4665 lo:0 pe:0
> ua:0 ap:0
> resync: used:0/31 hits:4736102 misses:4940 starving:0 dirty:0
> changed:4940
> act_log: used:0/127 hits:333989 misses:2974 starving:2
> dirty:1834 changed:1138
>
> on both yum install heartbeat -y
>
> Installed: heartbeat.i386 0:2.1.2-3.el5.centos
> Dependency Installed: heartbeat-pils.i386 0:2.1.2-3.el5.centos
> heartbeat-stonith.i386 0:2.1.2-3.el5.centos
> openhpi.i386 0:2.4.1-6.el5.1
>
> on both /etc/ha.d/authkeys
> auth 1
> 1 crc
> On both chmod 600 authkeys
>
> On both /etc/ha.d/ha.cf
> debugfile /var/log/ha-debug
> logfile /var/log/ha-log
> logfacility local0
> keepalive 200ms
> deadtime 2
> warntime 1
> initdead 120
> udpport 694
> bcast eth1
> auto_failback on
> node trixbox1.local
> node trixbox2.local
>
> on both /etc/ha.d/haresources
> trixbox1.local xxx.xxx.xxx.xxx/27/eth0 drbddisk::drbd0
> Filesystem::/dev/drbd0::/share::ext3 mysqld sendmail asterisk httpd ircd
> xinetd
>
> on both did the following
>
> chkconfig --levels 345 mysqld off
> chkconfig --levels 345 sendmail off
> chkconfig --levels 345 asterisk off
> chkconfig --levels 345 httpd off
> chkconfig --levels 345 ircd off
> chkconfig --levels 345 xinetd off
> chkconfig --levels 345 heartbeat on
> service mysqld stop
> service sendmail stop
> service asterisk stop
> service httpd stop
> service ircd stop
> service xinetd stop
> service heartbeat start
>
>
> Peter Petroff
> Sr. Systems Engineer
> 208-287-5524
>
>
>
--
____________________________________________________________________
DI (FH) Raoul Bhatia M.Sc. email. r.bhatia at ipax.at
Technischer Leiter
IPAX - Aloy Bhatia Hava OEG web. http://www.ipax.at
Barawitzkagasse 10/2/2/11 email. office at ipax.at
1190 Wien tel. +43 1 3670030
FN 277995t HG Wien fax. +43 1 3670030 15
____________________________________________________________________