[Linux-HA] Setup cluster

Ruiyuan Jiang Ruiyuan_Jiang at liz.com
Thu Feb 18 15:51:30 MST 2010


Thanks, Andreas. That is what I suspected too. Once I disabled stonith, the cluster started.
I have not tried setting quorum yet; I will try that next.
Now I have another problem: Apache does not start, although the virtual IP address is bound to the NIC.

[root@usnbrl51 ~]# crm configure show
node $id="01428973-0d27-48ee-9142-2da9cb5c1e4b" usnbrl50.liz.com
node $id="0988f7f3-c858-4c12-af3b-b6a8bf83a0ae" usnbrl51.liz.com
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="156.146.22.48" cidr_netmask="32" \
        op monitor interval="30s"
primitive WebSite ocf:heartbeat:apache \
        params configfile="/etc/httpd/conf/httpd.conf" \
        op monitor interval="1min"
location prefer-usnbrl50 WebSite 50: usnbrl50
colocation website-with-ip inf: WebSite ClusterIP
order apache-after-ip inf: ClusterIP WebSite
property $id="cib-bootstrap-options" \
        dc-version="1.0.6-fdba003eafa6af1b8d81b017aa535a949606ca0d" \
        cluster-infrastructure="Heartbeat" \
        no-quorum-policy="ignore" \
        stonith-enabled="false"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
[root@usnbrl51 ~]#
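
A side note on the configuration above, not necessarily related to the start failure: the location constraint prefer-usnbrl50 names the node "usnbrl50", while the node entries are "usnbrl50.liz.com". Pacemaker matches node names exactly, so that preference most likely has no effect. The following is a minimal POSIX sh sketch, assuming `crm configure show` output on stdin. It prints every node named by a simple location constraint (`location <id> <rsc> <score>: <node>`) that has no matching node entry. The function name is hypothetical, and rule-based location constraints are not handled.

```sh
#!/bin/sh
# Hypothetical helper: report nodes referenced by simple location
# constraints ("location <id> <rsc> <score>: <node>") that do not match
# any "node" entry in `crm configure show` output read from stdin.
check_location_nodes() {
    cfg=$(cat)
    # Node names: the word after the optional $id="..." on "node" lines.
    nodes=$(printf '%s\n' "$cfg" |
        sed -n 's/^node \(\$id="[^"]*" \)\{0,1\}\([^ ]*\).*/\2/p')
    # Node named in each simple location constraint.
    printf '%s\n' "$cfg" |
        sed -n 's/^location [^ ]* [^ ]* [^ ]*: \([^ ]*\).*/\1/p' |
        while read -r n; do
            # -x: whole-line match; -F: dots in host names are literal
            printf '%s\n' "$nodes" | grep -qxF "$n" || printf '%s\n' "$n"
        done
}
```

Run it as `crm configure show | check_location_nodes`. For the configuration above it would print usnbrl50.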

Last updated: Thu Feb 18 17:28:14 2010
Stack: Heartbeat
Current DC: usnbrl51.liz.com (0988f7f3-c858-4c12-af3b-b6a8bf83a0ae) - partition WITHOUT quorum
Version: 1.0.6-fdba003eafa6af1b8d81b017aa535a949606ca0d
2 Nodes configured, unknown expected votes
2 Resources configured.
Online: [ usnbrl51.liz.com usnbrl50.liz.com ]

ClusterIP	(ocf::heartbeat:IPaddr2):	Started usnbrl50.liz.com
    WebSite_start_0 (node=usnbrl51.liz.com, call=6, rc=1, status=complete): unknown error
    WebSite_start_0 (node=usnbrl50.liz.com, call=6, rc=1, status=complete): unknown error

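The rc values in the failed actions above are OCF resource agent exit codes. crm_mon prints rc=1 (OCF_ERR_GENERIC) as "unknown error", which only says that the agent reported a generic failure, not what the failure was. The following is a minimal sketch that decodes the standard OCF codes and summarizes crm_mon failed-action lines of the form shown above. Both function names are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: map an OCF exit code to its symbolic name.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        8) echo OCF_RUNNING_MASTER ;;
        9) echo OCF_FAILED_MASTER ;;
        *) echo "unknown($1)" ;;
    esac
}

# Hypothetical helper: read crm_mon failed-action lines on stdin, e.g.
#   WebSite_start_0 (node=n1, call=6, rc=1, status=complete): unknown error
# and print "<operation> <node> <OCF name>" for each one.
summarize_failed_actions() {
    sed -n 's/^[[:space:]]*\([^ ]*\) (node=\([^,]*\), call=[0-9]*, rc=\([0-9]*\),.*/\1 \2 \3/p' |
        while read -r op node rc; do
            printf '%s %s %s\n' "$op" "$node" "$(ocf_rc_name "$rc")"
        done
}
```

Usage: `crm_mon -1 | summarize_failed_actions`.
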
Apache's error log shows "caught SIGTERM, shutting down".

/var/log/messages does not say why Apache fails to start either. Starting Apache manually works without any problem.
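
The thread does not establish the cause. One common cause, which is an assumption here, is the agent's health check. ocf:heartbeat:apache decides whether Apache is up by fetching a status URL: its statusurl parameter, which by default is the server-status page on localhost. If mod_status does not serve that page, the start can fail even though httpd itself runs. The cluster then stops Apache, which would be consistent with the SIGTERM in the error log and with the "waiting for ... to come up" line being followed by rc=1. The following is a minimal sketch that checks a config file for an uncommented server-status handler. The function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the given Apache config file contains an
# uncommented "SetHandler server-status" directive. It does not follow
# Include lines (e.g. conf.d/*.conf), so check those files separately.
has_server_status() {
    grep -Eiq '^[[:space:]]*SetHandler[[:space:]]+server-status' "$1"
}
```

For example: `has_server_status /etc/httpd/conf/httpd.conf || echo "no server-status handler"`. If the check fails, it is worth enabling a `<Location /server-status>` block that allows access from 127.0.0.1; stock httpd.conf files typically ship it commented out. After that, fetch http://localhost/server-status by hand before retrying the resource.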


Feb 18 16:42:51 usnbrl50 crmd: [3610]: info: process_lrm_event: LRM operation ClusterIP_start_0 (call=4, rc=0, cib-update=15, confirmed=true) ok
Feb 18 16:42:53 usnbrl50 crmd: [3610]: info: do_lrm_rsc_op: Performing key=9:4:0:ea164eb4-9fea-4d79-83f6-0ad29f8521a5 op=ClusterIP_monitor_30000 )
Feb 18 16:42:53 usnbrl50 lrmd: [3607]: info: rsc:ClusterIP:5: monitor
Feb 18 16:42:53 usnbrl50 crmd: [3610]: info: do_lrm_rsc_op: Performing key=10:4:0:ea164eb4-9fea-4d79-83f6-0ad29f8521a5 op=WebSite_start_0 )
Feb 18 16:42:53 usnbrl50 lrmd: [3607]: info: rsc:WebSite:6: start
Feb 18 16:42:53 usnbrl50 crmd: [3610]: info: process_lrm_event: LRM operation ClusterIP_monitor_30000 (call=5, rc=0, cib-update=16, confirmed=false) ok
Feb 18 16:42:53 usnbrl50 apache[3754]: [3818]: INFO: apache not running
Feb 18 16:42:53 usnbrl50 apache[3754]: [3820]: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up
Feb 18 16:42:54 usnbrl50 crmd: [3610]: info: process_lrm_event: LRM operation WebSite_start_0 (call=6, rc=1, cib-update=17, confirmed=true) unknown error
Feb 18 16:42:55 usnbrl50 attrd: [3609]: info: attrd_ha_callback: Update relayed from usnbrl51.liz.com
Feb 18 16:42:55 usnbrl50 attrd: [3609]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-WebSite (INFINITY)
Feb 18 16:42:55 usnbrl50 crmd: [3610]: info: do_lrm_rsc_op: Performing key=2:5:0:ea164eb4-9fea-4d79-83f6-0ad29f8521a5 op=WebSite_stop_0 )
Feb 18 16:42:55 usnbrl50 lrmd: [3607]: info: rsc:WebSite:7: stop
Feb 18 16:42:55 usnbrl50 attrd: [3609]: info: attrd_perform_update: Sent update 16: fail-count-WebSite=INFINITY
Feb 18 16:42:55 usnbrl50 attrd: [3609]: info: attrd_ha_callback: Update relayed from usnbrl51.liz.com
Feb 18 16:42:55 usnbrl50 attrd: [3609]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-WebSite (1266529375)
Feb 18 16:42:55 usnbrl50 attrd: [3609]: info: attrd_perform_update: Sent update 19: last-failure-WebSite=1266529375
Feb 18 16:42:55 usnbrl50 lrmd: [3607]: info: RA output: (ClusterIP:start:stderr)
ARPING 192.168.9.101 from 192.168.9.101 eth0 Sent 5 probes (5 broadcast(s)) Received 0 response(s)
Feb 18 16:42:56 usnbrl50 lrmd: [3607]: info: RA output: (WebSite:stop:stderr) /usr/lib/ocf/resource.d//heartbeat/apache: line 437: kill: (3816) - No such process
Feb 18 16:42:56 usnbrl50 apache[3842]: [3876]: INFO: Killing apache PID 3816
Feb 18 16:42:56 usnbrl50 apache[3842]: [3878]: INFO: apache stopped.
Feb 18 16:42:56 usnbrl50 crmd: [3610]: info: process_lrm_event: LRM operation WebSite_stop_0 (call=7, rc=0, cib-update=18, confirmed=true) ok
Feb 18 16:44:08 usnbrl50 pengine: [4004]: info: crm_log_init: Changed active directory to /usr/local/var/lib/heartbeat/cores/root
Feb 18 16:44:08 usnbrl50 pengine: [4004]: info: Invoked: /usr/local/lib64/heartbeat/pengine metadata
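
The log also shows attrd setting fail-count-WebSite to INFINITY right after the failed start. With an INFINITY fail count on a node, the cluster will not start the resource there again until the failure is cleared. The following is a minimal sketch that pulls fail-count updates out of syslog lines in the format above. The function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print "<resource> <value>" for every attrd
# fail-count update found in syslog lines read from stdin.
fail_counts() {
    sed -n 's/.*attrd_perform_update: Sent update [0-9]*: fail-count-\([^=]*\)=\(.*\)$/\1 \2/p'
}
```

Usage: `fail_counts < /var/log/messages`. Once the underlying problem is fixed, `crm resource cleanup WebSite` clears the failed operations so the cluster retries the start. Depending on the crm shell version, the fail count may also have to be removed with `crm resource failcount WebSite delete <node>`; `crm resource help` lists what your version supports.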


Thanks.
Ryan

-----Original Message-----
From: linux-ha-bounces at lists.linux-ha.org [mailto:linux-ha-bounces at lists.linux-ha.org] On Behalf Of Andreas Kurz
Sent: Thursday, February 18, 2010 7:23 AM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Setup cluster

On Thursday 18 February 2010 12:55:50 Dejan Muhamedagic wrote:
> Hi,
> 
> On Wed, Feb 17, 2010 at 12:15:38PM -0500, Ruiyuan Jiang wrote:
> > Hi,
> >
> > I am trying to set up my first cluster on Red Hat Enterprise Server v5.4.
> > Neither of my hosts has any disks attached yet. The version that I
> > have is heartbeat 3.0.2.
> >
> > I ran the command:
> >
> > # crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 \
> > params ip=192.168.9.101 cidr_netmask=32 \
> > op monitor interval=30s
> >
> > # crm_mon
> > Last updated: Wed Feb 17 12:07:22 2010
> > Stack: Heartbeat
> > Current DC: usnbrl51.liz.com (0988f7f3-c858-4c12-af3b-b6a8bf83a0ae) -
> > partition WITHOUT quorum Version:
> > 1.0.6-fdba003eafa6af1b8d81b017aa535a949606ca0d
> > 2 Nodes configured, unknown expected votes
> > 1 Resources configured.
> > Online: [ usnbrl51.liz.com usnbrl50.liz.com ]
> >
> >
> > From the output, I am ignoring "unknown expected votes". Is it safe
> > to ignore "partition WITHOUT quorum" for now? Also, I don't see
> > my cluster IP.
> 
> Don't think it matters, but you could set it anyway:
> 
> property expected-quorum-votes="2"
> 
> > # crm configure show
> > node $id="01428973-0d27-48ee-9142-2da9cb5c1e4b" usnbrl50.liz.com
> > node $id="0988f7f3-c858-4c12-af3b-b6a8bf83a0ae" usnbrl51.liz.com
> > primitive ClusterIP ocf:heartbeat:IPaddr2 \
> >         params ip="192.168.9.101" cidr_netmask="32" \
> >         op monitor interval="30s"
> > property $id="cib-bootstrap-options" \
> >         dc-version="1.0.6-fdba003eafa6af1b8d81b017aa535a949606ca0d" \
> >         cluster-infrastructure="Heartbeat" \
> >         no-quorum-policy="ignore"
> > #
> >
> > The output above shows that the cluster has its virtual IP.
> 
> The resource is defined, but it's not running. Looks like the CRM
> didn't try to start it. The logs should show why.

There are no stonith resources, but stonith is enabled (the default), so resource
management is disabled. Set the stonith-enabled property to false and the resource
should start.

Regards,
Andreas

> 
> Thanks,
> 
> Dejan
> 
> > What did I do wrong here? Thanks.
> >
> > Ryan
> >
> >
> >
> > This message (including any attachments) is intended
> > solely for the specific individual(s) or entity(ies) named
> > above, and may contain legally privileged and
> > confidential information. If you are not the intended
> > recipient, please notify the sender immediately by
> > replying to this message and then delete it.
> > Any disclosure, copying, or distribution of this message,
> > or the taking of any action based on it, by other than the
> > intended recipient, is strictly prohibited.
> >
> > _______________________________________________
> > Linux-HA mailing list
> > Linux-HA at lists.linux-ha.org
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems
> 