[Linux-HA] Linux-HA Service Monitoring
Thomas Glanzmann
thomas at glanzmann.de
Thu Jan 3 07:09:05 MST 2008
Hello again,
here is a step-by-step guide. I just did it to stay in practice. I used
Debian etch:
On both nodes:
/etc/init.d/heartbeat stop
rm /var/lib/heartbeat/crm/* (Don't do this in production: it
wipes your cluster configuration)
(ha-1) [~] cat /etc/ha.d/ha.cf
use_logd yes
bcast eth1
mcast eth0 239.0.0.2 694 1 0
node ha-1 ha-2
crm on
Note: eth0 is on the network segment both nodes share; eth1 is a crossover link between them.
Copy the attached OCF agent for squid (a wrapper around the Debian init
script) to /usr/lib/ocf/resource.d/heartbeat/squid
I wrote this resource agent because I could not find one for
squid.
chmod +x /usr/lib/ocf/resource.d/heartbeat/squid
apt-get install squid
Make sure squid isn't started by init:
update-rc.d -f squid remove (This is for Debian)
chkconfig squid off (That would be for SuSE / Red Hat)
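To double-check that no boot-time start links are left over, something like the following could be used. It is only a sketch: `boot_links` is a made-up helper name, and it assumes the SysV layout /etc/rcN.d/SNNname that Debian etch uses.

```shell
#!/bin/sh
# boot_links NAME [ROOT]: print the SysV start links for init script NAME.
# ROOT defaults to /etc; it is a parameter only so the function can be tried
# against a scratch directory. (Hypothetical helper, not part of heartbeat.)
boot_links() {
    name=$1
    root=${2:-/etc}
    for link in "$root"/rc?.d/S[0-9][0-9]"$name"; do
        # An unmatched glob stays literal, so check that the entry exists.
        if [ -e "$link" ] || [ -L "$link" ]; then
            echo "$link"
        fi
    done
}

# Usage: empty output means init will not start squid at boot.
# boot_links squid
```

If this prints nothing for squid, only heartbeat is left to start and stop the service.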
/etc/init.d/heartbeat start
On _one_ node load the attached XML service description:
cibadmin -U -x squid.xml (Don't forget to adapt the IP address to your network)
Call "crm_mon -r -1" to see whether squid gets started, and on which
node:
crm_mon -r -1
On that node call:
/etc/init.d/squid stop
The monitor interval is set to 60 seconds, so Linux-HA should detect
within 60 seconds that squid is gone and restart it. By default Linux-HA
first tries the node where the resource ran before. If that fails for
whatever reason, it tries the other node.
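What the cluster gets back from each monitor call is just the agent's exit code. As a rough aid for testing an agent by hand, the codes defined by the OCF resource agent API can be mapped to their names. `ocf_rc_name` is a hypothetical helper, not something shipped with heartbeat.

```shell
#!/bin/sh
# ocf_rc_name CODE: print the OCF name for a resource agent exit code.
# (Hypothetical helper for manual testing.)
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        *) echo "unknown ($1)" ;;
    esac
}

# Usage, as root on a cluster node:
# OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/squid monitor
# ocf_rc_name $?
```

A monitor result of OCF_NOT_RUNNING means the resource is cleanly stopped, while OCF_ERR_GENERIC means it is there but not answering.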
And now in practice:
(ha-1) [~] crm_mon -1 -r
============
Last updated: Thu Jan 3 14:57:57 2008
Current DC: ha-2 (095256ab-361c-4b1e-9a8b-8bed74c4a7fb)
2 Nodes configured.
0 Resources configured.
============
Node: ha-2 (095256ab-361c-4b1e-9a8b-8bed74c4a7fb): online
Node: ha-1 (330da1b6-5f99-480a-b071-a144a98e1248): online
Full list of resources:
(ha-1) [~] cibadmin -U -x squid.xml
(ha-1) [~] crm_mon -1 -r
============
Last updated: Thu Jan 3 14:58:17 2008
Current DC: ha-2 (095256ab-361c-4b1e-9a8b-8bed74c4a7fb)
2 Nodes configured.
1 Resources configured.
============
Node: ha-2 (095256ab-361c-4b1e-9a8b-8bed74c4a7fb): online
Node: ha-1 (330da1b6-5f99-480a-b071-a144a98e1248): online
Full list of resources:
Resource Group: squid-cluster
ip0 (heartbeat::ocf:IPaddr2): Started ha-2
squid (heartbeat::ocf:squid): Started ha-2
Okay, we loaded the config and squid is running on ha-2, so let's go to ha-2:
(ha-2) [~] date; ps axuww | grep squid
Thu Jan 3 14:58:38 CET 2008
root 13732 0.0 0.2 4628 672 ? Ss 14:58 0:00 /usr/sbin/squid -D -sYC
proxy 13734 0.0 1.7 6848 4572 ? S 14:58 0:00 (squid) -D -sYC
root 13753 0.0 0.2 3688 596 pts/0 R+ 14:58 0:00 grep --color=auto squid
(ha-2) [~] /etc/init.d/squid stop
Stopping Squid HTTP proxy: squid.
(ha-2) [~] date; ps axuww | grep squid
Thu Jan 3 14:58:48 CET 2008
root 13762 0.0 0.2 3692 600 pts/0 R+ 14:58 0:00 grep --color=auto squid
(ha-2) [~] date; ps axuww | grep squid
Thu Jan 3 14:59:04 CET 2008
root 13767 0.0 0.2 3688 596 pts/0 R+ 14:59 0:00 grep --color=auto squid
(ha-2) [~] date; ps axuww | grep squid
Thu Jan 3 14:59:10 CET 2008
root 13770 0.0 0.2 3688 596 pts/0 R+ 14:59 0:00 grep --color=auto squid
(ha-2) [~] date; ps axuww | grep squid
Thu Jan 3 14:59:15 CET 2008
root 13806 0.0 0.5 4648 1396 ? S 14:59 0:00 /bin/sh /usr/lib/ocf/resource.d//heartbeat/squid stop
root 13812 0.0 0.4 2536 1248 ? S 14:59 0:00 /bin/sh /etc/init.d/squid stop
root 13819 0.0 0.2 3692 600 pts/0 R+ 14:59 0:00 grep --color=auto squid
(ha-2) [~] date; ps axuww | grep squid
Thu Jan 3 14:59:19 CET 2008
root 13862 0.0 0.2 4628 664 ? Ss 14:59 0:00 /usr/sbin/squid -D -sYC
proxy 13865 0.6 1.7 6852 4552 ? S 14:59 0:00 (squid) -D -sYC
root 13869 0.0 0.2 3688 600 pts/0 R+ 14:59 0:00 grep --color=auto squid
(ha-2) [~] crm_mon -r -1
============
Last updated: Thu Jan 3 14:59:23 2008
Current DC: ha-2 (095256ab-361c-4b1e-9a8b-8bed74c4a7fb)
2 Nodes configured.
1 Resources configured.
============
Node: ha-2 (095256ab-361c-4b1e-9a8b-8bed74c4a7fb): online
Node: ha-1 (330da1b6-5f99-480a-b071-a144a98e1248): online
Full list of resources:
Resource Group: squid-cluster
ip0 (heartbeat::ocf:IPaddr2): Started ha-2
squid (heartbeat::ocf:squid): Started ha-2
Okay, squid was running on ha-2, so let's stop it. Okay, it is stopped. Now let's
check every few seconds whether Linux-HA detects that squid is down. And in fact it
does. When it does, it first stops squid just to make sure that it is really
down. That is what the "/bin/sh /usr/lib/ocf/resource.d//heartbeat/squid stop"
line is about. A few seconds later squid is up and running again.
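The manual polling above could also be scripted. This is a rough sketch only: `wait_for` is a made-up helper, and it uses `pgrep` from procps in place of the `ps | grep` calls.

```shell
#!/bin/sh
# wait_for TIMEOUT COMMAND...: run COMMAND once per second until it succeeds
# or TIMEOUT seconds have passed. Returns 0 on success, 1 on timeout.
# (Hypothetical helper, not part of heartbeat.)
wait_for() {
    timeout=$1
    shift
    elapsed=0
    until "$@" > /dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Usage on the node running squid (90 seconds is an arbitrary upper bound):
# /etc/init.d/squid stop
# wait_for 90 pgrep -x squid && echo "squid is back"
```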
One more thing I have to tell you:
In the squid.xml file you see two resources: one for the floating IP address
and one for squid. I put the two resources into a so-called resource group. A
resource group groups one or more resources together and makes sure that
they always run on the same host and that they are started and stopped in
order. So in our example the IP address is started before squid, and on
shutdown squid is stopped first and then the IP address. I defined two monitor
operations in squid.xml, one per resource. Without these operations Linux-HA
doesn't detect if a service goes down. So you could, for example, call
"ip addr del IP/CIDR dev eth0" and see for yourself how Linux-HA configures
the IP address again. I really do hope that this is enough to get you started.
Give me feedback when you get it running.
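To watch the floating IP address come back after deleting it, a small check like this might help. It is a sketch under assumptions: `has_ip` is a made-up helper, 192.0.2.10 is a placeholder address, and the parsing relies on the one-line output of `ip -o -4 addr show`, where the fourth field is ADDRESS/PREFIX.

```shell
#!/bin/sh
# has_ip ADDRESS: read `ip -o -4 addr show` output on stdin and succeed if
# ADDRESS (given without prefix length) is configured.
# (Hypothetical helper, not part of heartbeat.)
has_ip() {
    addr=$1
    # In one-line mode field 3 is "inet" and field 4 is ADDRESS/PREFIX.
    awk -v want="$addr" '
        $3 == "inet" { split($4, a, "/"); if (a[1] == want) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage (192.0.2.10 stands in for your floating IP address):
# ip -o -4 addr show dev eth0 | has_ip 192.0.2.10 && echo "address is up"
```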
Thomas
-------------- next part --------------
#!/bin/sh
# OCF resource agent for squid, wrapping the Debian init script.
. "${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs"
INIT_SCRIPT=/etc/init.d/squid
case "$1" in
    start)
        ${INIT_SCRIPT} start > /dev/null 2>&1 && exit 0 || exit 1
        ;;
    stop)
        ${INIT_SCRIPT} stop > /dev/null 2>&1 && exit 0 || exit 1
        ;;
    status)
        ${INIT_SCRIPT} status > /dev/null 2>&1 && exit 0 || exit 1
        ;;
    monitor)
        # Exit 7 (OCF_NOT_RUNNING) if the resource is stopped
        ${INIT_SCRIPT} status > /dev/null 2>&1 || exit 7
        # Otherwise probe the service (XXX: maybe loosen retry / timeout)
        wget -o /dev/null -O /dev/null -T 1 -t 1 http://localhost:3128/ && exit 0 || exit 1
        ;;
    validate-all)
        # Advertised in the meta-data below; there is nothing to validate
        exit 0
        ;;
    meta-data)
        cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="squid">
  <version>1.0</version>
  <longdesc lang="en">
    OCF Resource Agent on top of the squid init script shipped with Debian.
  </longdesc>
  <shortdesc lang="en">OCF Resource Agent on top of the squid init script shipped with Debian.</shortdesc>
  <actions>
    <action name="start" timeout="90" />
    <action name="stop" timeout="100" />
    <action name="status" timeout="60" />
    <action name="monitor" depth="0" timeout="30s" interval="10s" start-delay="10s" />
    <action name="meta-data" timeout="5s" />
    <action name="validate-all" timeout="20s" />
  </actions>
</resource-agent>
END
        ;;
    *)
        # Unknown action: exit 3 (OCF_ERR_UNIMPLEMENTED)
        exit 3
        ;;
esac
-------------- next part --------------
A non-text attachment was scrubbed...
Name: squid.xml
Type: application/xml
Size: 1391 bytes
Desc: not available
Url : http://lists.community.tummy.com/pipermail/linux-ha/attachments/20080103/9cbe1ace/squid.xml