[Linux-HA] Linux-HA Service Monitoring

Thomas Glanzmann thomas at glanzmann.de
Thu Jan 3 07:09:05 MST 2008


Hello again,
here is a step-by-step guide. I did it just to stay in practice. I used
Debian etch:

        On both nodes:
                /etc/init.d/heartbeat stop
                rm /var/lib/heartbeat/crm/* (Don't do this in production; it
                                             wipes your configuration)

                (ha-1) [~] cat /etc/ha.d/ha.cf
                use_logd yes
                bcast eth1
                mcast eth0 239.0.0.2 694 1 0
                node ha-1 ha-2
                crm on

                Note: eth0 is the network segment both nodes are on; eth1 is a crosslink

                Copy the attached OCF agent for squid, which wraps the Debian
                init script, to /usr/lib/ocf/resource.d/heartbeat/squid

                I wrote that resource agent because I could not find one for
                squid.

                chmod +x /usr/lib/ocf/resource.d/heartbeat/squid

                apt-get install squid

                Make sure squid isn't started by init:

                        update-rc.d -f squid remove (This is for Debian)
                        chkconfig squid off (That would be for SuSE / Red Hat)

                /etc/init.d/heartbeat start

        On _one_ node load the attached XML Service Description:

                cibadmin -U -x squid.xml (Don't forget to adapt the IP address)

        Call "crm_mon -r -1" to see if squid gets started and on which node it
        gets started:

                crm_mon -r -1

        On that node call:

                /etc/init.d/squid stop

        The monitor interval is set to 60 seconds, so linux-ha should detect
        within 60 seconds that squid is gone and restart it. By default,
        linux-ha first tries the node where squid ran before. If that fails for
        whatever reason, it tries the other node.
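
To see what the cluster sees, the monitor action of the resource agent can
also be run by hand. The following is only a minimal sketch: the function
name describe_monitor_rc is made up for this example, and OCF_ROOT=/usr/lib/ocf
is an assumption derived from the install path used above. The exit codes 0, 7
and 1 are the ones the attached agent returns from its monitor action.

```shell
#!/bin/sh
# Translate the exit code of an OCF "monitor" call into words.
# describe_monitor_rc is a hypothetical helper, not part of heartbeat.
describe_monitor_rc() {
        case "$1" in
                0) echo "running" ;;          # OCF_SUCCESS
                7) echo "not running" ;;      # OCF_NOT_RUNNING
                *) echo "failed (rc=$1)" ;;   # e.g. 1 = OCF_ERR_GENERIC
        esac
}

# Usage on a cluster node (assumes OCF_ROOT=/usr/lib/ocf):
#   OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/squid monitor
#   describe_monitor_rc $?
```

An exit code of 7 means the resource is cleanly stopped; any other non-zero
code means it is in a failed state.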

And now in practice:

(ha-1) [~] crm_mon -1 -r
============
Last updated: Thu Jan  3 14:57:57 2008
Current DC: ha-2 (095256ab-361c-4b1e-9a8b-8bed74c4a7fb)
2 Nodes configured.
0 Resources configured.
============

Node: ha-2 (095256ab-361c-4b1e-9a8b-8bed74c4a7fb): online
Node: ha-1 (330da1b6-5f99-480a-b071-a144a98e1248): online

Full list of resources:

(ha-1) [~] cibadmin -U -x squid.xml
(ha-1) [~] crm_mon -1 -r
============
Last updated: Thu Jan  3 14:58:17 2008
Current DC: ha-2 (095256ab-361c-4b1e-9a8b-8bed74c4a7fb)
2 Nodes configured.
1 Resources configured.
============

Node: ha-2 (095256ab-361c-4b1e-9a8b-8bed74c4a7fb): online
Node: ha-1 (330da1b6-5f99-480a-b071-a144a98e1248): online

Full list of resources:

Resource Group: squid-cluster
    ip0 (heartbeat::ocf:IPaddr2):       Started ha-2
    squid       (heartbeat::ocf:squid): Started ha-2

Okay, we loaded the config and squid is running on ha-2, so let's go to ha-2:

(ha-2) [~] date; ps axuww | grep squid
Thu Jan  3 14:58:38 CET 2008
root     13732  0.0  0.2   4628   672 ?        Ss   14:58   0:00 /usr/sbin/squid -D -sYC
proxy    13734  0.0  1.7   6848  4572 ?        S    14:58   0:00 (squid) -D -sYC
root     13753  0.0  0.2   3688   596 pts/0    R+   14:58   0:00 grep --color=auto squid
(ha-2) [~] /etc/init.d/squid stop
Stopping Squid HTTP proxy: squid.
(ha-2) [~] date; ps axuww | grep squid
Thu Jan  3 14:58:48 CET 2008
root     13762  0.0  0.2   3692   600 pts/0    R+   14:58   0:00 grep --color=auto squid
(ha-2) [~] date; ps axuww | grep squid
Thu Jan  3 14:59:04 CET 2008
root     13767  0.0  0.2   3688   596 pts/0    R+   14:59   0:00 grep --color=auto squid
(ha-2) [~] date; ps axuww | grep squid
Thu Jan  3 14:59:10 CET 2008
root     13770  0.0  0.2   3688   596 pts/0    R+   14:59   0:00 grep --color=auto squid
(ha-2) [~] date; ps axuww | grep squid
Thu Jan  3 14:59:15 CET 2008
root     13806  0.0  0.5   4648  1396 ?        S    14:59   0:00 /bin/sh /usr/lib/ocf/resource.d//heartbeat/squid stop
root     13812  0.0  0.4   2536  1248 ?        S    14:59   0:00 /bin/sh /etc/init.d/squid stop
root     13819  0.0  0.2   3692   600 pts/0    R+   14:59   0:00 grep --color=auto squid
(ha-2) [~] date; ps axuww | grep squid
Thu Jan  3 14:59:19 CET 2008
root     13862  0.0  0.2   4628   664 ?        Ss   14:59   0:00 /usr/sbin/squid -D -sYC
proxy    13865  0.6  1.7   6852  4552 ?        S    14:59   0:00 (squid) -D -sYC
root     13869  0.0  0.2   3688   600 pts/0    R+   14:59   0:00 grep --color=auto squid
(ha-2) [~] crm_mon -r -1

============
Last updated: Thu Jan  3 14:59:23 2008
Current DC: ha-2 (095256ab-361c-4b1e-9a8b-8bed74c4a7fb)
2 Nodes configured.
1 Resources configured.
============

Node: ha-2 (095256ab-361c-4b1e-9a8b-8bed74c4a7fb): online
Node: ha-1 (330da1b6-5f99-480a-b071-a144a98e1248): online

Full list of resources:

Resource Group: squid-cluster
    ip0 (heartbeat::ocf:IPaddr2):       Started ha-2
    squid       (heartbeat::ocf:squid): Started ha-2

Okay, squid was running on ha-2, so let's stop it. Okay, it is stopped. Now let's
check every few seconds whether linux-ha detects that squid is down. And in fact
it does, here about half a minute after I stopped squid. When it does, it first
stops squid just to make sure that it is really down. That is what the
"/bin/sh /usr/lib/ocf/resource.d//heartbeat/squid stop" line is about. A few
seconds later squid is back up and running again.
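
Instead of typing the check again and again, the waiting can be put into a
small loop. This is only a sketch: wait_until is a made-up helper name, and
the check in the usage comment is the same wget call the attached agent uses
for monitoring, so it assumes squid listens on port 3128.

```shell
#!/bin/sh
# Run a check command once per second until it succeeds or the timeout
# expires. wait_until is a hypothetical helper for this example.
# $1 = timeout in seconds, remaining arguments = check command
wait_until() {
        remaining=$1
        shift
        while [ "$remaining" -gt 0 ]; do
                if "$@" > /dev/null 2>&1; then
                        echo "$(date '+%H:%M:%S') check succeeded"
                        return 0
                fi
                echo "$(date '+%H:%M:%S') check failed, retrying"
                sleep 1
                remaining=$((remaining - 1))
        done
        return 1
}

# Usage: the monitor interval is 60 seconds, so leave some headroom.
#   wait_until 90 wget -o /dev/null -O /dev/null -T 1 -t 1 http://localhost:3128/
```

The function returns 0 as soon as the check passes and 1 if the timeout runs
out first, so it can also be used in scripts.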

One more thing I have to tell you:

In the squid.xml file you see two resources: one for the floating IP address
and one for squid. I put the two resources into a so-called resource group. A
resource group groups one or more resources together and makes sure that
they always run on the same host and that they are started and stopped in
order. So in our example the IP address is started before squid, and on
shutdown squid is stopped first and then the IP address.

I defined two monitor operations in squid.xml. Without these two operations
linux-ha doesn't detect if a service goes down. So you could, for example, call
"ip addr del IP/CIDR dev eth0" and see for yourself how linux-ha configures the
IP address again.

I really do hope that this is enough to get you started. Give me feedback when
you get it running.
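
The ordering inside a resource group can be illustrated with a few lines of
shell. This is only a sketch of the rule described above (start in listed
order, stop in reverse order); the two function names are made up and the
script does not talk to the cluster at all.

```shell
#!/bin/sh
# Print the order in which a resource group starts and stops its members.
# group_start_order / group_stop_order are hypothetical helpers that only
# illustrate the ordering rule.
group_start_order() {
        for res in "$@"; do
                echo "start $res"
        done
}

group_stop_order() {
        # Walk the argument list backwards.
        i=$#
        while [ "$i" -gt 0 ]; do
                eval "res=\${$i}"
                echo "stop $res"
                i=$((i - 1))
        done
}

group_start_order ip0 squid
group_stop_order ip0 squid
# Output:
#   start ip0
#   start squid
#   stop squid
#   stop ip0
```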

                Thomas
-------------- next part --------------
#!/bin/sh

. ${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs

INIT_SCRIPT=/etc/init.d/squid

case  "$1" in
        start)
                ${INIT_SCRIPT} start > /dev/null 2>&1 && exit || exit 1
        ;;

        stop)
                ${INIT_SCRIPT} stop > /dev/null 2>&1 && exit || exit 1
        ;;

        status)
                ${INIT_SCRIPT} status > /dev/null 2>&1 && exit || exit 1
        ;;

        monitor)
                # Exit 7 (OCF_NOT_RUNNING) if the resource is stopped
                ${INIT_SCRIPT} status > /dev/null 2>&1 || exit 7

                # Otherwise check that squid answers on its port
                # (XXX: maybe loosen retry / timeout)
                wget -o /dev/null -O /dev/null -T 1 -t 1 http://localhost:3128/ && exit || exit 1
        ;;

        meta-data)
                cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="squid">
<version>1.0</version>

<longdesc lang="en">
OCF Resource Agent on top of squid init script shipped with Debian.
</longdesc>

<shortdesc lang="en">OCF Resource Agent on top of squid init script shipped with Debian.</shortdesc>

<actions>
<action name="start"   timeout="90" />
<action name="stop"    timeout="100" />
<action name="status" timeout="60" />
<action name="monitor" depth="0" timeout="30s" interval="10s" start-delay="10s" />
<action name="meta-data"  timeout="5s" />
<action name="validate-all"  timeout="20s" />
</actions>
</resource-agent>
END
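        ;;

        validate-all)
                # Nothing to validate: the agent takes no parameters
                exit 0
        ;;

        *)
                # Unknown action: 3 is OCF_ERR_UNIMPLEMENTED
                exit 3
        ;;
esac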
-------------- next part --------------
A non-text attachment was scrubbed...
Name: squid.xml
Type: application/xml
Size: 1391 bytes
Desc: not available
Url : http://lists.community.tummy.com/pipermail/linux-ha/attachments/20080103/9cbe1ace/squid.xml

