[Linux-HA] Almost done with my HA setup, but something not working

Nick Duda nduda78 at comcast.net
Mon Apr 21 18:32:35 MDT 2008


(sorry for the long email, but all my configs are here to view)

I posted before about HA with 2 squid servers. It's just about done, but 
I'm stumbling on something. Every time I manually cause a failure in the 
hope of seeing a failover, it doesn't happen. For example, I get crm_mon to 
show everything as I want it, but when I kill squid (and prevent the 
cluster from restarting it) the resource just goes into a failed state; 
more below. Does anyone see anything wrong with my configs?

Server #1
Hostname: ha-1
eth0 - lan (192.168.95.1)
eth1 - xover to eth1 on other server

Server #2
Hostname: ha-2
eth0 - lan (192.168.95.2)
eth1 - xover to eth1 on other server

ha.cf on each server:

bcast eth1
mcast eth0 239.0.0.2 694 1 0
node ha-1 ha-2
crm on

I'm not using haresources because crm is on.

Here is the output from crm_mon:

============
Last updated: Mon Apr 21 15:44:53 2008
Current DC: ha-1 (2422b230-22f2-451b-aa95-0b783eccab8d)
2 Nodes configured.
1 Resources configured.
============

Node: ha-1 (2422b230-22f2-451b-aa95-0b783eccab8d): online
Node: ha-2 (1691d699-2a81-4545-8242-b00862431514): online

Resource Group: squid-cluster
    ip0 (heartbeat::ocf:IPaddr2):       Started ha-1
    squid       (heartbeat::ocf:squid): Started ha-1

If squid stops on the current heartbeat server, ha-1, it is restarted 
within 60 seconds, so the scripting is working. If I stop the squid process 
and rename /etc/init.d/squid to something else, the script won't be 
able to execute the squid start, and the resource should fail over to ha-2. 
It doesn't; instead this appears (on both ha-1 and ha-2):

============
Last updated: Mon Apr 21 15:47:49 2008
Current DC: ha-1 (2422b230-22f2-451b-aa95-0b783eccab8d)
2 Nodes configured.
1 Resources configured.
============

Node: ha-1 (2422b230-22f2-451b-aa95-0b783eccab8d): online
Node: ha-2 (1691d699-2a81-4545-8242-b00862431514): online

Resource Group: squid-cluster
    ip0 (heartbeat::ocf:IPaddr2):       Started ha-1
    squid       (heartbeat::ocf:squid): Started ha-1 (unmanaged) FAILED

Failed actions:
    squid_stop_0 (node=ha-1, call=74, rc=1): Error



------------ /etc/init.d/squid ------------

#!/bin/bash
# squid         This shell script takes care of starting and stopping
#               Squid Internet Object Cache
#
# chkconfig: - 90 25
# description: Squid - Internet Object Cache. Internet object caching is \
#       a way to store requested Internet objects (i.e., data available \
#       via the HTTP, FTP, and gopher protocols) on a system closer to the \
#       requesting site than to the source. Web browsers can then use the \
#       local Squid cache as a proxy HTTP server, reducing access time as \
#       well as bandwidth consumption.
# pidfile: /var/run/squid.pid
# config: /etc/squid/squid.conf

PATH=/usr/bin:/sbin:/bin:/usr/sbin
export PATH

# Source function library.
. /etc/rc.d/init.d/functions

# Source networking configuration.
. /etc/sysconfig/network

# Check that networking is up.
[ "${NETWORKING}" = "no" ] && exit 0

# check if the squid conf file is present
[ -f /usr/local/squid/etc/squid.conf ] || exit 0

# determine the name of the squid binary
[ -f /usr/sbin/squid ] && SQUID=squid
[ -z "$SQUID" ] && exit 0

# determine which one is the cache_swap directory
CACHE_SWAP=`sed -e 's/#.*//g' /usr/local/squid/etc/squid.conf | \
grep cache_dir | sed -e 's/cache_dir//' | \
cut -d ' ' -f 2`
[ -z "$CACHE_SWAP" ] && CACHE_SWAP=/cache

# default squid options
# -D disables initial dns checks. If you most likely will not to have an
#    internet connection when you start squid, uncomment this
#SQUID_OPTS="-D"

RETVAL=0
case "$1" in
start)
    echo -n "Starting $SQUID: "
    for adir in $CACHE_SWAP; do
        if [ ! -d $adir/00 ]; then
            echo -n "init_cache_dir $adir... "
            $SQUID -z -F 2>/dev/null
        fi
    done
    $SQUID $SQUID_OPTS &
    RETVAL=$?
    echo $SQUID
    [ $RETVAL -eq 0 ] && touch /var/lock/subsys/$SQUID
    ;;

stop)
    echo -n "Stopping $SQUID: "
    $SQUID -k shutdown &
    RETVAL=$?
    if [ $RETVAL -eq 0 ] ; then
        rm -f /var/lock/subsys/$SQUID
        while : ; do
            [ -f /var/run/squid.pid ] || break
            sleep 2 && echo -n "."
        done
        echo "done"
    else
        echo
    fi
    ;;

reload)
    $SQUID $SQUID_OPTS -k reconfigure
    exit $?
    ;;

restart)
    $0 stop
    $0 start
    ;;

status)
    status $SQUID
    $SQUID -k check
    exit $?
    ;;

probe)
    exit 0;
    ;;

*)
    echo "Usage: $0 {start|stop|status|reload|restart}"
    exit 1
esac

exit $RETVAL

------------ End /etc/init.d/squid ------------
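
A note on the stop branch above: it polls until /var/run/squid.pid 
disappears, with no upper bound, so a squid that never removes its pidfile 
would keep the stop action running until the cluster's own timeout fires. 
A bounded wait could look roughly like the sketch below. The function name 
wait_for_pidfile_gone and its two arguments are hypothetical and not part 
of the script above; the 90 second limit in the usage line is likewise 
only an assumption.

```shell
#!/bin/sh
# Sketch only: wait_for_pidfile_gone and its arguments are hypothetical
# names, not part of the init script above.

# Wait until PIDFILE no longer exists, polling once per second.
# Returns 0 if the file disappeared within MAX_WAIT seconds, 1 otherwise.
wait_for_pidfile_gone() {
    pidfile=$1
    max_wait=$2
    waited=0
    while [ -f "$pidfile" ]; do
        if [ "$waited" -ge "$max_wait" ]; then
            return 1        # gave up: pidfile is still there
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0                # pidfile is gone
}
```

In the stop branch this might be called as 
"wait_for_pidfile_gone /var/run/squid.pid 90 || RETVAL=1", so that a hung 
shutdown is reported as a failure instead of blocking forever.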

------------ squid.xml (I import it with cibadmin -U -x) ------------

<cib>
  <configuration>
    <crm_config/>
    <nodes/>
    <resources>
      <group id="squid-cluster">
        <primitive class="ocf" provider="heartbeat" type="IPaddr2" id="ip0">
          <instance_attributes id="ia-ip0">
            <attributes>
              <nvpair id="ia-ip0-1" name="ip" value="192.168.95.5"/>
              <nvpair id="ia-ip0-2" name="cidr_netmask" value="24"/>
              <nvpair id="ia-ip0-3" name="nic" value="eth0"/>
            </attributes>
          </instance_attributes>
          <operations>
            <op id="ip0-monitor0" name="monitor" interval="60s" timeout="120s" start_delay="1m"/>
          </operations>
        </primitive>

        <primitive class="ocf" provider="heartbeat" type="squid" id="squid">
          <operations>
            <op name="monitor" interval="60s" timeout="120s" start_delay="1m" id="monitor-squid"/>
          </operations>
        </primitive>
      </group>
    </resources>

    <constraints/>
  </configuration>
  <status/>
</cib>

------------ End squid.xml ------------

------------  cib.xml ------------

 <cib generated="false" admin_epoch="0" have_quorum="true" ignore_dtd="false"
      num_peers="0" cib_feature_revision="2.0" epoch="3" num_updates="2"
      cib-last-written="Mon Apr 21 11:52:24 2008" ccm_transition="1">
   <configuration>
     <crm_config>
       <cluster_property_set id="cib-bootstrap-options">
         <attributes>
           <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
                   value="2.1.3-node: 552305612591183b1628baa5bc6e903e0f1e26a3"/>
         </attributes>
       </cluster_property_set>
     </crm_config>
     <nodes>
       <node id="2422b230-22f2-451b-aa95-0b783eccab8d" uname="ha-1" type="normal"/>
       <node id="1691d699-2a81-4545-8242-b00862431514" uname="ha-2" type="normal"/>
     </nodes>
     <resources>
       <group id="squid-cluster">
         <primitive class="ocf" provider="heartbeat" type="IPaddr2" id="ip0">
           <instance_attributes id="ia-ip0">
             <attributes>
               <nvpair id="ia-ip0-1" name="ip" value="192.168.95.5"/>
               <nvpair id="ia-ip0-2" name="cidr_netmask" value="24"/>
               <nvpair id="ia-ip0-3" name="nic" value="eth0"/>
             </attributes>
           </instance_attributes>
           <operations>
             <op id="ip0-monitor0" name="monitor" interval="60s" timeout="120s" start_delay="1m"/>
           </operations>
         </primitive>
         <primitive class="ocf" provider="heartbeat" type="squid" id="squid">
           <operations>
             <op name="monitor" interval="60s" timeout="120s" start_delay="1m" id="monitor-squid"/>
           </operations>
         </primitive>
       </group>
     </resources>
     <constraints/>
   </configuration>
 </cib>

------------ End cib.xml ------------

------------  squid ocf file ------------

#!/bin/sh

#. $HA_HBCONF_DIR/shellfuncs
. /usr/lib/ocf/resource.d/heartbeat/.ocf-shellfuncs

INIT_SCRIPT=/etc/init.d/squid

case  "$1" in
        start)
                ${INIT_SCRIPT} start > /dev/null 2>&1 && exit || exit 1
        ;;

        stop)
                ${INIT_SCRIPT} stop > /dev/null 2>&1 && exit || exit 1
        ;;

        status)
                ${INIT_SCRIPT} status > /dev/null 2>&1 && exit || exit 1
        ;;

        monitor)
                # Check if Resource is stopped
                ${INIT_SCRIPT} status > /dev/null 2>&1 || exit 7

                # Otherwise check services (XXX: Maybe loosen retry / timeout)
                wget -o /dev/null -O /dev/null -T 1 -t 1 http://localhost:3128/ && exit || exit 1
        ;;

        meta-data)
                cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="squid">
<version>1.0</version>

<longdesc lang="en">
OCF Resource Agent on top of squid init script shipped with Debian.
</longdesc>

<shortdesc lang="en">OCF Resource Agent on top of squid init script shipped with Debian.</shortdesc>

<actions>
<action name="start"   timeout="90" />
<action name="stop"    timeout="100" />
<action name="status" timeout="60" />
<action name="monitor" depth="0" timeout="30s" interval="10s" start-delay="10s" />
<action name="meta-data"  timeout="5s" />
<action name="validate-all"  timeout="20s" />
</actions>
</resource-agent>
END
        ;;
esac

------------  End squid ocf file ------------
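
A note on the stop action in the agent above: it exits 1 whenever 
"${INIT_SCRIPT} stop" fails, and with /etc/init.d/squid renamed that call 
cannot run at all, which would be consistent with the squid_stop_0 rc=1 
entry under "Failed actions". The OCF convention is that stop should 
report success when the resource is already stopped, and a failed stop 
generally leaves the cluster unable to move the resource safely unless 
fencing is configured. The sketch below shows one way a stop could follow 
that convention. The function names squid_stop and squid_is_running are 
hypothetical, the pidfile path is taken from the init script header, and 
the return codes are written out by hand so the snippet does not depend on 
.ocf-shellfuncs. It is a sketch under those assumptions, not a tested 
replacement for the agent.

```shell
#!/bin/sh
# Sketch only. squid_stop, squid_is_running and the default paths are
# hypothetical; the OCF return codes are written out so the snippet
# does not depend on .ocf-shellfuncs being present.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

INIT_SCRIPT=${INIT_SCRIPT:-/etc/init.d/squid}
PIDFILE=${PIDFILE:-/var/run/squid.pid}

# True if the pidfile names a live process.
squid_is_running() {
    [ -f "$PIDFILE" ] || return 1
    pid=$(cat "$PIDFILE" 2>/dev/null)
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Stop squid; report success whenever squid ends up not running.
squid_stop() {
    if ! squid_is_running; then
        return $OCF_SUCCESS        # already stopped counts as success
    fi
    if [ -x "$INIT_SCRIPT" ]; then
        "$INIT_SCRIPT" stop >/dev/null 2>&1
    fi
    if squid_is_running; then
        return $OCF_ERR_GENERIC    # still alive: a genuine stop failure
    fi
    return $OCF_SUCCESS
}
```

In the agent's case statement the stop branch would then become something 
like "squid_stop; exit $?". With squid already killed by hand, this stop 
returns 0 even when the init script is missing, so the failure would show 
up on the start action instead of the stop action.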




