[Linux-HA] Almost done with my HA setup, but something not working

Chun Tian (binghe) binghe.lisp at gmail.com
Wed Apr 23 11:46:02 MDT 2008


I still think that simply running squid on both servers and using HA to
switch a single IP address is good enough for an HA beginner.

Writing a perfect OCF script is hard, and an LSB script lacks a monitor
facility (if I'm correct).
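
To make the LSB/OCF gap concrete: LSB defines only a "status" action with
its own exit codes, while an OCF "monitor" is expected to answer 0 for
running, 7 for cleanly stopped and 1 for a generic error. A rough sketch of
the translation a wrapper would need follows. The function name
lsb_status_to_ocf is made up for illustration, and treating every unexpected
status code as an error is my assumption, not something Heartbeat prescribes.

```shell
#!/bin/sh
# Map an LSB "status" exit code to an OCF "monitor" exit code.
# LSB status: 0 = running, 1 = dead but pid file exists,
#             2 = dead but lock file exists, 3 = not running, 4 = unknown.
# OCF monitor: 0 = running, 7 = cleanly stopped, 1 = generic error.
# lsb_status_to_ocf is a hypothetical helper name.
lsb_status_to_ocf() {
    case "$1" in
        0) echo 0 ;;   # running
        3) echo 7 ;;   # cleanly stopped
        *) echo 1 ;;   # stale pid/lock file or unknown state: report an error
    esac
}
```

Something like "/etc/init.d/squid status; lsb_status_to_ocf $?" would then
print the code a monitor action should exit with.
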

--binghe

> Hi,
>
> On Tue, Apr 22, 2008 at 05:30:13PM +0000, nduda78 at comcast.net wrote:
>> How can I test for that?
>
> Skimmed this thread and saw that you wrote an OCF wrapper for the
> squid init.d (LSB) script. In the OCF wrapper there's nothing
> that justifies its existence, i.e. it just relays whatever comes
> in to /etc/init.d/squid. You should just drop the OCF wrapper and
> use the LSB script directly.
>
> Testing is done by hand. See
> http://www.linux-ha.org/ResourceAgent and
> http://www.linux-ha.org/LSBResourceAgent on how to test.
>
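For reference, the hand test can be scripted roughly like this. It is only a
sketch under my own assumptions: the sequence (start, status, start again,
stop, status, stop again) and the expected codes follow the LSB convention
that "status" returns 0 when running and 3 when stopped, and that repeated
start/stop calls succeed. The function name check_lsb_agent is made up.

```shell
#!/bin/sh
# Run a hand-test sequence against an init script and report the first
# step whose exit code differs from what an LSB-compliant script returns.
# check_lsb_agent is a hypothetical helper name.
check_lsb_agent() {
    script=$1
    # Each entry is "action:expected_exit_code", in the order to run them.
    for step in start:0 status:0 start:0 stop:0 status:3 stop:0; do
        action=${step%%:*}
        expected=${step##*:}
        rc=0
        "$script" "$action" >/dev/null 2>&1 || rc=$?
        if [ "$rc" -ne "$expected" ]; then
            echo "FAIL: $action returned $rc, expected $expected"
            return 1
        fi
    done
    echo "OK"
}
```

Calling "check_lsb_agent /etc/init.d/squid" on a test box (it really starts
and stops the service) shows whether the script is safe to hand to the
cluster.
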
> Thanks,
>
> Dejan
>
>> -------------- Original message --------------
>> From: Dejan Muhamedagic <dejanmm at fastmail.fm>
>>
>>> Hi,
>>>
>>> On Tue, Apr 22, 2008 at 01:34:58PM +0000, nduda78 at comcast.net wrote:
>>>> Ok, as a better test... I stopped squid on ha-1 and quickly
>>>> modified the squid.conf file with a "Bungled" command that
>>>> would prevent squid from starting (the init.d/squid is back to
>>>> normal).
>>>>
>>>> When heartbeat checks to see if squid is running, it's not. It
>>>> tries to restart squid and fails because of the error in the
>>>> config. No squid.pid is made, and no squid process is running.
>>>>
>>>> crm_mon shows squid as down on ha-1, but after Heartbeat tries to
>>>> restart it and fails, crm_mon shows it as running on ha-1,
>>>> even though it is not. Something in my config is making Heartbeat
>>>> think the restart succeeded even though no squid process is
>>>> running. No failover is being done.
>>>
>>> Looks like your RA is not behaving as it should. Did you check
>>> that it's managing in all these situations to return proper exit
>>> codes?
>>>
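One possible cause (not confirmed anywhere in this thread) is an init script
whose "start" returns 0 even though squid died on the bad config. A minimal
sketch of a start action that does not trust that exit code and polls
"status" instead is below. START_TIMEOUT and verified_start are names I made
up, and the 10 second default is arbitrary.

```shell
#!/bin/sh
# Start wrapper that does not trust the init script's exit code alone.
# START_TIMEOUT and verified_start are illustrative names, not from the
# thread; INIT_SCRIPT matches the variable used in the quoted wrapper.
INIT_SCRIPT=${INIT_SCRIPT:-/etc/init.d/squid}
START_TIMEOUT=${START_TIMEOUT:-10}   # seconds to wait for the service

verified_start() {
    "$INIT_SCRIPT" start >/dev/null 2>&1 || return 1   # generic error
    waited=0
    # Poll "status" until the service reports running or the timeout expires.
    while [ "$waited" -lt "$START_TIMEOUT" ]; do
        "$INIT_SCRIPT" status >/dev/null 2>&1 && return 0   # really running
        sleep 1
        waited=$((waited + 1))
    done
    return 1   # start claimed success but the service never came up
}
```

With this, a start that silently fails is reported as an error, which is the
kind of exit code the cluster needs in order to decide on a failover.
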
>>> Thanks,
>>>
>>> Dejan
>>>
>>>> BTW, thanks for all the replies so far, again I am new but
>>>> slowly getting it.
>>>
>>>>
>>>> -------------- Original message --------------
>>>> From: Dominik Klein
>>>>
>>>>> Nick Duda wrote:
>>>>>> I rename the restart script for squid.
>>>>>
>>>>> Your OCF Script or your /etc/init.d script?
>>>>>
>>>>>> My current setup (based on examples on the web) shows that if squid
>>>>>> fails on the currently running server, it will try to restart itself.
>>>>>> If the restart fails, it will fail over. So basically I am trying to
>>>>>> make a test case where the squid startup script in /etc/init.d got
>>>>>> deleted
>>>>>
>>>>> Ah, your /etc/init.d script.
>>>>>
>>>>> Okay, look at what your OCF script does when /etc/init.d/squid is
>>>>> not there.
>>>>>
>>>>> -----------
>>>>> INIT_SCRIPT=/etc/init.d/squid
>>>>>
>>>>> case "$1" in
>>>>> start)
>>>>> ${INIT_SCRIPT} start > /dev/null 2>&1 && exit || exit 1
>>>>> ;;
>>>>>
>>>>> stop)
>>>>> ${INIT_SCRIPT} stop > /dev/null 2>&1 && exit || exit 1
>>>>> ;;
>>>>>
>>>>> status)
>>>>> ${INIT_SCRIPT} status > /dev/null 2>&1 && exit || exit 1
>>>>> ;;
>>>>>
>>>>> monitor)
>>>>> # Check if Resource is stopped
>>>>> ${INIT_SCRIPT} status > /dev/null 2>&1 || exit 7
>>>>>
>>>>> # Otherwise check services (XXX: Maybe loosen retry / timeout)
>>>>> wget -o /dev/null -O /dev/null -T 1 -t 1 http://localhost:3128/ && exit || exit 1
>>>>> ;;
>>>>>
>>>>> meta-data)
>>>>> --------------
>>>>>
>>>>> So for the next monitor operation, it will exec
>>>>> "${INIT_SCRIPT} status > /dev/null 2>&1 || exit 7"
>>>>>
>>>>> This will probably return 7, since the shell cannot run the missing
>>>>> script. So the cluster thinks your resource is stopped. As it was
>>>>> running before (I guess?), the cluster will now try to stop and start
>>>>> it.
>>>>>
>>>>> Stop calls
>>>>> "${INIT_SCRIPT} stop > /dev/null 2>&1 && exit || exit 1"
>>>>>
>>>>> This will return 1. So the stop operation failed.
>>>>>
>>>>> With stonith, your node would be rebooted now. I don't see a stonith
>>>>> device, so the resource goes "unmanaged".
>>>>>
>>>>> I think what you see is intended.
>>>>>
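The sequence above can be replayed without a cluster. The sketch below copies
only the monitor and stop branches of the quoted wrapper (the wget check is
left out) and runs them in a subshell so that "exit" ends only the subshell.
The function name wrapper_rc and the /nonexistent path, which stands in for
the renamed init script, are made up for illustration.

```shell
#!/bin/sh
# Replay the wrapper's monitor and stop branches and print the exit code
# each one produces. The default path is a deliberately missing file that
# stands in for a renamed /etc/init.d/squid; wrapper_rc is a hypothetical
# helper name.
INIT_SCRIPT=${INIT_SCRIPT:-/nonexistent/init.d/squid}

wrapper_rc() {
    rc=0
    (
        case "$1" in
            monitor) "$INIT_SCRIPT" status >/dev/null 2>&1 || exit 7 ;;
            stop)    "$INIT_SCRIPT" stop >/dev/null 2>&1 && exit || exit 1 ;;
        esac
    ) || rc=$?
    echo "$1 -> $rc"
}
```

With the script missing, "wrapper_rc monitor" prints "monitor -> 7" and
"wrapper_rc stop" prints "stop -> 1", which matches the failed squid_stop_0
(rc=1) in the crm_mon output.
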
>>>>> Regards
>>>>> Dominik
>>>>>
>>>>>> and squid crashed, it should
>>>>>> fail over to the other box... it's not.
>>>>>>
>>>>>> Dominik Klein wrote:
>>>>>>> Nick Duda wrote:
>>>>>>>> (sorry for the long email, but all my configs are here to view)
>>>>>>>>
>>>>>>>> I posted before about HA with 2 squid servers. It's just about done,
>>>>>>>> but I'm stumbling on something. Every time I manually cause something
>>>>>>>> to happen in hopes of seeing it fail over, it doesn't. For example, I
>>>>>>>> get crm_mon to show everything as I want it, and when I kill squid
>>>>>>>> (and prevent the xml from restarting it) it just goes into a failed
>>>>>>>> state... more below. Anyone see anything wrong with my configs?
>>>>>>>>
>>>>>>>> Server #1
>>>>>>>> Hostname: ha-1
>>>>>>>> eth0 - lan (192.168.95.1)
>>>>>>>> eth1 - xover to eth1 on other server
>>>>>>>>
>>>>>>>> Server #2
>>>>>>>> Hostname: ha-2
>>>>>>>> eth0 - lan (192.168.95.2)
>>>>>>>> eth1 - xover to eth1 on other server
>>>>>>>>
>>>>>>>> ha.cf on each server:
>>>>>>>>
>>>>>>>> bcast eth1
>>>>>>>> mcast eth0 239.0.0.2 694 1 0
>>>>>>>> node ha-1 ha-2
>>>>>>>> crm on
>>>>>>>>
>>>>>>>> Not using haresources because of crm
>>>>>>>>
>>>>>>>> Here is the output from crm_mon:
>>>>>>>>
>>>>>>>> ============
>>>>>>>> Last updated: Mon Apr 21 15:44:53 2008
>>>>>>>> Current DC: ha-1 (2422b230-22f2-451b-aa95-0b783eccab8d)
>>>>>>>> 2 Nodes configured.
>>>>>>>> 1 Resources configured.
>>>>>>>> ============
>>>>>>>>
>>>>>>>> Node: ha-1 (2422b230-22f2-451b-aa95-0b783eccab8d): online
>>>>>>>> Node: ha-2 (1691d699-2a81-4545-8242-b00862431514): online
>>>>>>>>
>>>>>>>> Resource Group: squid-cluster
>>>>>>>> ip0 (heartbeat::ocf:IPaddr2): Started ha-1
>>>>>>>> squid (heartbeat::ocf:squid): Started ha-1
>>>>>>>>
>>>>>>>> If squid stops on the current heartbeat server, ha-1, it will restart
>>>>>>>> within 60 sec... so the scripting is working. If I stop the squid
>>>>>>>> process and rename it in /etc/init.d/squid to something else, the
>>>>>>>> script won't be able to execute the squid start and should fail over
>>>>>>>> to ha-2, but it doesn't. Instead this appears (on both ha-1 and ha-2):
>>>>>>>
>>>>>>> What exactly do you "rename" and how? It's likely the cluster is
>>>>>>> behaving sanely and you're just creating a test case you don't
>>>>>>> understand.
>>>>>>>
>>>>>>> Regards
>>>>>>> Dominik
>>>>>>>
>>>>>>>> ============
>>>>>>>> Last updated: Mon Apr 21 15:47:49 2008
>>>>>>>> Current DC: ha-1 (2422b230-22f2-451b-aa95-0b783eccab8d)
>>>>>>>> 2 Nodes configured.
>>>>>>>> 1 Resources configured.
>>>>>>>> ============
>>>>>>>>
>>>>>>>> Node: ha-1 (2422b230-22f2-451b-aa95-0b783eccab8d): online
>>>>>>>> Node: ha-2 (1691d699-2a81-4545-8242-b00862431514): online
>>>>>>>>
>>>>>>>> Resource Group: squid-cluster
>>>>>>>> ip0 (heartbeat::ocf:IPaddr2): Started ha-1
>>>>>>>> squid (heartbeat::ocf:squid): Started ha-1 (unmanaged) FAILED
>>>>>>>>
>>>>>>>> Failed actions:
>>>>>>>> squid_stop_0 (node=ha-1, call=74, rc=1): Error
>>>>>>> _______________________________________________
>>>>>>> Linux-HA mailing list
>>>>>>> Linux-HA at lists.linux-ha.org
>>>>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>>>>> See also: http://linux-ha.org/ReportingProblems
>>>>>>>
>>>>>
>>>>> -- 
>>>>>
>>>>> IN-telegence GmbH & Co. KG
>>>>> Oskar-Jäger-Str. 125
>>>>> 50825 Köln
>>>>>
>>>>> Registergericht Köln - HRA 14064, USt-ID Nr. DE 194 156 373
>>>>> ph Gesellschafter: komware Unternehmensverwaltungsgesellschaft mbH,
>>>>> Registergericht Köln - HRB 38396
>>>>> Geschäftsführende Gesellschafter: Christian Plätke und Holger Jansen


