[Linux-HA] strange monitor behaviour

Pavol Gono palo.gono at gmail.com
Tue Jan 9 03:23:48 MST 2007


For case D), my colleague found the reason for the buggy behaviour. The
culprit is grep!
See the attachment and try these commands:
grep '[Aa][Rr][Pp]' log.txt
grep -i '[Aa][Rr][Pp]' log.txt
On SuSE 9.3, SLES10 and an openSUSE machine, the two commands behaved
differently: the second one printed only one line.
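
A quick way to check whether a given grep is affected, as a minimal
sketch only: count the matching lines with and without -i and compare.
The function name check_grep_i is made up for illustration; log.txt is
the attachment.

```shell
# check_grep_i is a hypothetical helper name, used only for illustration.
# Usage: check_grep_i FILE
# Counts the lines matching the bracket pattern with and without -i.
# On a correctly behaving grep the two counts are the same.
check_grep_i() {
  [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
  # grep -c exits non-zero when the count is 0, hence the "|| true".
  plain=$(grep -c '[Aa][Rr][Pp]' "$1" || true)
  nocase=$(grep -ic '[Aa][Rr][Pp]' "$1" || true)
  if [ "$plain" -eq "$nocase" ]; then
    echo "ok: both forms match $plain line(s)"
  else
    echo "mismatch: plain=$plain, with -i=$nocase"
    return 1
  fi
}
```

Run it as "check_grep_i log.txt"; a "mismatch" line means this grep
shows the behaviour described above.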

The affected code in BasicSanityCheck:
line 515: LookForString "[Aa][Rr][Pp]" >/dev/null
line 356: grep -i "$1" $LOGFILE
maybe also line 372: count=`egrep -ic "$1" $LOGFILE`

So a workaround would be, e.g., to define a LookForString2 that calls
grep without the -i switch.
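
A minimal sketch of such a function, assuming LookForString is just the
grep call quoted from line 356. LookForString2 is not in
BasicSanityCheck as shipped, and the LOGFILE default below is only
there so the snippet runs on its own (the script sets LOGFILE itself).

```shell
# Assumption: the script normally sets LOGFILE; this default is only a
# fallback so the snippet is self-contained.
: "${LOGFILE:=/tmp/linux-ha.testlog}"

# Proposed helper (not part of BasicSanityCheck as shipped).
# Same as "grep -i "$1" $LOGFILE" but without -i, so the caller has to
# spell out the case alternatives in the pattern, e.g. '[Aa][Rr][Pp]'.
LookForString2() {
  grep "$1" "$LOGFILE"
}
```

Line 515 would then read: LookForString2 "[Aa][Rr][Pp]" >/dev/null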


For case B), I wrongly wrote Debian sarge; the correct distribution is
Debian etch (testing).

Palo


On 1/8/07, Pavol Gono <palo.gono at gmail.com> wrote:
> B)
> On my notebook I use debian sarge, python version 2.4. When using HB
> sources directly (changeset 9918) and configure options equal to debo
> machine, BasicSanityCheck made a strange exception. Snippet from
> linux-ha.testlog:
> ... CTS: Warn: Startup pattern not found: crmd.*pgnotas: State
> transition.*-> S_IDLE
> ... CTS: Node pgnotas status:
> ... CTS: Node status for pgnotas is down but we think it should be up
> ... CTS: Warn: Start failed for node pgnotas
> ... CTS: Tearing down partial setup
> ... CTS: Stopping Cluster Manager on BSC node(s).
> ... CTS: Exception by exceptions.TypeError
> ... CTS: Traceback (most recent call last):
> ... CTS:   File "/usr/local/lib/heartbeat/cts/CTSlab.py", line 791, in ?
> ... CTS:     overall, detailed = tests.run(NumIter)
> ... CTS: TypeError: unpack non-sequence
> ... CTS: ****************
> ... CTS: Overall Results:{'failure': 0, 'success': 0, 'BadNews': 0}
> ... CTS: ****************
> ... CTS: Detailed Results
> ... CTS: Test AddResource:  {'auditfail': 0, 'failure': 0, 'skipped':
> 0, 'success': 0, 'calls': 0}
> ... CTS: <<<<<<<<<<<<<<<< TESTS COMPLETED
> ... CTS: No failure count but success != requested iterations
> CRM tests failed (rc=1).
> (end of linux-ha.testlog now)
>
>
> D)
> On one SLES10 machine my colleague used HB sources of changeset 9909.
> Configure options were similar to debo & fico machines.
> There is one error reported at the end. It is triggered when the 'Does
> not look like we ARPed the address' messages is displayed. At the very
> beginning there is also message 'RTNETLINK answers: Network is
> unreachable' which I do not know where it comes from.
> Snippets from output of BasicSanityCheck:
> RTNETLINK answers: Network is unreachable
> Using interface: eth3
> Starting base64 and md5 algorithm tests
> base64 and md5 algorithm tests succeeded.
> Starting heartbeat
> Starting High-Availability services:
> 2007/01/08_14:56:02 INFO:  Resource is stopped
>    done
>
> Does not look like we ARPed the address
> Reloading heartbeat
> Reloading heartbeat
> Stopping heartbeat
> ...
> Starting CRM tests
> CRM tests passed.
> 1 errors. Log file is stored in /tmp/linux-ha.testlog
-------------- next part --------------
IPaddr[4805]:	2007/01/09_10:26:49 INFO: Sending Gratuitous Arp for 10.0.1.50 on eth1:0 [eth1]
IPaddr[4805]:	2007/01/09_10:26:49 INFO: /usr/lib/heartbeat/send_arp -i 500 -r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-10.0.1.50 eth1 10.0.1.50 auto 10.0.1.50 ffffffffffff

