[Linux-HA] strange monitor behaviour
Pavol Gono
palo.gono at gmail.com
Wed Jan 10 09:06:55 MST 2007
On 1/10/07, Andrew Beekhof <beekhof at gmail.com> wrote:
> On 1/9/07, Pavol Gono <palo.gono at gmail.com> wrote:
> > On 1/9/07, Andrew Beekhof <beekhof at gmail.com> wrote:
> > > > A)
> > > > It would be nice to have some list of necessary software installed
> > > > when one wants to run it. E.g. on SLES10 you need python-xml package.
> > > > On debian (debo machine), installing python-dev or python-xml
> > > > decreased number of 'BadNews' from 26 to 2. Maybe python version is
> > > > also important...
> > >
> > > can you send me both outputs? that shouldn't be the case.
> >
> > I looked at logs, it seems so.
>
> i think you just got a little luckier the second time
> i dont believe installing python-xml will have made any difference
I didn't choose the best logs; on Debian we may not need python-xml.
But on SLES10, installing the python-xml RPM was necessary (see the
attachment). This was the end of the log:
...
Starting CRM tests
CRM tests failed (rc=1).
The TestCRM function could print a hint about python-xml when running
$CRMTEST -L $LOGFILE >/dev/null 2>&1
is unsuccessful.
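A rough sketch of what such a hint could look like, not the actual BasicSanityCheck code. The helper names python_xml_available and crm_failure_hint are made up for illustration, and I am assuming the missing piece shows up as a failed import of xml.dom.minidom:

```shell
# Hypothetical helpers, not part of BasicSanityCheck.
# PYTHON names the interpreter to probe; "python" is an assumed default.
PYTHON=${PYTHON:-python}

# Succeeds only if the interpreter can import the XML DOM module
# (assumption: this is the module that python-xml provides on SLES10).
python_xml_available() {
    "$PYTHON" -c 'import xml.dom.minidom' >/dev/null 2>&1
}

# Print the usual failure line, plus a hint if the XML import fails.
crm_failure_hint() {
    rc=$1
    echo "CRM tests failed (rc=$rc)."
    if ! python_xml_available; then
        echo "Hint: $PYTHON cannot import xml.dom.minidom; is python-xml installed?"
    fi
}

# Possible use inside TestCRM (left as a comment, since CRMTEST and
# LOGFILE are set elsewhere in the script):
#   $CRMTEST -L $LOGFILE >/dev/null 2>&1 || crm_failure_hint $?
```

The check runs only after a failure, so a successful test run is not slowed down or cluttered by it.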
> > > > G)
> > > > Many times I experienced these messages (output of BasicSanityCheck):
> > > >
> > > > ...
> > > > Reloading heartbeat
> > > > Reloading heartbeat
> > > > Stopping heartbeat
> > > > Stopping High-Availability services:
> > > > Done.
> > > >
> > > > Looks like heartbeat did not really stop.
> > > > You\'ll probably need to kill some processes yourself.
> > > > Checking STONITH basic sanity.
> > > > ...
> > > >
> > > > What does it mean - Can't heartbeat stop itself?
> > >
> > > possible - but without the logs its impossible to say why
> > >
> >
> > All four attached log files have such message
>
> strange i dont see anything like that in the attachment
See e.g. the previous attachment, B/pgnotas_two_tries/linux-ha.testlog1, line 1605.
But this may have something to do with the shutdown problem on Debian (all
log files were from Debian), so let's look at it again once the issue
reported in Bugzilla is fixed.

But let's get back to the original topic :)
I installed heartbeat from sources, changeset 9934, with the same custom
configure options as in my previous posts. The distribution is SLES10, the
nodes are deboserver and pgbook. BasicSanityCheck was successful on both.
I made a configuration very similar to the one in the first post, with
resources IPaddr and Dummy.
When I removed the directory /tmp/a on the machine where the resources were
running, the same situation occurred: the Dummy resource is stopped, the
IPaddr resource remains on the original node, and there is no failover.
Is this correct behaviour?
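To make the test clearer, here is a minimal sketch of a monitor action that depends on a file under /tmp/a. This is only an illustration under assumptions: the file name Dummy.state and the function name dummy_monitor are hypothetical, and the real Dummy agent may check something different. The return codes 0 and 7 are the standard OCF values for "running" and "not running".

```shell
# Standard OCF return codes for a monitor action.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Hypothetical state file; /tmp/a is the directory removed in the test.
STATE_FILE=${STATE_FILE:-/tmp/a/Dummy.state}

# Report "running" only while the state file still exists.
dummy_monitor() {
    if [ -f "$STATE_FILE" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}
```

With a monitor like this, removing /tmp/a makes the next monitor call return 7, so the cluster sees the resource as stopped on that node.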
I tried several combinations of stickiness and failure stickiness (INF
-INF, 0 -1, 10 -1), all with the same result.
The attachment contains a snapshot of the suspicious situation.
And a slightly silly question: how can I read the current node score?
cl_status nodeweight <nodename>
always gives me the value 100 in the described configuration.
Palo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: strange_monitor_behaviour4.tar.bz2
Type: application/x-bzip2
Size: 52949 bytes
Desc: not available
Url : http://lists.community.tummy.com/pipermail/linux-ha/attachments/20070110/414b3481/strange_monitor_behaviour4.tar-0001.bin