[Linux-HA] moving from groups to all resources for better control: CRM-Stable-4a0d4e40eeb0:

Andrew Beekhof beekhof at gmail.com
Sat Nov 4 02:44:14 MST 2006


On 11/4/06, Alex and Gill Strachan <asgks at yahoo.com> wrote:
> >> On 11/2/06, Alex and Gill Strachan <asgks at yahoo.com> wrote:
> >> > Mmmmhhhh I think I have lost my way !
> >> >
> >> > Moving from groups to all resources for better control.  Unable to link
> >> >
> >> > ptest[20348]: 2006/11/02_22:31:24 debug: native_assign_node: All nodes
> >> > for color resource_sinfids3A_fs are unavailable, unclean or shutting
> >> > down
> >> >
> >> > to the drbd resource.  How can this be achieved?  I have cleared any failure-counts.
> >>
> >> when you run ptest, can you add "-G saved-cib.xml" to the command line
>
> >sorry, that should be -I not -G
>
> Thank you for the offer to analyse this further.  I eventually cleared the error by
>   stopping hb on all nodes
>   rm /var/lib/heartbeat/crm/*
>   rm /var/lib/heartbeat/pengine/*

this is unnecessary except in terms of disk-space

>   then restart hb, and finally putting back the original cib.xml
>
> This is not a practice which I am comfortable with !!

no doubt.  it's quite ugly.
luckily we have crm_resource -C

>
> There were a number of the following errors which were stopping the cluster from doing anything...
>
>   Nov  3 10:54:22 sinfids3a2 crm_verify: [9191]: WARN: unpack_rsc_op: Processing failed op
>   (resource_sinfids3A_aims_monitor_180000) for resource_sinfids3A_aims on sinfids3a1
>   ... plus other resources
>
> How can I remove these in the future without such drastic steps :-)

as above, with crm_resource -C (use -? to have it show you the other
parameters you'll need)
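
A rough sketch of what such a cleanup call might look like for the failed op quoted above. Only -C is named in this reply; the -r (resource) and -H (host) options are assumptions about the extra parameters, so check them against crm_resource -? on your version. The helper name cleanup_cmd is hypothetical and only prints the command instead of running it.

```shell
#!/bin/sh
# Dry-run sketch: print the cleanup command for one resource on one node.
# -C is taken from the reply above; -r and -H are ASSUMED option names,
# verify them with "crm_resource -?" before running anything.
cleanup_cmd() {
    rsc="$1"    # resource id, e.g. resource_sinfids3A_aims
    node="$2"   # node on which the failed op was recorded
    printf 'crm_resource -C -r %s -H %s\n' "$rsc" "$node"
}

# Resource and node names taken from the crm_verify warning quoted above.
cleanup_cmd resource_sinfids3A_aims sinfids3a1
```

Printing first and running by hand keeps the step reviewable, which seems preferable to wiping /var/lib/heartbeat/crm on every node.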

> Later on I had this error
>     Nov  3 11:04:09 sinfids3a2 crmd: [4123]: info: cancel_monitor: Couldn't cancel resource_sinfids3A_smb_monitor_900000 (73)

that's b-a-d.  can you send me the complete logs please?

>   Nov  3 11:04:09 sinfids3a2 crmd: [4123]: info: send_direct_ack: ACK'ing resource op: monitor for resource_sinfids3A_smb
>
> This was generated after I manually changed the monitoring interval of the smb resource.
> This was done by copying the active cib.xml file, editing the copy, and using cibadmin -R to activate the copy.

overkill, in that you can replace individual objects like resources in
the CIB, but we still shouldn't behave like that as a result
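
A sketch of replacing only the resources section rather than the whole CIB. It assumes cibadmin accepts -o (section) and -x (XML file) alongside the -Q and -R options mentioned in this thread; those two options, the file name resources.xml and the helper name replace_section_steps are assumptions, so confirm the options with cibadmin -? first. The function only prints the steps.

```shell
#!/bin/sh
# Dry-run sketch: print the steps for a section-level replace.
# ASSUMPTIONS: -o selects a CIB section and -x names an XML file;
# only -Q (query) and -R (replace) appear in the thread itself.
replace_section_steps() {
    section="$1"   # CIB section to work on, e.g. resources
    file="$2"      # scratch file holding the queried XML
    printf 'cibadmin -Q -o %s > %s\n' "$section" "$file"
    printf '# edit %s (e.g. the monitor interval)\n' "$file"
    printf 'cibadmin -R -o %s -x %s\n' "$section" "$file"
}

replace_section_steps resources resources.xml
```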

>
> Maybe following the cibadmin -Q advice you gave in another response would not have caused this; in the future I'll use the cibadmin -Q ... method.
>
> The fix I applied for the above was again to stop hb on all nodes...
>
> I would like to say thanks to you Andrew for your involvement in this mailing list.

glad to help :-)

> >> and send me the "saved-cib.xml" file... then i'll be able to say for
> >> sure what's going on.
> >>
> >> >
> >> >
> >> > Are there some shorthand tips I should use?
> >> >
> >> > Summary:
> >> > Node: sinfids3A
> >> > [[[how to get resource_sinfids3A_fs]]]
> >> > resource_sinfids3A_drbd
> >> > resource_sinfids3A_vip
> >> >
> >> > resource_sinfids3_vip
> >> >
> >> >
> >> > [root@sinfids3a2 ~]# /usr/lib/heartbeat/ptest -L -VVVVVVVVVVVVVVV 2>&1 | egrep assign
> >> > ptest[20348]: 2006/11/02_22:31:24 debug: native_assign_node: All nodes for color resource_sinfids3B_vip are unavailable, unclean or shutting down