[Linux-HA] When is a failure a failure? :CRM-Stable-4a0d4e40eeb0

Andrew Beekhof beekhof at gmail.com
Tue Nov 7 01:01:41 MST 2006


On 11/7/06, Alex and Gill Strachan <asgks at yahoo.com> wrote:
> I wasn't able to get the old logs but I generated another example

Thanks - I'll look through this today.

>
> sinfids3b1 was the DC
> sinfids3a1 running samba
>
> Again using resource_sinfids3A_smb as the example
>
> [root at sinfids3b1 ~]# ssh s1 '/etc/init.d/smb stop'
> Shutting down SMB services: [  OK  ]
> Shutting down NMB services: [  OK  ]
>
>
> [root at sinfids3b1 ~]# crm_verify -L -VVVVV 2>&1 | grep fail-count
> crm_verify[6048]: 2006/11/07_11:00:05 debug: unpack_lrm_rsc_state: fail-count-resource_sinfids3A_smb: 1
>
> [root at sinfids3b1 ~]# /usr/lib/heartbeat/ptest -L -VVVVVVVVVVVVVVV 2>&1 | egrep assign
> ptest[6061]: 2006/11/07_11:01:00 debug: native_assign_node: Color resource_sinfids3B_vip, Node[0] sinfids3b1: 10100
> ptest[6061]: 2006/11/07_11:01:00 debug: native_assign_node: Color resource_sinfids3B_vip, Node[1] sinfids3a1: -1000000
> ptest[6061]: 2006/11/07_11:01:00 debug: native_assign_node: Color resource_sinfids3B_vip, Node[2] sinfids3a2: -1000000
> ptest[6061]: 2006/11/07_11:01:00 debug: native_assign_node: Assigning sinfids3b1 to resource_sinfids3B_vip
> ptest[6061]: 2006/11/07_11:01:00 debug: native_assign_node: Color resource_sinfids3A_smb, Node[0] sinfids3a1: 20700
> ptest[6061]: 2006/11/07_11:01:00 debug: native_assign_node: Color resource_sinfids3A_smb, Node[1] sinfids3a2: 700
> ptest[6061]: 2006/11/07_11:01:00 debug: native_assign_node: Color resource_sinfids3A_smb, Node[2] sinfids3b1: -1000000
> ptest[6061]: 2006/11/07_11:01:00 debug: native_assign_node: Assigning sinfids3a1 to resource_sinfids3A_smb
> ....  normally the weight is 70700, but with failure_resource_stickiness at -50000 it drops to 20700
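
As a rough check of the arithmetic in the quoted note above, here is a minimal sketch. It assumes the failure stickiness penalty is added once per recorded failure; that rule and the function name are illustrative assumptions, not something the logs in this thread confirm.

```python
# Illustrative only: the function name and the "penalty times fail-count"
# rule are assumptions, not taken from the CRM source.
def effective_score(base_score: int, failure_stickiness: int, fail_count: int) -> int:
    """Return the node score after applying the failure penalty fail_count times."""
    return base_score + failure_stickiness * fail_count

# Values quoted in the thread: base 70700, penalty -50000, fail-count 1.
print(effective_score(70700, -50000, 1))  # 20700, matching the ptest output
# Hypothetical case: fail-count incremented to 2.
print(effective_score(70700, -50000, 2))  # -29300
```

Under that assumption, a second counted failure would push sinfids3a1 below sinfids3a2 (700), so a score that stays at 20700 is consistent with fail-count remaining at 1.
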
>
>
> [root at sinfids3b1 ~]# ssh s1 '/etc/init.d/smb stop'
> Shutting down SMB services: [  OK  ]
> Shutting down NMB services: [  OK  ]
>
>
> ****After hb restarts the samba processes...
>
>
> [root at sinfids3b1 ~]# crm_verify -L -VVVVV 2>&1 | grep fail-count
> crm_verify[6088]: 2006/11/07_11:02:08 debug: unpack_lrm_rsc_state: fail-count-resource_sinfids3A_smb: 1
>
>
> [root at sinfids3b1 ~]# /usr/lib/heartbeat/ptest -L -VVVVVVVVVVVVVVV 2>&1 | egrep assign
> ptest[6086]: 2006/11/07_11:01:50 debug: native_assign_node: Color resource_sinfids3B_vip, Node[0] sinfids3b1: 10100
> ptest[6086]: 2006/11/07_11:01:50 debug: native_assign_node: Color resource_sinfids3B_vip, Node[1] sinfids3a1: -1000000
> ptest[6086]: 2006/11/07_11:01:50 debug: native_assign_node: Color resource_sinfids3B_vip, Node[2] sinfids3a2: -1000000
> ptest[6086]: 2006/11/07_11:01:50 debug: native_assign_node: Assigning sinfids3b1 to resource_sinfids3B_vip
> ptest[6086]: 2006/11/07_11:01:50 debug: native_assign_node: Color resource_sinfids3A_smb, Node[0] sinfids3a1: 20700
> ptest[6086]: 2006/11/07_11:01:50 debug: native_assign_node: Color resource_sinfids3A_smb, Node[1] sinfids3a2: 700
> ptest[6086]: 2006/11/07_11:01:50 debug: native_assign_node: Color resource_sinfids3A_smb, Node[2] sinfids3b1: -1000000
> ptest[6086]: 2006/11/07_11:01:50 debug: native_assign_node: Assigning sinfids3a1 to resource_sinfids3A_smb
> ...
>
>
> Nov  7 11:01:16 sinfids3a1 sshd(pam_unix)[6924]: session opened for user root by (uid=0)
> Nov  7 11:01:16 sinfids3a1 smb: smbd shutdown succeeded
> Nov  7 11:01:16 sinfids3a1 nmbd[6742]: [2006/11/07 11:01:16, 0] nmbd/nmbd.c:terminate(56)
> Nov  7 11:01:16 sinfids3a1 nmbd[6742]:   Got SIGTERM: going down...
> Nov  7 11:01:16 sinfids3a1 smb: nmbd shutdown succeeded
> Nov  7 11:01:16 sinfids3a1 su(pam_unix)[6963]: session opened for user oracle by (uid=0)
> Nov  7 11:01:16 sinfids3a1 su(pam_unix)[6963]: session closed for user oracle
> Nov  7 11:01:29 sinfids3a1 crmd: [4914]: info: process_lrm_event: LRM operation (35) monitor_30000 on resource_sinfids3A_smb complete
> Nov  7 11:01:31 sinfids3a1 crmd: [4914]: info: do_lrm_rsc_op: Performing op stop on resource_sinfids3A_smb (interval=0ms, key=55:04ace2ff-e2d4-4880-9312-ba198969da3e)
> Nov  7 11:01:31 sinfids3a1 lrmd: [6990]: WARN: For LSB init script, no additional parameters are needed.
> Nov  7 11:01:31 sinfids3a1 crmd: [4914]: WARN: process_lrm_event: LRM operation (35) monitor_30000 on resource_sinfids3A_smb Cancelled
> Nov  7 11:01:31 sinfids3a1 cib: [4910]: info: cib_diff_notify: Update (client: 4914, call:69): 0.552.25943 -> 0.552.25944 (ok)
> Nov  7 11:01:31 sinfids3a1 lrmd: [4911]: info: RA output: (resource_sinfids3A_smb:stop:stdout) Shutting down SMB services:
> Nov  7 11:01:32 sinfids3a1 smb: smbd shutdown failed
> Nov  7 11:01:32 sinfids3a1 lrmd: [4911]: info: RA output: (resource_sinfids3A_sm Shutting down NMB services:
> Nov  7 11:01:32 sinfids3a1 smb: nmbd shutdown failed
> Nov  7 11:01:32 sinfids3a1 lrmd: [4911]: info: RA output: (resource_sinfids3A_smb:stop:stdout) [
> Nov  7 11:01:32 sinfids3a1 lrmd: [4911]: info: RA output: (resource_sinfids3A_smb:stop:stdout) FAILED]
> Nov  7 11:01:32 sinfids3a1 lrmd: [4911]: info: RA output: (resource_sinfids3A_smb:stop:stdout)
> Nov  7 11:01:32 sinfids3a1 crmd: [4914]: info: process_lrm_event: LRM operation (37) stop_0 on resource_sinfids3A_smb complete
> Nov  7 11:01:32 sinfids3a1 cib: [6993]: info: write_cib_contents: Wrote version 0.552.25944 of the CIB to disk (digest: bcedacb56e79f5924ffc6617116ed87c)
> Nov  7 11:01:33 sinfids3a1 crmd: [4914]: info: do_lrm_rsc_op: Performing op start on resource_sinfids3A_smb (interval=0ms, key=55:04ace2ff-e2d4-4880-9312-ba198969da3e)
> Nov  7 11:01:33 sinfids3a1 lrmd: [7014]: WARN: For LSB init script, no additional parameters are needed.
> Nov  7 11:01:33 sinfids3a1 cib: [4910]: info: cib_diff_notify: Update (client: 4914, call:70): 0.552.25944 -> 0.552.25945 (ok)
> Nov  7 11:01:33 sinfids3a1 lrmd: [4911]: info: RA output: (resource_sinfids3A_smb:start:stdout) Starting SMB services:
> Nov  7 11:01:34 sinfids3a1 smb: smbd startup succeeded
> Nov  7 11:01:34 sinfids3a1 lrmd: [4911]: info: RA output: (resource_sinfids3A_sm Starting NMB services:
> Nov  7 11:01:34 sinfids3a1 smb: nmbd startup succeeded
> Nov  7 11:01:34 sinfids3a1 lrmd: [4911]: info: RA output: (resource_sinfids3A_sm :start:stdout) [  OK  ]
> Nov  7 11:01:34 sinfids3a1 crmd: [4914]: info: process_lrm_event: LRM operation (38) start_0 on resource_sinfids3A_smb complete
> Nov  7 11:01:34 sinfids3a1 cib: [7017]: info: write_cib_contents: Wrote version 0.552.25945 of the CIB to disk (digest: b8dd205bc4ef461c8eb41b120045ecf8)
> Nov  7 11:01:35 sinfids3a1 crmd: [4914]: info: do_lrm_rsc_op: Performing op monitor on resource_sinfids3A_smb (interval=30000ms, key=55:04ace2ff-e2d4-4880-9312-ba198969da3e)
> Nov  7 11:01:35 sinfids3a1 crmd: [4914]: info: process_lrm_event: LRM operation (39) monitor_30000 on resource_sinfids3A_smb complete
> Nov  7 11:01:35 sinfids3a1 cib: [4910]: info: cib_diff_notify: Update (client: 4914, call:71): 0.552.25945 -> 0.552.25946 (ok)
> Nov  7 11:01:36 sinfids3a1 cib: [7035]: info: write_cib_contents: Wrote version 0.552.25946 of the CIB to disk (digest: 05da061feec12f5764e797350c604bb3)
> Nov  7 11:01:37 sinfids3a1 cib: [4910]: info: cib_diff_notify: Update (client: 4914, call:72): 0.552.25946 -> 0.552.25947 (ok)
> Nov  7 11:01:37 sinfids3a1 cib: [7036]: info: write_cib_contents: Wrote version 0.552.25947 of the CIB to disk (digest: edbaed654889ad8a6c8eec512cc47424)
>
> ----- Original Message ----
> From: Andrew Beekhof <beekhof at gmail.com>
> To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
> Sent: Tuesday, 7 November, 2006 12:49:50 AM
> Subject: Re: Re: [Linux-HA] When is a failure a failure? :CRM-Stable-4a0d4e40eeb0
>
> On 11/6/06, Alex and Gill Strachan <asgks at yahoo.com> wrote:
> > Which logs...
> >
> > Is /var/log/messages enough?   (this is where my hb syslogs to)
>
> yep, just make sure it's from the DC
>
> >
> > ----- Original Message ----
> > From: Andrew Beekhof <beekhof at gmail.com>
> > To: General Linux-HA mailing list <linux-ha at lists.linux-ha.org>
> > Sent: Monday, 6 November, 2006 11:45:49 PM
> > Subject: Re: [Linux-HA] When is a failure a failure? :CRM-Stable-4a0d4e40eeb0
> >
> > On 11/5/06, Alex and Gill Strachan <asgks at yahoo.com> wrote:
> > > I have a resource resource_sinfids3A_smb.
> > >
> > > If I stop this manually while the cluster is managing the resource, it detects this and then restarts samba, adjusting the node weight by removing the failure_stickiness and setting fail-count to 1.  All good.
> > >
> > > Next I stop it manually again, hb restarts, stop again, hb restarts ....
> > >
> > > The failcount remains at 1.  Why is this?
> >
> > you repeated the same procedure as the first time and it didn't increment?
> > can you send me the logs from this?
> >
> > > What dictates that the failcount increments?
> > >
> > >
> > >
> > > _______________________________________________
> > > Linux-HA mailing list
> > > Linux-HA at lists.linux-ha.org
> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > See also: http://linux-ha.org/ReportingProblems
> > >

