[Linux-HA] New problem(s) with heartbeat 2.0.3 and STONITH

Andrew Beekhof beekhof at gmail.com
Fri Oct 28 08:18:20 MDT 2005


On 10/28/05, Alan Robertson <alanr at unix.sh> wrote:
> Sun Jiang Dong wrote:
> >
> >
> > Alan Robertson wrote:
> >> Stefan Peinkofer wrote:
> >>
> >>> Hello everybody,
> >>>
> >>> unfortunately I have new problems with the heartbeat 2.0.3 cvs version
> >>> and stonith.
> >>>
> >>> I ran a cvs heartbeat which was checked out on 2005-10-18 and
> >>> encountered a problem with stonithd which was killed by signal 11.
> >>> The effects were that the stonith resources were NOT_ACTIVE and when I
> >>> initiated a split brain no node could fence the other off.
> >>>
> >>> I thought it might already be fixed in cvs, so I checked out a version today
> >>> (2005-10-26). Unfortunately, this version seems to contain an even
> >>> worse problem with stonith.
> >>>
> >>> After I started heartbeat on the two nodes and waited until it had started
> >>> up completely, I initiated the split brain situation. I had expected this
> >>> to work because both stonith resources were active.
> >>>
> >>> In the logs I saw:
> >>> Oct 26 17:30:53 spock pengine: [20031]: WARN: mask(stages.c:stage6):
> >>> Scheduling Node sarek for STONITH
> >>> That's what I want :)
> >>> But then the following message appeared:
> >>> Oct 26 17:31:03 spock tengine: [20030]: ERROR: stonithd_node_fence:
> >>> cannot add field to ha_msg.
> >>
> >>
> >> This is some kind of an issue in the lib/fencing/stonithd_lib.c file
> >>
> >>         if ((ha_msg_add_int(request, F_STONITHD_OPTYPE, op->optype) != HA_OK)
> >>             || (ha_msg_add(request, F_STONITHD_NODE, op->node_name) != HA_OK)
> >>             || (op->node_uuid == NULL
> >>                 || ha_msg_add(request, F_STONITHD_NODE_UUID, op->node_uuid) != HA_OK)
> >>             || (op->private_data == NULL
> >>                 || ha_msg_add(request, F_STONITHD_PDATA, op->private_data) != HA_OK)
> >>             || (ha_msg_add_int(request, F_STONITHD_TIMEOUT, op->timeout) != HA_OK)) {
> >>                 stdlib_log(LOG_ERR, "stonithd_node_fence: "
> >>                            "cannot add field to ha_msg.");
> >>                 ZAPMSG(request);
> >>                 return ST_FAIL;
> >>         }
> >>
> >> My guess is that op->node_name or op->optype is NULL.  The code should
> >> have validated those.  Since they're critical, and they come from
> >> who-knows-where (meaning some doofus user process), they should
> >> definitely have been error checked, and there should be a clear
> >> message about their errors.
> >>
> >
> > It should be the op->private_data == NULL check that triggers this. That
> > condition is not reasonable. I'll fix it.
> >
> >> Things I don't quite understand...
> >> UUIDs are normally special portable binary values with their own type
> >> in the structure world...  Having this be a string violates the law of
> >> least surprise.  If they're not really uuids, then they shouldn't be
> >> CALLED uuids.
> > There is a long story behind this; it's required by Andrew.
>
>
> If Andrew requires you to call something which isn't a UUID a uuid,
> then he screwed up and he should fix it.

delightfully tactful as ever.

from reading this one would think that it's the first time we've
had this discussion.

>
> A UUID is not simply a random identifier which is forced to be unique
> (like he requires his id= in XML), it's an industry standard term as per
>   DCE 1.1, ISO/IEC 11578:1996 and RFC 4122.
>
> So, it is not some string guaranteed to be unique.  In fact, it isn't a
> string at all, but a 128-bit binary value.  There are specified ways of
> printing UUIDs, but they're not precisely UUIDs, but ASCII
> representations of UUIDs.
>
> So, if it's not a 128-bit binary value in compliance with DCE 1.1,
> ISO/IEC 11578:1996 or RFC 4122, it's not really a UUID.
>         http://www.faqs.org/rfcs/rfc4122.html
>
> [This URL even contains a sample UUID implementation]
>
> --
>      Alan Robertson <alanr at unix.sh>
>
> "Openness is the foundation and preservative of friendship...  Let me
> claim from you at all times your undisguised opinions." - William
> Wilberforce
>
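
To illustrate the distinction being argued, here is a small sketch in C that keeps a UUID as 16 bytes and treats the 36-character text form only as a representation of it. The sk_ names are hypothetical and not from heartbeat or any UUID library. The sketch handles only the 8-4-4-4-12 hex layout and does not check version or variant bits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types and names for this sketch only. */
struct sk_uuid { uint8_t bytes[16]; };   /* the 128-bit value itself */
#define SK_UUID_STR_LEN 36               /* text form, without the NUL */

static int sk_hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse the text form into 16 bytes.
 * Returns 0 on success, -1 on malformed input; *out is untouched on error. */
static int sk_uuid_parse(const char *s, struct sk_uuid *out)
{
    struct sk_uuid tmp;
    size_t i = 0, n = 0;

    if (s == NULL || out == NULL)
        return -1;
    while (i < SK_UUID_STR_LEN) {
        if (i == 8 || i == 13 || i == 18 || i == 23) {
            if (s[i] != '-')
                return -1;
            i++;
            continue;
        }
        int hi = sk_hexval(s[i]);
        if (hi < 0)
            return -1;          /* also stops at an early NUL */
        int lo = sk_hexval(s[i + 1]);
        if (lo < 0)
            return -1;
        tmp.bytes[n++] = (uint8_t)((hi << 4) | lo);
        i += 2;
    }
    if (s[SK_UUID_STR_LEN] != '\0')
        return -1;              /* trailing characters */
    *out = tmp;
    return 0;
}

/* Write the lowercase text form of u into buf (37 bytes including NUL). */
static void sk_uuid_format(const struct sk_uuid *u,
                           char buf[SK_UUID_STR_LEN + 1])
{
    static const char hex[] = "0123456789abcdef";
    size_t i, pos = 0;

    for (i = 0; i < 16; i++) {
        if (i == 4 || i == 6 || i == 8 || i == 10)
            buf[pos++] = '-';
        buf[pos++] = hex[u->bytes[i] >> 4];
        buf[pos++] = hex[u->bytes[i] & 0x0f];
    }
    buf[pos] = '\0';
}

/* Equality is defined on the binary value, not on the text. */
static int sk_uuid_equal(const struct sk_uuid *a, const struct sk_uuid *b)
{
    return memcmp(a->bytes, b->bytes, sizeof a->bytes) == 0;
}
```

Parsing "f81d4fae-7dec-11d0-a765-00a0c91e6bf6" (the example UUID given in RFC 4122) and formatting the result gives back the same string, and the uppercase spelling parses to an equal value. An arbitrary unique string such as a node name is rejected by the parser.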


