[Linux-HA] mgmtd and SLES9

Jose Jerez tale.toul at gmail.com
Sun Sep 9 16:38:26 MDT 2007


Hello again,

I found the problem and a possible solution, although not the best
one. Let me explain.

On 9/7/07, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
> Hi,
>
> On Fri, Sep 07, 2007 at 10:10:42AM +0200, Jose Jerez wrote:
> > Hello,
> >
> > I'm trying to compile heartbeat 2.1.2 on SLES9 SP3 on Power, but I've
> > run into an annoying problem with mgmtd/hb_gui.
> >
> > When connecting to heartbeat using hb_gui the following error pops up:
> > "Failed in the authentication. User Name or Password may be wrong or
> > the user dons't belong the the haclient group"
> >
> > The password is right, I can connect using ssh, and the user hacluster
> > belongs to the haclient group
> >
> > # id hacluster
> > uid=90(hacluster) gid=90(haclient)
> > groups=90(haclient),14(uucp),16(dialout),17(audio),33(video)
> >
> > This error message appears in the log file:
> >
> > mgmtd[5656]: 2007/09/04_09:57:37 ERROR: on_listen receive login msg failed
> >
> > No other error messages appear in the logs
> >
> > Heartbeat seems to start fine, and the CIB is empty, no resources or
> > nodes configured yet.
> >
> > The system is SLES9 SP3 on Power (ppc64) and the Heartbeat version is
> > 2.1.2 compiled from source.
> >
> > I compiled the same source on a SLES10SP1 on i586 and didn't find the
> > problem, at least not after changing the /etc/pam.d/hbmgmtd to use pam_unix2.so
> >
> > So my guess here is that some libraries in SLES9 are too old for
> > mgmtd.  Am I right?
>
> No, I don't think so.
>
I wouldn't be so sure.

> > Is it actually possible to have this combination of SLES9 and
> > heartbeat 2.1.2 working?
>
> Yes, it should work.
>
> > I'll keep on trying for a solution to this, otherwise I might have to
> > use HACMP and I don't like that a bit.
>
> HACMP? Really?

Yeah, I'm afraid so. Not that I like it, but these guys came over and
convinced the bosses it was a good solution for our system (cluster
with SLES9 SP3 on Power + Oracle 10), and not only HACMP but also
GPFS.  So now it's my turn to prove that heartbeat is a better solution,
and the fact that we have been using it for more than 3 years doesn't
seem to be proof enough.
>
> > Any hints to solve this will be appreciated
>
> This is most probably a pam problem, but most probably a not very
> obvious one. Since your ssh connection works, could you try
> replicating the pam configuration for ssh to hbmgmtd. Just the
> part which makes sense, of course (auth and account).
>
> Otherwise, you could try debugging pam. I suppose that there's a
> way to do that on SLES9.
>
No, it's not a pam problem. I tried the pam_permit.so module, which is
like an open door (any password is accepted), and the problem didn't go
away.

> Interestingly, on my opensuse 10.2 pam_unix.so works fine.
>
> Thanks,
>
> Dejan
>
So I dug into the code and found that the error message comes from
the function on_listen(...) in mgmtd.c. The block of code starts
with:

if (msg == NULL || num != 4 || STRNCMP_CONST(args[0], MSG_LOGIN) != 0) {

I added the following logging calls:

 mgmt_log(LOG_ERR, "%s number of fields in message %d", __FUNCTION__,num);
 mgmt_log(LOG_ERR, "%s the message itself: %s", __FUNCTION__,msg);

Now when I try to connect this is what shows up in the error log:

...
mgmtd[26192]: 2007/09/09_22:39:34 ERROR: on_listen number of fields in message 3
mgmtd[26192]: 2007/09/09_22:39:34 ERROR: on_listen the message itself: login
hacluster
secretpassword
...
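
The message is spread over three log lines because its fields appear to
be separated by newlines and %s prints them as they are. A minimal
sketch, assuming that separator, of a helper that escapes the newlines
so the whole message stays on one log line. escape_newlines is a
hypothetical name, not something that exists in heartbeat:

```c
#include <stddef.h>

/* Hypothetical helper, not part of heartbeat: copy src into dst, writing
 * each newline as the two characters '\' and 'n' so that a multi-field
 * message stays on one log line. Output is truncated if dst is too small.
 * Always NUL-terminates when dstlen > 0; returns the number of characters
 * written, not counting the terminator. */
size_t escape_newlines(const char *src, char *dst, size_t dstlen)
{
    size_t out = 0;

    if (dst == NULL || dstlen == 0)
        return 0;
    if (src == NULL) {
        dst[0] = '\0';
        return 0;
    }
    for (; *src != '\0'; src++) {
        size_t need = (*src == '\n') ? 2 : 1;

        if (out + need >= dstlen)   /* keep room for the terminator */
            break;
        if (*src == '\n') {
            dst[out++] = '\\';
            dst[out++] = 'n';
        } else {
            dst[out++] = *src;
        }
    }
    dst[out] = '\0';
    return out;
}
```

The escaped buffer could then be handed to mgmt_log in place of msg.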

There's a field missing: MGMT_PROTOCOL_VERSION.  What happened to this
field? I don't know.  The code on the receiving side looks good
(mgmt_session_recvmsg(...) in mgmt_client_lib.c). As for the code on
the sending side, I don't know where it is.
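
To make the field count concrete, here is a small sketch that counts the
fields in a message, assuming they are newline-separated as the log
output suggests. count_fields is a hypothetical helper written for this
mail, not the function mgmtd itself uses to split messages:

```c
#include <stddef.h>

/* Hypothetical helper: count the newline-separated fields in a message.
 * NULL counts as zero fields; an empty string counts as one empty field. */
int count_fields(const char *msg)
{
    int n;

    if (msg == NULL)
        return 0;
    n = 1;
    for (; *msg != '\0'; msg++) {
        if (*msg == '\n')
            n++;
    }
    return n;
}
```

For the logged message (login, hacluster, secretpassword) this gives 3,
one short of the 4 that on_listen expects.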

As for my poor man's solution, it is to comment out the check of that
fourth field. Hopefully it is not used anywhere else; in my simple
tests so far it works flawlessly:

...
 if (msg == NULL || STRNCMP_CONST(args[0], MSG_LOGIN) != 0) {
...
/* if (STRNCMP_CONST(args[3], MGMT_PROTOCOL_VERSION) != 0) {
...
} */
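
A less drastic variant of this workaround would be to keep the version
check for clients that do send a fourth field and only let 3-field
logins through. This is just a sketch under assumptions, not a patch
against mgmtd.c: check_login_fields, the enum and the EXAMPLE_*
constants are hypothetical, and it uses plain strcmp instead of the
STRNCMP_CONST macro.

```c
#include <string.h>

/* Placeholder values for illustration only; the real MSG_LOGIN and
 * MGMT_PROTOCOL_VERSION are defined in the heartbeat sources and may
 * differ. */
#define EXAMPLE_MSG_LOGIN        "login"
#define EXAMPLE_PROTOCOL_VERSION "example-version"

enum login_check { LOGIN_OK, LOGIN_OK_NO_VERSION, LOGIN_REJECT };

/* Hypothetical check: args holds num non-NULL fields already split from
 * the message. args[3] is only read when num == 4, so a 3-field login
 * never indexes past the end of the array. */
enum login_check check_login_fields(char *const args[], int num)
{
    if (args == NULL || num < 3 || num > 4)
        return LOGIN_REJECT;
    if (strcmp(args[0], EXAMPLE_MSG_LOGIN) != 0)
        return LOGIN_REJECT;
    if (num == 3)
        return LOGIN_OK_NO_VERSION;   /* client sent no version field */
    if (strcmp(args[3], EXAMPLE_PROTOCOL_VERSION) != 0)
        return LOGIN_REJECT;
    return LOGIN_OK;
}
```

Returning a separate value for the 3-field case would let the caller
log a warning instead of silently accepting the login.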

Now I can connect using hb_gui, and I have added an IP resource.

If any of the developers finds out what the real problem is and comes
up with a better solution, I'll be happy to apply it.

I sure will be back soon with more questions ;-)

Thanks for your help
