[Linux-HA] mgmtd and SLES9

Dejan Muhamedagic dejanmm at fastmail.fm
Tue Sep 11 05:47:41 MDT 2007


Hi,

On Mon, Sep 10, 2007 at 12:38:26AM +0200, Jose Jerez wrote:
> Hello again,
> 
> I found the problem and a possible solution, although not the best
> one. Let me explain.
> 
> On 9/7/07, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
> > Hi,
> >
> > On Fri, Sep 07, 2007 at 10:10:42AM +0200, Jose Jerez wrote:
> > > Hello,
> > >
> > > I'm trying to compile heartbeat 2.1.2 on SLES9 SP3 on Power, but I've
> > > run into an annoying problem with mgmtd/hb_gui.
> > >
> > > When connecting to heartbeat using hb_gui the following error pops up:
> > > "Failed in the authentication. User Name or Password may be wrong or
> > > the user dons't belong the the haclient group"
> > >
> > > The password is right, I can connect using ssh, and the user hacluster
> > > belongs to the haclient group
> > >
> > > # id hacluster
> > > uid=90(hacluster) gid=90(haclient)
> > > groups=90(haclient),14(uucp),16(dialout),17(audio),33(video)
> > >
> > > This error message appears in the log file:
> > >
> > > mgmtd[5656]: 2007/09/04_09:57:37 ERROR: on_listen receive login msg failed
> > >
> > > No other error messages appear in the logs
> > >
> > > Heartbeat seems to start fine, and the CIB is empty, no resources or
> > > nodes configured yet.
> > >
> > > The system is SLES9 SP3 on Power (ppc64) and the Heartbeat version is
> > > 2.1.2 compiled from source.
> > >
> > > I compiled the same source on a SLES10SP1 on i586 and didn't find the
> > > problem, at least not after changing the /etc/pam.d/hbmgmtd to use pam_unix2.so
> > >
> > > So my guess here is that some libraries in SLES9 are too old for
> > > mgmtd.  Am I right?
> >
> > No, I don't think so.
> >
> I wouldn't be so sure.
> 
> > > Is it actually possible to have this combination of SLES9 and
> > > heartbeat 2.1.2 working?
> >
> > Yes, it should work.
> >
> > > I'll keep on trying for a solution to this, otherwise I might have to
> > > use HACMP and I don't like that a bit.
> >
> > HACMP? Really?
> 
> Yeah I'm afraid so, not that I like it but these guys came over and
> convinced the bosses it was a good solution for our system (cluster
> with SLES9 SP3 on Power + oracle 10), and not only HACMP but also
> GPFS.  So now it's my turn to prove that heartbeat is a better solution,
> and the fact that we have been using it for more than 3 years doesn't
> seem enough of a proof.
> >
> > > Any hints to solve this will be appreciated.
> >
> > This is most probably a pam problem, though probably not a very
> > obvious one. Since your ssh connection works, could you try
> > replicating the pam configuration for ssh in hbmgmtd? Just the
> > part which makes sense, of course (auth and account).
> >
> > Otherwise, you could try debugging pam. I suppose that there's a
> > way to do that on SLES9.
> >
> No, it's not a pam problem, I tried the pam_permit.so module which is
> like an open door (any password is accepted) and the problem didn't go
> away.
> 
> > Interestingly, on my opensuse 10.2 pam_unix.so works fine.
> >
> > Thanks,
> >
> > Dejan
> >
> So I dug into the code and found that the error message comes from
> the function on_listen(...) in mgmtd.c. The block of code starts
> with:
> 
> if (msg == NULL || num != 4 || STRNCMP_CONST(args[0], MSG_LOGIN) != 0) {
> 
> I added the following:
> 
>  mgmt_log(LOG_ERR, "%s number of fields in message %d", __FUNCTION__,num);
>  mgmt_log(LOG_ERR, "%s the message itself: %s", __FUNCTION__,msg);
> 
> Now when I try to connect this is what shows up in the error log:
> 
> ...
> mgmtd[26192]: 2007/09/09_22:39:34 ERROR: on_listen number of fields in message 3
> mgmtd[26192]: 2007/09/09_22:39:34 ERROR: on_listen the message itself: login
> hacluster
> secretpassword
> ...
> 
> There's a field missing, MGMT_PROTOCOL_VERSION.  What happened to this
> field? I don't know.  The code on the receiving side looks good
> (mgmt_session_recvmsg(...) in mgmt_client_lib.c). I don't know where
> the code on the sending side is.

Thanks for digging and figuring this one out. Apparently, there
was a protocol version check introduced, but old clients never
supply a protocol version. So, the server just refuses to
connect. There's a patch now in the dev hg
http://hg.linux-ha.org/dev/rev/d60f8af85ec0 which should fix
this.
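
To illustrate the idea (this is only a minimal sketch, not the code
from the changeset above), a server could accept a login message with
either three or four fields and check the version only when the client
actually sent one. The newline-separated layout is inferred from the
log output quoted above. The helper names split_fields() and
login_msg_ok(), the MAX_FIELDS limit and the version string "1.0" are
assumptions made up for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MSG_LOGIN "login"
#define MGMT_PROTOCOL_VERSION "1.0" /* assumed value, for illustration only */
#define MAX_FIELDS 8                /* hypothetical upper bound on fields */

/* Split msg in place on '\n'. Stores at most max field pointers in args
 * and returns how many were stored; anything beyond max is ignored. */
static int split_fields(char *msg, char **args, int max)
{
    int num = 0;
    char *p = msg;

    assert(msg != NULL && args != NULL);
    while (p != NULL && num < max) {
        args[num++] = p;
        p = strchr(p, '\n');
        if (p != NULL) {
            *p++ = '\0'; /* terminate this field, step to the next one */
        }
    }
    return num;
}

/* Return 1 if raw is an acceptable login message, 0 otherwise.
 * Old clients send 3 fields (login, user, password); newer ones
 * append a protocol version as a 4th field. */
static int login_msg_ok(const char *raw)
{
    char *args[MAX_FIELDS];
    char *copy;
    int num, ok;

    if (raw == NULL) {
        return 0;
    }
    copy = malloc(strlen(raw) + 1);
    if (copy == NULL) {
        return 0;
    }
    strcpy(copy, raw);

    num = split_fields(copy, args, MAX_FIELDS);
    ok = (num == 3 || num == 4) && strcmp(args[0], MSG_LOGIN) == 0;

    /* Only check the version when the client supplied one. */
    if (ok && num == 4 && strcmp(args[3], MGMT_PROTOCOL_VERSION) != 0) {
        ok = 0;
    }

    free(copy);
    return ok;
}
```

With this, the three-field message from the log above would be
accepted, while a four-field message carrying a different version
string would still be refused.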

> As for my poor man's solution, it is to comment out the check of that
> fourth field. Hopefully it is not used anywhere else; in my simple
> tests so far it works flawlessly:
> 
> ...
>  if (msg == NULL || STRNCMP_CONST(args[0], MSG_LOGIN) != 0) {
> ...
> /* if (STRNCMP_CONST(args[3], MGMT_PROTOCOL_VERSION) != 0) {
> ...
> } */

It seems like there is only one protocol version so far, so that
check is still superfluous.

> Now I can connect using hb_gui, and I have added an IP resource.
> 
> If any of the developers discovers what the real problem is and finds
> a better solution, I'll be happy to apply it.

Yes, please.

Cheers,

Dejan

> I sure will be back soon with more questions ;-)
> 
> Thanks for your help


