[Linux-HA] Messages sent through heartbeat are not received.
Alan Robertson
alanr at unix.sh
Thu Apr 1 07:54:20 MST 2004
Horms wrote:
> On Wed, Mar 31, 2004 at 06:33:38PM -0600, Steve Dobbelstein wrote:
>>I work on the Enterprise Volume Management System (EVMS) which has a
>>plug-in to support clustering features under Linux-HA. One of our test
>>clusters is running fine with heartbeat 1.1.3. I figured we should move up
>>to the latest heartbeat, so I installed 1.2.0 on another test cluster.
>>
>>The EVMS HA plug-ins fail to start up on 1.2.0. I installed the latest CVS
>>code and am getting the same results.
>>
>>Debugging the problem further, I am finding that messages sent by the
>>plug-in on one node through heartbeat are not being received on the other
>>node. The plug-in uses
>>heartbeat_handle->llc_ops->sendnodemsg(heartbeat_handle, msg, node);
>>(heartbeat_handle is what was returned from ll_cluster_new("heartbeat");)
>>That succeeds, but I don't see a callback for delivery of the message on
>>the other node.
>>
>>I realize this description is very sketchy. I'm not sure what kind of
>>information one needs to debug this problem. I will be happy to provide
>>more information (configuration files, logs, test runs, etc.).
>
>
> The plugin implementation changed significantly between 1.1.X and 1.2.X.
> You will need to update your plugin accordingly.
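What it amounts to is that certain misuses of the interface were "harmless"
in 1.1.x, and became fatal in 1.2.x.

In particular, this would work in 1.1.x:

    while (select() > 0) {
        read a message
    }

but it won't work in 1.2, because the messaging scheme now buffers input
where the 1.1 version did not. A single read can pull several messages
into the buffer, and select() only reports on the file descriptor, so it
says nothing about messages that are already buffered.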
The proper way to use the interface (before and now) is:

    while (select() > 0) {
        while (is_message_pending()) {
            read a message
        }
    }

[Of course, this is an outline of the real code, but this should give you
the right idea].
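
To make the difference concrete, here is a small self-contained sketch in
plain C. It does not use the heartbeat API at all: "struct chan",
read_message() and count_received() are hypothetical names invented for
the illustration, is_message_pending() is only a stand-in for the call in
the outline above, and an ordinary pipe plays the part of the IPC channel.
The one assumption that matters is that a single read() may pull several
messages into a user-space buffer.

```c
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical buffered channel: NOT the heartbeat API, just a pipe plus
 * a user-space buffer holding newline-terminated messages. */
struct chan {
    int fd;
    char buf[256];
    size_t len;            /* number of bytes currently buffered */
};

/* Zero-timeout select(): is there unread data on the descriptor itself? */
static int fd_readable(int fd)
{
    fd_set rfds;
    struct timeval tv = {0, 0};

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    return select(fd + 1, &rfds, NULL, NULL, &tv) > 0;
}

/* True if a whole message is buffered; refills from the fd if needed. */
static int is_message_pending(struct chan *c)
{
    if (memchr(c->buf, '\n', c->len) == NULL && fd_readable(c->fd)) {
        ssize_t n = read(c->fd, c->buf + c->len, sizeof c->buf - c->len);
        if (n > 0)
            c->len += (size_t)n;
    }
    return memchr(c->buf, '\n', c->len) != NULL;
}

/* Discard one buffered message; call only when one is pending. */
static void read_message(struct chan *c)
{
    char *nl = memchr(c->buf, '\n', c->len);
    size_t used = (size_t)(nl - c->buf) + 1;

    memmove(c->buf, c->buf + used, c->len - used);
    c->len -= used;
}

/* Queue nmsgs messages on a pipe, then count how many the receive loop
 * sees. drain == 0: one message per select(). drain != 0: inner loop. */
int count_received(int nmsgs, int drain)
{
    int p[2], i, count = 0;
    struct chan c;

    if (pipe(p) != 0)
        return -1;
    c.fd = p[0];
    c.len = 0;
    for (i = 0; i < nmsgs; i++) {
        if (write(p[1], "msg\n", 4) != 4) {
            close(p[0]);
            close(p[1]);
            return -1;
        }
    }

    while (fd_readable(c.fd)) {              /* while (select() > 0) */
        if (drain) {
            while (is_message_pending(&c)) { /* drain the buffer */
                read_message(&c);
                count++;
            }
        } else if (is_message_pending(&c)) { /* one message per select() */
            read_message(&c);
            count++;
        }
    }

    close(p[0]);
    close(p[1]);
    return count;
}
```

With three messages queued, count_received(3, 0) should return 1: the
first refill moves all three messages into the buffer, the loop consumes
one, and select() then finds the descriptor empty, so the other two sit
in the buffer unseen. count_received(3, 1) should return 3, because the
inner loop empties the buffer before going back to select().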
This has been discussed extensively on the linux-ha-dev (development)
mailing list. If you absolutely cannot tolerate input buffering, there is
a way to make the first form work: tell the IPC layer not to buffer input.
But this should be avoided if at all possible, particularly if you're
sending large quantities of data.

If you are using the mainloop code, there is a new function called
ipcchan() which returns the IPC channel. You can still get the file
descriptor as before, but if you're using mainloop input sources you'll
want to switch from G_main_add_fd() to G_main_add_IPC_Channel(), and feed
it the return value of ipcchan().
--
Alan Robertson <alanr at unix.sh>
"Openness is the foundation and preservative of friendship... Let me claim
from you at all times your undisguised opinions." - William Wilberforce