[Linux-HA] Documentation of heartbeat protocol

Dan Frincu dfrincu at streamwide.ro
Thu Oct 14 07:26:17 MDT 2010


Vadym Chepkov wrote:
> On Oct 14, 2010, at 4:41 AM, Lars Ellenberg wrote:
>> If you happen to be somehow target locked on heartbeat, tell us why,
>> and what you are trying to achieve, and we figure something out.
> Sorry to barge in, but I actually started with corosync and had to "back out", so to speak.
> The major reason is the lack of support for the PPC architecture; it just doesn't work there.
> I was hoping that, since Red Hat fully supports this platform, things would get better with RHEL6,
> but to my unpleasant surprise, instead of fixing it, they just decided not to build corosync on anything but Intel.
> Redundant rings also don't work in corosync yet, and "bonding", suggested as a workaround, won't save you from a switch failure.
> I personally always add a direct link between the two nodes for redundancy.
I've also noticed a failure with rrp_mode: passive in
corosync-1.2.7-1.1.el5.x86_64.rpm. According to the docs, the expected
behavior is:

/RRP can have three modes (rrp_mode): if set to active, Corosync uses
all interfaces actively. If set to passive, Corosync uses the second
interface only if the first ring fails. If rrp_mode is set to none, RRP
is disabled. With RRP, two physically separate networks are used for
communication. In case one network fails, the cluster nodes can still
communicate via the other network./

So the logic is that in passive mode it uses the first network
(ringnumber 0) by default and, if that fails, goes to the second one.
The failure in my test was pulling the cable from the network card on
the primary node. At that point it didn't switch to the second
available ring. Instead, node 1 still thought it was primary and kept
the drbd partitions mounted, while node 2 thought it was alone, switched
to primary and tried to do the same with the drbd partitions (which are
linked between the servers over the second network connection,
ringnumber 1). That failed, since drbd was already primary and mounted
on the other node.

My solution was to switch to rrp_mode: active. When I repeated the
same test with that setting, it worked the way it should.
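
A two-ring totem section for this kind of setup might look roughly like
the sketch below. The bindnetaddr, mcastaddr and mcastport values are
placeholder assumptions, not the ones from the test above, and the rest
of corosync.conf (logging, service and so on) is left out.

```
# Sketch of the totem section in /etc/corosync/corosync.conf (corosync 1.x)
# All addresses and ports are hypothetical placeholders.
totem {
        version: 2

        # "active" sends on all rings; "passive" and "none" are the other modes
        rrp_mode: active

        # First ring: the regular cluster network
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.1.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }

        # Second ring: e.g. the direct link also used for drbd replication
        interface {
                ringnumber: 1
                bindnetaddr: 10.0.0.0
                mcastaddr: 226.94.1.2
                mcastport: 5407
        }
}
```

After restarting corosync on both nodes, corosync-cfgtool -s should
report the status of each ring, which is a quick way to check that both
are seen as healthy before pulling any cables.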



> Vadym

Systems Engineer
Streamwide Romania
