[Linux-HA] Documentation of heartbeat protocol
dfrincu at streamwide.ro
Thu Oct 14 07:26:17 MDT 2010
Vadym Chepkov wrote:
> On Oct 14, 2010, at 4:41 AM, Lars Ellenberg wrote:
>> If you happen to be somehow target locked on heartbeat, tell us why,
>> and what you are trying to achieve, and we figure something out.
> Sorry to barge in, but I actually started with corosync and had to "back out", so to speak.
> The major reason is the lack of support for the PPC architecture; it just doesn't work there.
> I was hoping that, since RedHat fully supports this platform, things would get better with RHEL6,
> but to my unpleasant surprise, instead of fixing it, they just decided not to build corosync on anything but Intel.
> Redundant rings also don't work in corosync yet, and the "bonding" suggested as a workaround won't save you from a switch failure.
> I, personally, always add a direct link between two nodes for redundancy.
I've also noticed a failure of rrp_mode: passive in
corosync-1.2.7-1.1.el5.x86_64.rpm. The expected behavior, as per the docs:
/RRP can have three modes (rrp_mode): if set to active, Corosync uses
all interfaces actively. If set to passive, Corosync uses the second
interface only if the first ring fails. If rrp_mode is set to none, RRP
is disabled. With RRP, two physically separate networks are used for
communication. In case one network fails, cluster nodes can still
communicate via the other network./
So the logic is that in passive mode it uses the first network (ringnumber
0) by default and, if that fails, switches to the second one. The failure
in my test was pulling the cable out of the network card on the primary
node. At that point it did not switch to the second available ring.
Instead, node 1 still thought it was primary and kept the drbd partitions
mounted, while node 2 thought it was alone, switched to primary and tried
to do the same with the drbd partitions (which are linked between the
servers over the second network connection, ringnumber 1). That failed,
since drbd was primary and mounted on the other node.
My solution was to switch to rrp_mode: active; when I repeated the same
test, it behaved as expected.
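For reference, a minimal totem section with two rings and active replication might look roughly like the sketch below (corosync 1.x syntax). The bindnetaddr, mcastaddr and mcastport values are hypothetical placeholders, not the values from my test setup; adjust them to your own two physically separate networks. The second ring's mcastport is kept 2 apart from the first's because corosync also uses mcastport - 1.

```
totem {
        version: 2
        secauth: off
        # use both rings at the same time (the setting that worked in my test)
        rrp_mode: active
        interface {
                # ring 0: primary network (hypothetical addresses)
                ringnumber: 0
                bindnetaddr: 192.168.1.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
        interface {
                # ring 1: second network, e.g. the direct link (hypothetical addresses)
                ringnumber: 1
                bindnetaddr: 10.0.0.0
                mcastaddr: 226.94.1.2
                mcastport: 5407
        }
}
```

Running corosync-cfgtool -s on each node shows the status of both rings, which is a quick way to check whether a pulled cable has actually marked a ring as faulty.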