[Linux-HA] ocfs2 ra

Andrew Beekhof beekhof at gmail.com
Wed Oct 18 01:30:55 MDT 2006


On 10/18/06, Slawomir Mroczek <s.mroczek at wasko.pl> wrote:
> On Tue, 17 Oct 2006 17:05:46 +0200
> Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
>
> > Hi,
> >
> > 1. The ocfs2 RA reports a generic error if it can't find the fs
> > UUID, which prevents configurations like this one:
> >
> > <clone id="sharedfs" globally_unique="false">
> >   <instance_attributes id="sharedfs_inst_attr">
> >     <attributes>
> >       <nvpair id="sharedfs_attr_1" name="clone_max" value="3"/>
> >       <nvpair id="sharedfs_attr_2" name="clone_node_max" value="1"/>
> >     </attributes>
> >   </instance_attributes>
> >   <group id="ocfs2-fs">
> >     <primitive class="ocf" id="LVM_1" provider="heartbeat" type="LVM">
> >       <operations>
> >         <op id="LVM_1_mon" interval="120s" name="monitor" timeout="60s"/>
> >       </operations>
> >       <instance_attributes id="LVM_1_inst_attr">
> >         <attributes>
> >           <nvpair id="LVM_1_attr_0" name="volgrpname" value="data01vg"/>
> >         </attributes>
> >       </instance_attributes>
> >     </primitive>
> >     <primitive class="ocf" id="Filesystem_1" provider="heartbeat" type="Filesystem">
> >       <operations>
> >         <op id="Filesystem_1_mon" interval="120s" name="monitor" timeout="60s"/>
> >         <op id="Filesystem_2_notify" interval="120s" name="notify" timeout="60s"/>
> >       </operations>
> >       <instance_attributes id="Filesystem_1_inst_attr">
> >         <attributes>
> >           <nvpair id="Filesystem_1_attr_0" name="device" value="/dev/data01vg/v0"/>
> >           <nvpair id="Filesystem_1_attr_1" name="directory" value="/data/d1"/>
> >           <nvpair id="Filesystem_1_attr_2" name="fstype" value="ocfs2"/>
> >         </attributes>
> >       </instance_attributes>
> >     </primitive>
> >   </group>
> > </clone>
> >
> > While probing, the device holding the filesystem is not available,
> > etcetera. Is this wrong usage? Or how should we go about that?
>
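On the probe question, one possible approach is for the probe to report
"not running" rather than a generic error when the device node does not
exist yet (for example because the volume group has not been activated on
that node). The following is only a sketch under that assumption, not the
actual RA code: probe_device is a hypothetical helper name, and the numeric
values are the standard OCF return codes.

```shell
#!/bin/sh
# Standard OCF return codes (normally provided by the OCF shell functions).
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical helper: decide what a probe should report for a device.
# If the device node is absent, the filesystem cannot be mounted on this
# node, so report "not running" instead of a generic error.
probe_device() {
    device=$1
    if [ ! -e "$device" ]; then
        return $OCF_NOT_RUNNING
    fi
    return $OCF_SUCCESS
}
```

A real probe would go on to check whether the filesystem is actually
mounted; this only covers the "device missing" case described above.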
> The OCFS2 RA is based on the idea of moving ocfs2 control to user space. In
> other words, the OCFS2 RA wants to take full control of the OCFS2 cluster. To
> make this happen you need the ocfs2 user-space patches.
> However, I think it's not a good idea. First, you will lose any kind of
> OCFS2 certification,

Possibly; however, this is the supported configuration on SLES10, so it
must in some way be approved of by Oracle.

> second, OCFS2 works fine and there is no need to
> mess with it

Not true.  You have a big problem if heartbeat and ocfs2 disagree
about who is (and is not) part of the cluster.

By having OCFS2 accept membership information from heartbeat you
eliminate this point of failure.
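
To make the failure mode concrete, here is a rough sketch of what
"disagreeing about membership" means: two views of the node list that do
not match. The function name membership_diff and the node names are
hypothetical, and neither heartbeat nor OCFS2 exposes its membership in
exactly this form; it only illustrates the comparison.

```shell
#!/bin/sh
# Hypothetical helper: print every node that appears in exactly one of two
# space-separated membership lists. Any output means the two views disagree.
membership_diff() {
    list_a=$1
    list_b=$2
    # Nodes known to view A but missing from view B.
    for n in $list_a; do
        case " $list_b " in
            *" $n "*) ;;
            *) echo "$n" ;;
        esac
    done
    # Nodes known to view B but missing from view A.
    for n in $list_b; do
        case " $list_a " in
            *" $n "*) ;;
            *) echo "$n" ;;
        esac
    done
}

# Example: one view still counts node3 as a member, the other does not.
membership_diff "node1 node2 node3" "node1 node2"
```

With a single source of membership there is only one list, so this kind of
mismatch cannot arise.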

> - you lose its fencing solution or even lose your
> data when something bad happens.

Again, not true.
The heartbeat fencing is generally superior to the one built into
OCFS2, which relies (if I understand it correctly) on node suicide.

> Just remove
> <nvpair id="Filesystem_1_attr_2" name="fstype" value="ocfs2"/>
> and the fs will mount fine. Beyond that, I think you should consider
> changing the stop function in the Filesystem RA, e.g. like this:
>
> for sig in SIGTERM SIGTERM SIGTERM SIGKILL SIGKILL SIGKILL; do
>  ->       if [ "$SUB" = "/data/d1" ] ; then
>  ->         rc=$OCF_SUCCESS
>  ->         ocf_log info "$SUB will be left mounted"
>  ->         break
>  ->       fi
>         if $UMOUNT $umount_force $SUB ; then
>           rc=$OCF_SUCCESS
>           ocf_log info "unmounted $SUB successfully"
>           break
>         else
>
> IMHO, there is NO need to umount an ocfs2 volume once it is already
> mounted. This is a cluster fs, right? OCFS2 will be fine, trust me :)
> Handling LVM volumes with ocfs2 is also not a good choice, in my humble
> opinion of course. A better way is to modify the start function in the
> Filesystem RA as well, like this:
>
>             if [ $? != 0  ] ; then
>                 ocf_log err "Couldn't find filesystem $FSTYPE in /proc/filesystems"
>                 return $OCF_ERR_ARGS
>             fi
>         fi
>
>         # FIXME!
>   ->      LVM_VG=`echo $DEVICE | /usr/bin/cut -d '/' -f 3`
>   ->      `/usr/bin/which vgchange` -ay $LVM_VG
>
>
>         # Check the filesystem & auto repair.
>         # NOTE: Some filesystem types don't need this step...
>
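Note that the cut-based extraction quoted above only yields the volume
group name for device paths of the form /dev/<vg>/<lv>. A slightly more
defensive sketch, assuming that path layout and using a hypothetical helper
name vg_from_device, could look like this:

```shell
#!/bin/sh
# Hypothetical helper: derive the volume group name from a device path.
# Only /dev/<vg>/<lv> paths are handled; anything else is rejected rather
# than guessed at.
vg_from_device() {
    case "$1" in
        /dev/mapper/*)
            # Device-mapper names encode VG and LV differently.
            return 1
            ;;
        /dev/*/*)
            # Field 1 is empty, field 2 is "dev", field 3 is the VG name.
            echo "$1" | cut -d '/' -f 3
            ;;
        *)
            return 1
            ;;
    esac
}

vg_from_device /dev/data01vg/v0
```

For the device in the configuration at the top of the thread this prints
data01vg; for a plain partition or a /dev/mapper path it returns non-zero
so the caller can skip the vgchange call.
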
> And delete the LVM RA from the CIB. I know it's a bit confusing, but I don't
> think you would like to find LVM fighting with OCFS2 to disable some
> volume. In a production environment I don't think you want to play
> with LVM volumes either.
> Be careful with HA clones - test your configurations, including
> node reboots, before you run them for good.
>
> Good luck! :)
>
> --
> Sławomir Mroczek

