[Linux-HA] grouping cloned resources?
Andrew Beekhof
beekhof at gmail.com
Thu Mar 1 07:24:58 MST 2007
On 2/28/07, Sebastian Reitenbach <sebastia at l00-bugdead-prods.de> wrote:
> Hi andrew,
>
> "Andrew Beekhof" <beekhof at gmail.com> wrote:
> > On 2/27/07, Sebastian Reitenbach <sebastia at l00-bugdead-prods.de> wrote:
> > > Hi,
> > >
> > > >
> > > > However, you may be better off using individual clones and using
> > > > rsc_order constraints between them - ie, all those filesystems probably
> > > > can be safely started in parallel as long as they are run before the
> > > > application. As this is closer to the way the cluster was intended
> > > > to be used, it'll probably work much better for you.
> > > >
> > > This I tried first. I have about 20 ocfs2 volumes to mount in a five-node
> > > cluster. When I start up the cluster and fill it with the resources, it
> > > takes about 10 minutes and the cluster is dead. The mounting takes ages,
> > > the cim process runs at 100% on each node, and the nodes are nearly
> > > unresponsive. Then I see the timeouts of the dlm, I think because of the
> > > load on the cluster nodes: I have a load of about 8-10 on two-processor
> > > dual-core machines.
> > > In contrast, the three groups we created just loaded and were distributed
> > > within seconds, with nearly no load on the nodes. I would prefer that ;)
> > >
> >
> > can you attach your configuration please?
> >
>
> Following is the ha.cf file from the /etc directory. Appended are the
> following files: cibbootstrap.xml and stonithcloneset.xml were used for both
> setups; ocfs2_cloned.xml is where all volumes are independent, and
> ocfs2_grouped.xml contains the grouped ocfs2 configuration. I started the
> nodes in the cluster with an empty configuration, all in standby, and then on
> the DC I first added cibbootstrap.xml and stonithcloneset.xml via cibadmin,
> and then one of the last two files too. Then I activated the nodes and
> experienced what I mentioned above.

ok, while not "wrong", it shouldn't be necessary to upload the CIB in stages.

also, activating all the nodes at once means that _all_ the probing
will happen at the same time instead of as each node comes up, so
there is less load distribution.

other than:
  s/node_max/clone_node_max/
the config (i looked mostly at the group version) looks sane enough to
at least look like it works.
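
a minimal sketch of that rename, in case it helps. the XML fragment is a hypothetical stand-in (the real ocfs2_grouped.xml is not quoted in this mail), and it assumes the option is set through an nvpair whose name attribute is "node_max". it is deliberately narrower than the literal s/// above: only the name attribute is touched, so ids that happen to contain node_max stay as they are, and running it twice changes nothing.

```python
import re

# Hypothetical clone definition for illustration only; ids and values are
# made up, the real configuration files were attachments.
SAMPLE = '''<clone id="ocfs2_clone">
  <instance_attributes id="ocfs2_clone_attrs">
    <attributes>
      <nvpair id="ocfs2_clone_max" name="clone_max" value="5"/>
      <nvpair id="ocfs2_node_max" name="node_max" value="1"/>
    </attributes>
  </instance_attributes>
</clone>'''


def fix_node_max(cib_xml: str) -> str:
    """Rename name="node_max" to name="clone_node_max".

    Only the exact attribute text name="node_max" is matched, so an
    already-correct name="clone_node_max" is left untouched.
    """
    return re.sub(r'name="node_max"', 'name="clone_node_max"', cib_xml)


fixed = fix_node_max(SAMPLE)
print(fixed)
```

the result would still have to be loaded into the cluster the usual way (e.g. via cibadmin); that step is left out here.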

the monitor intervals look a little aggressive and i'm not sure
exactly how the LRM handles start_delay but in principle it should be
ok.

i had actually hoped you would include the output from cibadmin -Q
which would include the status as well... that would have also told us
what actions/resources were failing.

but in any case, i think we'd need to look at some log files to see
what the cluster is up to that creates such a load.
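
to illustrate what the status section from cibadmin -Q can tell you, here is a rough sketch that lists operations with a non-zero return code. the sample XML is a hypothetical, heavily trimmed stand-in for real output: the resource ids are made up, and the element and attribute names (node_state, lrm_resource, lrm_rsc_op, rc_code) are assumptions about the status layout that may differ between versions. it also over-reports, since a probe on a node where the resource is not running legitimately returns non-zero.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for `cibadmin -Q` output. Only the node
# name comes from the posted ha.cf; everything else is invented.
SAMPLE_CIB = '''<cib>
  <configuration/>
  <status>
    <node_state id="n1" uname="ppsdb101">
      <lrm id="n1">
        <lrm_resources>
          <lrm_resource id="ocfs2_vol01:0">
            <lrm_rsc_op id="ocfs2_vol01:0_start_0" operation="start" rc_code="0"/>
          </lrm_resource>
          <lrm_resource id="ocfs2_vol02:0">
            <lrm_rsc_op id="ocfs2_vol02:0_start_0" operation="start" rc_code="1"/>
          </lrm_resource>
        </lrm_resources>
      </lrm>
    </node_state>
  </status>
</cib>'''


def failed_ops(cib_xml):
    """Return (node, resource, operation, rc) for every op with rc != 0.

    Assumes the attribute is spelled rc_code; ops without it are skipped.
    """
    root = ET.fromstring(cib_xml)
    failures = []
    for node in root.iter("node_state"):
        for rsc in node.iter("lrm_resource"):
            for op in rsc.iter("lrm_rsc_op"):
                rc = op.get("rc_code")
                if rc is not None and rc != "0":
                    failures.append(
                        (node.get("uname"), rsc.get("id"), op.get("operation"), rc)
                    )
    return failures


for node, rsc, operation, rc in failed_ops(SAMPLE_CIB):
    print(f"{node}: {rsc} {operation} returned {rc}")
```

this only narrows down which resources to look at; the log files are still needed to see why an action failed or where the load comes from.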
>
> autojoin none
> crm true
> node ppsdb101
> node ppsdb102
> node ppsnfs101
> node ppsnfs102
> node ppsbackup101
> cluster ppscluster
> bcast eth1
>
> any comments are highly appreciated.
>
> kind regards
> Sebastian