[Linux-HA] best practices monitoring services in Xen instances
Sebastian Reitenbach
sebastia at l00-bugdead-prods.de
Tue Nov 6 07:57:33 MST 2007
Hi Andrew,
"Andrew Beekhof" <beekhof at gmail.com> wrote:
> On 11/5/07, Sebastian Reitenbach <sebastia at l00-bugdead-prods.de> wrote:
> > Hi,
> >
> > to remove complexity from my cluster, I am experimenting with Xen.
> > Starting and stopping the Xen resources via heartbeat works well already.
> > I am a bit concerned about the services in the virtual machines: what is
> > the best approach to monitor their availability?
>
> what you're talking about is basically having the crm manage resources
> on non-cluster nodes.
>
> we've kicked around some ideas for implementing this in the past but
> it's never really bubbled to the top of anyone's todo list.
>
> there aren't really any "best practices" for this as it's not really
> being done a whole lot (from what I hear anyway). depending on how
> complex the relationships between the resources inside the Xen guests
> are, i'd go with option 1 (if they're complex) or 2 (if not)
Thank you for your comments. More or less, I have to check that the services
do not get killed by the OOM killer. For example, when I have three domUs
running and want to start a fourth, but there is no free memory available,
I have to shrink the memory of the already running domUs via Xen's mem-set.
When I do that, the OOM killer inside a domU may kill the very services that
the domU is intended to provide. Unfortunately, heartbeat has no way to
detect that yet.
I am currently tweaking the Xen resource script. I added a parameter,
OCF_RESKEY_monitor_scripts, naming custom scripts that the Xen resource
script runs whenever the monitor action for the domU is called. These
scripts test the services assigned to the domU. If one of them fails,
heartbeat restarts the whole domU, which hopefully gets the internal
service restarted too.
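
A minimal sketch of such a monitor hook, written in Python for illustration
rather than in the shell of the actual Xen resource script. It assumes that
OCF_RESKEY_monitor_scripts holds a whitespace-separated list of executable
paths and that each script exits with status 0 when its service is healthy.
The function names and the 30 second timeout are hypothetical.

```python
import os
import subprocess

# Standard OCF resource agent exit codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def run_monitor_scripts(scripts, timeout=30):
    """Run each check script; report success only if all of them exit 0."""
    for script in scripts:
        try:
            result = subprocess.run([script], timeout=timeout)
        except (OSError, subprocess.TimeoutExpired):
            # A missing, non-executable, or hanging script counts as a failure.
            return OCF_ERR_GENERIC
        if result.returncode != 0:
            return OCF_ERR_GENERIC
    return OCF_SUCCESS


def monitor_domu_services(environ=os.environ):
    """Read the (assumed whitespace-separated) script list and run it."""
    scripts = environ.get("OCF_RESKEY_monitor_scripts", "").split()
    return run_monitor_scripts(scripts)
```

The monitor action would call monitor_domu_services() only after the domU
itself has been found running, and return its result as the monitor status,
so that a failed service check makes heartbeat restart the domU.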
Sebastian
>
> >
> > I have some solutions, but would like to know what corresponds to best
> > practice:
> >
> > - install heartbeat in the virtual domains too, then monitor the resources
> > within the xen instance, but I think this is counterproductive as I wanted
> > to remove complexity from the cluster by having fewer resources.
> >
> > - monitor the services in the virtual domains using SNMP or custom scripts,
> > and in case something fails, crm_resource stop and start it again. Well,
> > custom scripts sound a bit error prone.
> >
> > - I don't know whether xen has the ability, but can the privileged domain
> > query a given domU for the state of a process and, in case the state is
> > not the wanted one, restart the domU?
> >
> > I think the last one would be the best, but I have no idea whether xen can
> > do that at all. I played around with OpenVZ for a short while, that at least
> > could do that. Any other ideas, comments, rants are very welcome.
> >
> > kind regards
> > Sebastian
> >
> > _______________________________________________
> > Linux-HA mailing list
> > Linux-HA at lists.linux-ha.org
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems
> >
>