[Linux-HA] best practices monitoring services in Xen instances

Sebastian Reitenbach sebastia at l00-bugdead-prods.de
Tue Nov 6 07:57:33 MST 2007


Hi Andrew,

"Andrew Beekhof" <beekhof at gmail.com> wrote: 
> On 11/5/07, Sebastian Reitenbach <sebastia at l00-bugdead-prods.de> wrote:
> > Hi,
> >
> > to remove complexity from my cluster, I am experimenting with Xen.
> > Starting and stopping the Xen resources via heartbeat works well already.
> > I am a bit concerned about the services in the virtual machines: what is
> > the best approach to monitor their availability?
> 
> what you're talking about is basically having the crm manage resources
> on non-cluster nodes.
> 
> we've kicked around some ideas for implementing this in the past but
> it's never really bubbled to the top of anyone's todo list.
> 
> there aren't really any "best practices" for this as it's not really
> being done a whole lot (from what I hear anyway).  depending on how
> complex the relationships between the resources inside the Xen guests
> are, i'd go with option 1 (if they're complex) or 2 (if not)

Thank you for your comments. Mostly I have to check that the services do
not get killed by the OOM killer. For example, when I have three domUs
running and want to start a fourth but have no free memory available, I
have to shrink the memory of the already running domUs via Xen's mem-set.
When I do that, the OOM killer in a domU may kill the very services that
the domU is intended to provide. Unfortunately, heartbeat has nothing to
detect that yet.

I am currently tweaking the Xen resource script. I added a parameter,
OCF_RESKEY_monitor_scripts, naming custom scripts that the Xen resource
script runs when the monitor action for the domU is called. These scripts
test the services assigned to the domU. If one fails, heartbeat restarts
the whole domU, which hopefully restarts the internal service too.
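
A rough illustration of the monitor logic described above. This is only a
sketch in Python, not the actual change to the Xen resource script (which is
a shell script). It assumes that OCF_RESKEY_monitor_scripts holds a
whitespace-separated list of executable paths and that each script exits 0
when its service is healthy. The function name run_monitor_scripts and the
timeout value are hypothetical.

```python
import os
import shlex
import subprocess

# Standard OCF exit codes used by resource agents.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def run_monitor_scripts(env=None, timeout=30):
    """Run every script listed in OCF_RESKEY_monitor_scripts.

    Assumption: the parameter is a whitespace-separated list of executables,
    and a script signals a healthy service by exiting with status 0.
    Returns OCF_SUCCESS if all scripts pass (or none are configured),
    otherwise OCF_ERR_GENERIC so the cluster treats the domU as failed.
    """
    env = os.environ if env is None else env
    scripts = shlex.split(env.get("OCF_RESKEY_monitor_scripts", ""))
    for script in scripts:
        try:
            result = subprocess.run([script], timeout=timeout)
        except (OSError, subprocess.TimeoutExpired):
            # A missing, non-executable, or hanging check counts as a failure.
            return OCF_ERR_GENERIC
        if result.returncode != 0:
            return OCF_ERR_GENERIC
    return OCF_SUCCESS
```

In a real agent this check would only run after the domU itself has been
found running, and its result would become the exit status of the monitor
action.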

Sebastian
> 
> >
> > I have some solutions, but would like to know what corresponds to best
> > practice:
> >
> > - install heartbeat in the virtual domains too, then monitor the resources
> > within the xen instance, but I think this is counterproductive as I wanted
> > to remove complexity from the cluster by having fewer resources.
> >
> > - monitor the services in the virtual domains using SNMP, or custom scripts,
> > and in case something fails, use crm_resource to stop and start it again.
> > Well, custom scripts sound a bit error prone.
> >
> > - I don't know whether xen has the ability, but can the privileged domain
> > query a given domU for the state of a process, and in case the state is
> > not the wanted one, restart the domU?
> >
> > I think the last one would be the best, but I have no idea whether xen can
> > do that at all. I played around with OpenVZ for a short while, and that at
> > least could do that. Any other ideas, comments, rants are very welcome.
> >
> > kind regards
> > Sebastian
> >
> > _______________________________________________
> > Linux-HA mailing list
> > Linux-HA at lists.linux-ha.org
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems
> >
> 


