HA Desktop Server

Matthew Tedder matthew at tedder.com
Mon Feb 10 14:56:31 MST 2003


On Sunday 09 February 2003 23:56, Tom Brown wrote:
> On Mon, 10 Feb 2003, Alan Robertson wrote:
> > Matthew Tedder wrote:
> > > Hi,
> > >
> > >     I am inquiring as to how far one can go in terms of building a HA
> > > desktop server.  The biggest thing I am looking for is keeping desktop
> > > applications up when one server goes down....  That is, using X.
> > >
> > >     Is that possible?  It seems not, but I just thought I'd ask.. And
> > > if not, how close can you get?  The users of clients (LTSP-clients) in
> > > this case require X, NFS, DHCP, PXE, and Routing capabilities.  But
> > > they also need reliable access to their data in their home directories
> > > as well as what they are working on, at any given moment in an
> > > application program.  Frequent auto-saves are what we currently use
> > > along with mirrored -R /home/* but can we do more than this??
> > >
> > >     I don't see anything in your project description that directly
> > > addresses this..  It seems more for connectionless services.... Not
> > > even connection-oriented services like X, is this accurate?
> >
> > No one has ever asked this before.  The closest was a joke about making a
> > high-availability laptop by duct-taping two together back to back, and
> > mirroring the disks across ethernet, and then "failing over" the laptop
> > by flipping it over.
> >
> > Saving a general desktop environment without restarting is a VERY
> > interesting problem.  MUCH more state than simply TCP, or even TCP and X
> > needs to be saved.  Basically every application needs to save its state
> > after every keystroke or mouse movement, etc, so that the display saved
> > matches the application state.
>
> I probably shouldn't comment because I know so little about the actual
> guts of this, but I did do a little bit of research (e.g. co-op work term
> report) on tandem "non-stop" systems about a decade ago... basically, yes,
> you need to "checkpoint" the applications periodically, so you can resume
> execution should the CPU/server/whatever that the application is running
> on die.

Maybe it would work if each application checkpointed and mirrored its state
between GUI events, such as mouse clicks and keystrokes?  Or, more precisely,
each time the event handler for one of those has finished?
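
To make the idea concrete, here is a minimal sketch of checkpointing after
each event handler returns.  Everything in it is an assumption made for
illustration: the CheckpointingApp class, the send_to_standby callback, and
the toy "text editor" state are hypothetical, and a real X application holds
far more state than one dictionary.

```python
import json


class CheckpointingApp:
    """Toy application that mirrors its state after every handled event."""

    def __init__(self, send_to_standby):
        # send_to_standby is a hypothetical callback that ships a snapshot
        # (bytes) to the standby server; here it can be any callable.
        self.state = {"text": ""}
        self._send = send_to_standby
        self._last_snapshot = None

    def handle_event(self, event):
        kind, value = event
        # Run the handler to completion first, so the snapshot never
        # captures a half-applied change.
        if kind == "key":
            self.state["text"] += value
        elif kind == "backspace":
            self.state["text"] = self.state["text"][:-1]
        # Other events (e.g. mouse movement) leave this toy state unchanged.
        self._checkpoint()

    def _checkpoint(self):
        # Serialize deterministically so identical states compare equal.
        snapshot = json.dumps(self.state, sort_keys=True).encode("utf-8")
        if snapshot != self._last_snapshot:
            # Only mirror when something actually changed.
            self._send(snapshot)
            self._last_snapshot = snapshot


def restore(snapshot):
    """Rebuild an application on the standby from the last snapshot."""
    app = CheckpointingApp(lambda _snapshot: None)
    app.state = json.loads(snapshot.decode("utf-8"))
    return app
```

On failover, the standby would call restore() on the last snapshot it
received.  Events that do not change the state cost nothing to mirror, which
is the point of checkpointing per handler rather than on a timer.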

If video memory can be mirrored with tools like VNC, then mirroring the
display seems a lot simpler and, with this checkpoint strategy, could happen
less frequently.  The mirroring would therefore take up very little
bandwidth.  As I understand it, VNC looks for rectangular areas of video
memory that contain changes, and it uses checksums to tell whether an area
has changed.
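
The rectangle-and-checksum idea can be sketched roughly as follows.  This is
not the RFB protocol or the code of any particular VNC server; the function
names, the 16-pixel tile size, and the flat one-byte-per-pixel framebuffer
are all assumptions chosen to keep the example short.

```python
import zlib


def tile_checksums(frame, width, height, tile=16):
    """Return {(x, y): crc32} for each tile of a 1-byte-per-pixel frame."""
    sums = {}
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            crc = 0
            # Feed the tile into the CRC one row at a time, since a tile's
            # rows are not contiguous in the flat framebuffer.
            for y in range(ty, min(ty + tile, height)):
                start = y * width + tx
                end = y * width + min(tx + tile, width)
                crc = zlib.crc32(frame[start:end], crc)
            sums[(tx, ty)] = crc
    return sums


def changed_tiles(old_sums, new_sums):
    """List the top-left corners of tiles whose checksum differs."""
    return sorted(pos for pos, crc in new_sums.items()
                  if old_sums.get(pos) != crc)
```

Only the tiles returned by changed_tiles() would need to be sent to the
standby, so a screen that changes in one small region costs one tile of
bandwidth rather than a full frame.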

>
> This probably is not a popular strategy in a world used to commodity
> hardware, since you'd have to copy the application memory to another
> server and that would slow everything down. In the Tandem world, there was
> a lot of redundancy, and the cost of copying memory would have been a lot
> lower because it would have just been going to either _local_ disk or a
> different memory bank. And things like TCP state saving wouldn't have been
> such a big deal, because the node taking over the application would have
> been just a different CPU on the same computer/kernel...
>
> My understanding is the heartbeat is a _lot_ simpler than that :-)
>
> We have some archaic (quad ppro 200) 4th hand compaq equipment that has
> some high-bandwidth interfaces that may have been used at one time for
> something like that.
>
> -Tom

-- 
Matthew C. Tedder
SimpFlex Technologies, Inc.


