Release Policy (was: Re: [Linux-HA] 2.1.2 and failover of colocated resources)

Peter Clapham pc7 at sanger.ac.uk
Thu Sep 20 02:08:57 MDT 2007


>>> auten taken (apart from the test
>>> suite) that things won't break.
>>>
>> Just to make one thing clear - I would never ever upgrade to a new
>> version before testing it extensively on a test system.
>>
>> What I do on a new release in my test environment:
>> I.) Test new version with clean system configuration (on all nodes):
>> - make cluster standby on all nodes
>> - cibadmin -E
>> - shutdown heartbeat on all nodes
>> - manually remove CIB file on all nodes
>> - upgrade to new release on all nodes
>> - start heartbeat on all nodes
>> - load the previous CIB configuration
>> - now test
>
> This is quite good. Perhaps you could write it at linux-ha.org.

Very much so! Please do :-)
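
As a starting point for the wiki page, the clean-configuration procedure
quoted above (section I) could be sketched as the dry-run script below. It
only prints the plan and executes nothing. The node names, the backup path,
the CIB directory /var/lib/heartbeat/crm, the init script location and the
crm_standby, cibadmin -Q and cibadmin -R -x invocations are my assumptions
rather than anything stated in this thread, so check them against the man
pages of your heartbeat version. Saving the CIB first is also an assumed
extra step, and the package upgrade itself is left as a placeholder.

```shell
#!/bin/sh
# Dry-run sketch of the clean-configuration test procedure (section I).
# It prints commands instead of running them.
# Assumptions (not from the thread): node names, paths, init script
# location and the exact crm_standby/cibadmin options.

NODES="${NODES:-node1 node2}"                     # hypothetical node names
CIB_BACKUP="${CIB_BACKUP:-/root/cib-backup.xml}"  # hypothetical backup path
CIB_DIR="${CIB_DIR:-/var/lib/heartbeat/crm}"      # assumed CIB location

# Print one plan line per node for a command to be run on that node.
on_each_node() {
    for n in $NODES; do
        echo "[$n] $*"
    done
}

clean_upgrade_plan() {
    # Assumed extra step: keep a copy of the CIB so it can be reloaded later.
    echo "[any] cibadmin -Q > $CIB_BACKUP"
    # Make the cluster standby on all nodes.
    for n in $NODES; do
        echo "[any] crm_standby -U $n -v on"
    done
    # Erase the CIB contents, then stop heartbeat everywhere.
    echo "[any] cibadmin -E"
    on_each_node "/etc/init.d/heartbeat stop"
    # Manually remove the CIB file, upgrade, and start again.
    on_each_node "rm -f $CIB_DIR/cib.xml*"
    on_each_node "<upgrade heartbeat packages>"
    on_each_node "/etc/init.d/heartbeat start"
    # Load the previous CIB configuration; after this, run the tests.
    echo "[any] cibadmin -R -x $CIB_BACKUP"
}

clean_upgrade_plan
```

Setting NODES and CIB_BACKUP in the environment changes the printed plan;
nothing is run until you copy the lines out by hand.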


>
>> After all my tests have passed I'm at least sure the upgrade cycle had
>> no impact on the cluster.
>>
>> NOTE: If you have to change the CIB to get the old cluster behaviour,
>> your upgrade process will be more painful.
>>
>> II) Testing Upgrade Process:
>>
>> 1) You did not have to change the CIB:
>>
>> a) back to old version:
>> - clean heartbeat (set all nodes standby, cibadmin -E, heartbeat stop,
>> delete CIB file on all nodes, remove new heartbeat version)
>> - install old heartbeat version on all nodes
>> - start heartbeat on all nodes
>> - load CIB
>>
>> b) now perform this for all nodes (one after the other):
>> - set cluster to standby
>> - upgrade to new heartbeat
>> - activate node again
>>
>> c) test again ---> if all tests passed "Hurray"
>>
>> 2.) Let's assume you had to change the CIB to get the same cluster
>> behaviour as you had with the old heartbeat version.
>>
>> Old CIB: cib-old.xml
>> New CIB: cib-new.xml
>>
>> Goal: based on the input of the 2 CIBs you have to find a way to
>> upgrade heartbeat with the shortest downtime. This is not a trivial
>> task because what you can/should do, and what not, really depends on
>> the CIB difference and is specific to your configuration.
>>
>> 1.) standby-upgrade-activate process: this is the most secure way to
>> upgrade, but the cluster downtime may be several minutes
>> - set one node standby
>> - upgrade heartbeat on the standby node
>> - set all nodes to standby (downtime starts as soon as the last node
>> is standby)
>> - load cib-new.xml
>> - activate node with new heartbeat version (here you have the cluster
>> up again)
>> - upgrade heartbeat on all other nodes
>>
>> 2.) set node/resources to unmanaged mode: this means you have a small
>> cib-delta-change and you really know what kind of effect the change
>> has (ptest is your friend for that).
>>
>> So far I have always used version 1) because it seems to be more
>> robust. I tried 2) a couple of times, but it sometimes ended with
>> heartbeat dying on a successive stop (but this may be fixed in newer
>> heartbeat versions).
>
> I'd actually opt for this one and so far didn't experience
> problems with it. Though I didn't do it so often. The big
> advantage is that the resources don't have to be moved around.
> Please post the logs, etc, in case Heartbeat gives up.

I've also found that on a busy heartbeat system /var/lib/pengine   
needs a clean before performing an upgrade on all nodes... but as  
always YMMV :-)
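
For what it's worth, the clean-up could look something like the sketch
below. The pe-* file name pattern, and simply deleting the files rather than
archiving them first, are my assumptions; have a look at what is actually in
the directory on your own system before using anything like this. The
function takes the directory as its first argument so it can be tried
somewhere harmless first.

```shell
#!/bin/sh
# Sketch: clear out policy engine files before an upgrade.
# Assumptions (not from the thread): the files are named pe-* and it is
# acceptable to delete them rather than archive them.

clean_pengine_dir() {
    dir="${1:-/var/lib/pengine}"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    # Use find rather than a shell glob, since a busy system may hold more
    # files than fit on one command line. Only top-level regular files
    # matching pe-* are removed; everything else is left alone.
    find "$dir" -maxdepth 1 -type f -name 'pe-*' -exec rm -f {} +
}
```

On a real cluster this would be called as clean_pengine_dir with no
argument on each node, before the upgrade.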

>
> Thanks.
>
> Dejan
>
>> kind regards Max
>>
>>
>>
>>
>> _______________________________________________
>> Linux-HA mailing list
>> Linux-HA at lists.linux-ha.org
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> See also: http://linux-ha.org/ReportingProblems



Dr Peter Clapham, Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK








