[Linux-HA] speed up fail over time
Andrew Beekhof
beekhof at gmail.com
Fri Jul 11 00:52:08 MDT 2008
On Jul 10, 2008, at 1:16 PM, Junko IKEDA wrote:
> Hi,
>
> We are trying to present a good performance report to a potential
> customer.
> The customer's requirements are:
> * There are more than 100 resources on one node.
> * 100 resources are included in one group, so they would start/stop
> sequentially.
> * Fail over for all of 100 resources should complete within 1 minute.
that's less than a second per resource (since members of a group are
started sequentially)... is your resource capable of starting so
quickly?
in truth, i think that for a group that size, 1 minute is an unrealistic
deadline (assuming it's not just full of Dummy resources)
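
To make the arithmetic behind that estimate concrete, here is a minimal sketch. The function names and the 2-second start time are purely illustrative assumptions; only the 100 resources and the 60-second deadline come from the request above.

```python
def per_resource_budget(deadline_s: float, group_size: int) -> float:
    """Average time each member may take when members start one after another."""
    return deadline_s / group_size


def sequential_total(per_resource_s: float, group_size: int) -> float:
    """Total start time for a group whose members start strictly in sequence."""
    return per_resource_s * group_size


# Figures from the request: 100 resources, 60-second deadline.
budget = per_resource_budget(60.0, 100)
print(f"budget per resource: {budget:.2f} s")

# Hypothetical start time per resource (an assumption, not a measurement).
hypothetical_start_s = 2.0
total = sequential_total(hypothetical_start_s, 100)
print(f"sequential total at {hypothetical_start_s:.1f} s each: {total:.0f} s")
```

Under these assumptions, anything slower than about 0.6 seconds per resource already misses the deadline, before counting stop operations or cluster overhead.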
> * Heartbeat stable 2.1 (maybe release as 2.1.4, soon)
> It took about 4 minutes for fail over.
>
> * Heartbeat-dev(5072025b79b8) + Pacemaker-0.7(ee6832884524)
> It took about 3 minutes for fail over.
> It's getting better!
> Is this some effect of the new xml parser?
possible - but more likely it's the performance optimization i've been
doing over the last couple of weeks and months.
did you cause the DC to fail, or another node?
because the load spikes generated by electing a new DC have been
reduced by 70-80% (no, that's not a typo)
and before you ask, no, these changes will never be part of 2.1.x
> The hb_report archives are so huge that I created a Bugzilla entry (as an enhancement) for them.
> http://developerbugs.linux-foundation.org/show_bug.cgi?id=1935
>
> Do you have any good idea to speed up fail over time?
split the group up :)
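
A rough sketch of why splitting helps, under a deliberately simplified model: members within a group start sequentially, while independent groups are assumed to start fully in parallel. The function name, the 1.8-second start time, and the round-robin split are all hypothetical, and the model ignores dependencies between resources, stop operations, and any cluster-side limit on concurrent operations.

```python
def failover_estimate(start_times_s, n_groups):
    """Estimate start time when resources are split across n_groups.

    Simplified model (an assumption): groups run in parallel, members of
    each group run sequentially, so the slowest group sets the total.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    # Round-robin split: resource i goes to group i % n_groups.
    groups = [start_times_s[i::n_groups] for i in range(n_groups)]
    return max(sum(group) for group in groups)


# Hypothetical: 100 resources, each taking 1.8 s to start.
starts = [1.8] * 100

for n in (1, 2, 4, 10):
    print(f"{n:2d} group(s): {failover_estimate(starts, n):.1f} s")
```

In this model one group of 100 takes about 180 seconds, while four groups of 25 take about 45 seconds. Real gains would be smaller, and splitting is only valid where the resources do not actually depend on each other.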
>
> It would be best if the performance improvement were available with
> Heartbeat 2.1.4.
> I know this kind of performance improvement is not so easy, but this is a
> matter of the greatest urgency. If it comes in after the nearest release,
> we are planning to backport it to 2.1.4 for our customer individually.
i think you'll find that is an extremely non-trivial task