[Linux-HA] crm_failcount queries quite slow?

Dominik Klein dk at in-telegence.net
Fri Apr 4 00:25:31 MDT 2008


Lars Marowsky-Bree wrote:
> On 2008-04-03T13:59:36, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
> 
>> Any crm* program is significantly slower on a non-DC node
>> regardless of whether something's happening in the cluster. It's
>> always been like that.

I can confirm that. It's been like that for me ever since I started using heartbeat.

> Hm, I've not personally observed that in my test cluster, or at least
> not noticed anything out of line.
> 
> "Significantly" slower is bad; we mandate that "DC or not DC" is _not_
> the question, and that users shouldn't care about this designation.
> 
> Could anyone who reproduces this report a few more details? Is it the
> local node, the time it takes to process on the DC, or the network
> roundtrip? (Should be observable using tcpdump/wireshark)

Just 2 measurements:

dktest2sles10:~# time crmadmin -D
Designated Controller is: dktest2sles10

real    0m0.005s
user    0m0.004s
sys     0m0.000s

dktest1sles10:~/cib# time crmadmin -D
Designated Controller is: dktest2sles10

real    0m1.014s
user    0m0.000s
sys     0m0.004s

dktest2sles10:~# time cibadmin -Q &> /dev/null

real    0m0.009s
user    0m0.004s
sys     0m0.004s

dktest1sles10:~/cib# time cibadmin -Q &> /dev/null

real    0m1.713s
user    0m0.004s
sys     0m0.004s
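
A single run can be noisy, so one way to firm up numbers like these is to repeat each command a few times and average. A minimal sketch, assuming bash and GNU date (for %N); time_avg is a hypothetical helper name, not part of heartbeat:

```shell
#!/bin/bash
# Hypothetical helper: run a command N times and print the mean wall-clock time.
# Assumes bash (64-bit arithmetic) and GNU date, which supports %N (nanoseconds).
time_avg() {
    local runs=$1; shift
    local total_ns=0 start end i
    for ((i = 0; i < runs; i++)); do
        start=$(date +%s%N)
        "$@" > /dev/null 2>&1      # discard the command's own output
        end=$(date +%s%N)
        total_ns=$((total_ns + end - start))
    done
    # Integer arithmetic only: mean in microseconds, printed as milliseconds.
    local mean_us=$((total_ns / runs / 1000))
    printf '%d runs, mean %d.%03d ms\n' "$runs" $((mean_us / 1000)) $((mean_us % 1000))
}

# Example (run once on the DC and once on the other node):
#   time_avg 10 cibadmin -Q
```

The figure includes the cost of spawning date twice per run, so it will read slightly higher than what time reports.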

tcpdump:

y.x.z.103 is the DC
y.x.z.102 is the other node

08:22:16.803702 IP 10.200.200.102.32952 > 10.200.200.103.694: UDP, length 217
08:22:16.803626 IP 10.250.250.102.32951 > 10.250.250.103.694: UDP, length 221
08:22:16.803637 IP 10.250.250.102.32951 > 10.250.250.103.694: UDP, length 217
08:22:16.929482 IP 10.250.250.103.32869 > 10.250.250.102.694: UDP, length 221
08:22:16.929528 IP 10.200.200.103.32870 > 10.200.200.102.694: UDP, length 221

Up to here, I think it's just been the normal heartbeat packets. Notice
that they are all roughly the same length.

Then I do:

debian dktest1sles10:~/cib# date +%H:%M:%S:%N; time cibadmin -Q &> /dev/null
08:22:16:041111482

real    0m1.189s
user    0m0.008s
sys     0m0.00

08:22:16.929976 IP 10.250.250.103.32869 > 10.250.250.102.694: UDP, length 2263
08:22:16.930026 IP 10.200.200.103.32870 > 10.200.200.102.694: UDP, length 2263
08:22:16.930029 IP 10.200.200.103 > 10.200.200.102: udp
08:22:16.929979 IP 10.250.250.103 > 10.250.250.102: udp

Both servers were synced with ntpdate against the same time source a
minute earlier. So to me, it looks like it's the DC that needs some time
to process the request. The cluster had one primitive resource at that
time and should have been pretty much idle.
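
To put a rough number on that, the gap between the date output and the first 2263-byte packet from the DC can be worked out from the timestamps above. This is only a sketch: it assumes both clocks really agree after the ntpdate sync, and to_seconds is a hypothetical helper. The date call also runs slightly before cibadmin starts, so the result is an upper bound on the wait.

```shell
# Timestamps copied from the output above; both are from the same day.
cmd_start="08:22:16.041111"     # date output, truncated to microseconds
first_reply="08:22:16.929976"   # first 2263-byte packet from the DC

# Hypothetical helper: convert HH:MM:SS.ffffff to seconds since midnight.
to_seconds() {
    echo "$1" | awk -F: '{ printf "%.6f\n", $1 * 3600 + $2 * 60 + $3 }'
}

delay=$(awk -v a="$(to_seconds "$cmd_start")" -v b="$(to_seconds "$first_reply")" \
    'BEGIN { printf "%.6f", b - a }')
echo "first large packet from the DC seen ${delay}s after the command started"
```

That comes to about 0.89s of the 1.189s that time reported for the command.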

Regards
Dominik

