[Linux-HA] configuring CTS - (Was: Wrong includes when using
colocations with groups)
Marian Neubert
linux-ha at tesla-crew.de
Wed Oct 17 11:06:36 MDT 2007
argh... the logs were not attached, sorry
> Hi Andrew,
>
> Andrew Beekhof schrieb:
>> small item: please include outputs as attachments rather than inline
>> text.
> hmm, i'm sure i've attached the logs...
>
>>> According to Andrew's "colocation explained" pictures, the location
>>> score of 500 that "R_mysqld" and "R_apache" have for the node "test1"
>>> should be included in the groups.
>>
>> that document indicates that it only applies to 2.1.2-4 and later.
>> colocation got a major overhaul after 2.1.2 went out.
>> try grabbing an interim build and retesting.
>
> ok, that tiny "-4" should explain the "wrong" behaviour with groups ;o)
>
> i'm continuing my tests with an interim build - but i'm not willing to
> run such an unreleased version in a production-environment :o(
>
> (it may be better to use no groups at all and configure everything as
> single resources with some more order- and colocation-constraints...)
>
>
> BTW: how do i configure the CTS correctly?
>
> when i explicitly enable stonith-testing with the command:
>
> python ./CTSlab.py -v2 --stonith yes --standby yes 5 > /tmp/cts_output 2>&1 &
>
> then none of the stonith tests are executed (see attached logfile).
>
> stonith is enabled and configured to reboot failed nodes. a
> stonith-clone with "apcmastersnmp" is configured, running and working
> (tested earlier by hand). startup-fencing is enabled.
>
> everything works fine, except that the test run didn't execute the stonith-tests?
>
-------------- next part --------------
Oct 16 03:29:38 Random seed is: 1192498178
Oct 16 03:29:38 >>>>>>>>>>>>>>>> BEGINNING 5 TESTS
Oct 16 03:29:38 HA configuration directory: /etc/ha.d
Oct 16 03:29:38 System log files: /var/log/ha-log-local7
Oct 16 03:29:38 Enable Stonith: 1
Oct 16 03:29:38 Enable Fencing: 1
Oct 16 03:29:38 Enable Standby: 1
Oct 16 03:29:38 Enable Resources: 0
Oct 16 03:29:38 Cluster nodes:
Oct 16 03:29:38 * test1: dc4d8031-24b5-428e-9fe7-dc0854ff8db3
Oct 16 03:29:39 * test2: a7d43dc9-b6ce-4a87-b24e-77dc9e54d18a
Oct 16 03:29:40 Stopping Cluster Manager on all nodes
Oct 16 03:29:41 Starting Cluster Manager on all nodes.
Oct 16 03:30:47 Waiting for node test2 to come up
ssh: connect to host test2 port 22: Connection refused
ssh: connect to host test2 port 22: Connection refused
ssh: connect to host test2 port 22: Connection refused
ssh: connect to host test2 port 22: Connection refused
ssh: connect to host test2 port 22: Connection refused
ssh: connect to host test2 port 22: Connection refused
Oct 16 03:32:15 Node test2 now up
Oct 16 03:32:16 test2 was already started
Oct 16 03:32:47 BadNews: Oct 16 03:30:30 test1 test1 tengine: [21835]: WARN: update_failcount: Updating failcount for R_ipaddr01 on dc4d8031-24b5-428e-9fe7-dc0854ff8db3 after failed monitor: rc=14
Oct 16 03:33:01 Running test RestartOnebyOne (test1) [1]
Oct 16 03:35:07 BadNews: Oct 16 03:34:07 test2 test2 tengine: [4750]: WARN: update_failcount: Updating failcount for R_ipaddr01 on dc4d8031-24b5-428e-9fe7-dc0854ff8db3 after failed monitor: rc=14
Oct 16 03:35:07 BadNews: Oct 16 03:34:14 test2 test2 heartbeat: [3629]: ERROR: NV failure (string2msg_ll):
Oct 16 03:35:07 BadNews: Oct 16 03:34:14 test2 test2 heartbeat: [3629]: ERROR: Input string: [>>> t=cib cib_clientid=2e610bf6-95ac-4b6f-bd11-36aa3628fe7d cib_callopt=1048576 cib_callid=33 cib_op=cib_update cib_section=status cib_clientname=23941 (4)cib_calldata=eJzNWduOozgQfZ/faI00I8HId5u0eNiXlfZtNT+AfCPNbIAMkO7tv18DTtKQdLdzQdqXKLKrypdzTlEFDw8PXyBdfQPfs6ySpc2yVBcqQhStvqHvuq7yYr1rZFfUVfrgbNHM9q1BRMTg1JTZOD56oKnHYTb6+vXrlwgPPlVtbDuYQz4xHyZGS8J6y8a29a7R3hrBifVhcvSgwB+i7RpZVJ33wfMj7KdHr+GH02RwbjvZ7fzOxMRvnHGGhOxPkPVj9typj7MRGWYKkxpNjAAYxogoGhMkbJzklsdGA0FJngujcARJb73rw6SdbTsYwcH/SaZSd8Wz9RZFlWldpl2z24+4izZpXW2Kaj/yqy6qtLSlsk00Hsb+u7W6s2Y/ipH3jI1Vu3VcN8Xa+Zg62zjYfu9s8xrB4frap11n6pcqBT54Y7cbqW3qDCOOGO7vxP0fb25Kmt7kslvgkHMfMJsR4CT00SBKBJx7jU7kXSd/MW5jPzPTKJOVr+3vTQSTfrR73dq0H43botxurMdCb2TbprXOo/Gg26Z+Loxt0qJsIwLFYQ+tzurtOXocZyN0bv2srKuiq5sMRAj08/XWelH6mQizs9CpXbEx2UiV7OcfbUSH+I7wVVv0AbJ/7GuarNAKUQ6SnJIY5kzEBGIRKyB1TJQUmCnLqCIRIzPvUq4LnZIVfwyO4S9NbjaZOyaL0J6uWW6dpBonFNul8Af4kXjbRruUYWzKPdfqbeZ1STwdnXpt8yw3jo6EehNTrJ1gUm0kdItLQKWAQqjE5IYba1
Oct 16 03:35:07 BadNews: Oct 16 03:34:14 test2 test2 heartbeat: [3629]: ERROR: sp=clienf78e030f308f498611c524587a4bbb6b <<<
Oct 16 03:35:07 BadNews: Oct 16 03:34:14 test2 test2 heartbeat: [3629]: ERROR: depth=0
Oct 16 03:35:07 BadNews: Oct 16 03:34:14 test2 test2 heartbeat: [3629]: ERROR: MSG: Dumping message with 11 fields
Oct 16 03:35:07 BadNews: Oct 16 03:34:14 test2 test2 heartbeat: [3629]: ERROR: MSG[0] : [t=cib]
Oct 16 03:35:07 BadNews: Oct 16 03:34:14 test2 test2 heartbeat: [3629]: ERROR: MSG[1] : [cib_clientid=2e610bf6-95ac-4b6f-bd11-36aa3628fe7d]
Oct 16 03:35:07 BadNews: Oct 16 03:34:14 test2 test2 heartbeat: [3629]: ERROR: MSG[2] : [cib_callopt=1048576]
Oct 16 03:35:07 BadNews: Oct 16 03:34:14 test2 test2 heartbeat: [3629]: ERROR: MSG[3] : [cib_callid=33]
Oct 16 03:35:07 BadNews: Oct 16 03:34:14 test2 test2 heartbeat: [3629]: ERROR: MSG[4] : [cib_op=cib_update]
Oct 16 03:35:07 BadNews: Oct 16 03:34:14 test2 test2 heartbeat: [3629]: ERROR: MSG[5] : [cib_section=status]
Oct 16 03:35:07 BadNews: Oct 16 03:34:14 test2 test2 heartbeat: [3629]: ERROR: MSG[6] : [cib_clientname=23941]
Oct 16 03:35:07 BadNews: Oct 16 03:34:14 test2 test2 heartbeat: [3629]: ERROR: MSG[7] : [(4)cib_calldata=0x81afb90(1741 1309)]
Oct 16 03:35:07 BadNews: Oct 16 03:34:14 test2 test2 heartbeat: [3629]: ERROR: MSG[8] : [cib_delegated_from=test1]
Oct 16 03:35:07 BadNews: Oct 16 03:34:14 test2 test2 heartbeat: [3629]: ERROR: MSG[9] : [from_id=cib]
Oct 16 03:35:07 BadNews: Oct 16 03:34:14 test2 test2 heartbeat: [3629]: ERROR: MSG[10] : [to_id=cib]
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: ha_msg_addraw_ll: illegal field
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: ha_msg_addraw(): ha_msg_addraw_ll failed
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: NV failure (string2msg_ll):
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: Input string: [>>> t=cib cib_clientid=8775eb2e-8f09-479e-b947-f61db2396a90 cib_callopt=1048576 cib_callid=79 cib_op=cib_apply_diff cib_section=status cib_clientname=3717 (4)cib_calldata=eJyVzsEKgzAMBuC7r1EEhRZsjGsn9CX2AqHaDAZToXX4+hMnjDl22CWHP/k/IoTIANqiKolGPzCRG6fAlGY/s8RtcwvOm4B16M+qO/Ws0FujOkBWxqwhNxi09RJ03RZQ3uPgxMrq5oNd4z89DXb3KHKaHrHntMnwJb8PpKmOpVcHf3akxv2tC10TLcsi8zzPDuMJ+khVtg== cib_update=true (4)cib_update_diff=eJzNVUtu4zAM3fcaQYEGkAv9LbvQYjZzgF5AkCU6FRrbgeyk6O1Hjh148hkgnS5mNgZEPtKPfJS4Wq0eiCyf8NqY1jZgjPahrhFRRbLS9XjIIjTdAbxeJTDlV+CTH5Fc4fJJrF2ojlgizrDJnPKOpnbfmP3O2wF6TXKSAke7WPeDHfb9FKvOYicPIjKnI7DtPJjRBhMpegZevIiIKWAbm1uUkhkRruQMMRH6bh8dTBzoFXgBpDB6FXarQ7/7U9DoC16/mro3Hx8fc0OGzx3on2EL/Wc/QOJ0LMhtbd/rztWIkvG8i90heIj6DWwcKrAD4mNGOpHonel2t/qxeBG9JGCarg1DFw1GFI++bgfRDqFr9exB7DgfLjaZh2q/yboYNqHV1T5svbFuCAcwrz96JKZSom37MCYw7/CpZclKhzEm2JFMWlZkvKBFpioGmWKFxRhqRSggyS+iG7sJTvMyf7k7x9w2u92aVCKfq03MTZ2atY9pKmDQ5Bk/FzM2OuPStOh8lqbbmXkIOSLs2Kt2gHiwW40RFz
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: sp=>>> t=cib cib_clientid=8775eb2e-8f09-479e-b947-f61db2396a90 cib_callopt=1048576 cib_callid=80 cib_op=cib_apply_diff cib_section=status cib_clientname=3717 (4)cib_calldata=eJzNWduOozgQfZ/faLU0I8HId0NaPOzLfsD8APKNNLsBMkC6t/9+DThJQ5hZk4C0L1FkVxV2nTrHLnh6evoC6e4r+JampShMmiYqlwGiaPcVfVNVmeX7Uy3avCqTJ2uLJrafDQIS9U51kQ7jgwcae1xmg+fn5y8B7n3KSpumN4d8ZN5PDJaEdZa1aapTrZw1giPry+TgQYHbRNPWIi9b54OnWzhPD179D+Okz0DTivbkVhaN/IaZgLGYnXeQdmNmbtfX2YD0M7lOBNcEaxWHkikTEhHxUCJiQs7toKFEw0gEkHTWpy5M0pqmRQHs/V9FIlSbvxlnkZepUkXS1qfziE20TqrykJfnkb+qvEwKU0hTB8NmzD9Ho1qjz6MYOc9QG3nah1Wd762PrtKDhe3nydQfAezT17yeWl29lwkIGMG4y4A1GfI0LhE7vHDPDBPuAqYTuG9CXw2CmN94DU7kl04uDXZhP1JdS50WH83PQwDjbrT9OJqkGw2bvDgejMu8OoimSSqVBcNGj3X1lmtTJ3nRBARGlzU0Kq2Oc8VwnQ3Q3PPToirztqpTECDQzVdH4yjoZgLMZoGSp/yg06Ew0h9/NAHt49vyLpu8C5D+bT6SaId3CgAAgYIhEzgOSYziMJLYhBGOBQAmiyAyFtmJdyH2uUrIjr94x3BJE4dDarfJAnQuzjQzlkC1pYVpE/gdfI+dba2sQGiTcFe21TF1LCSu+CxXTf0mDrb4CHUmOt9beiRKCxhhJgAVEYwiGetMc200iykCElInJZAvg4nNwWRXVb
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: depth=0
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: MSG: Dumping message with 23 fields
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: MSG[0] : [t=cib]
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: MSG[1] : [cib_clientid=8775eb2e-8f09-479e-b947-f61db2396a90]
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: MSG[2] : [cib_callopt=1048576]
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: MSG[3] : [cib_callid=79]
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: MSG[4] : [cib_op=cib_apply_diff]
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: MSG[5] : [cib_section=status]
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: MSG[6] : [cib_clientname=3717]
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: MSG[7] : [(4)cib_calldata=0x8120208(221 167)]
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: MSG[8] : [cib_update=true]
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: MSG[9] : [(4)cib_update_diff=0x8122cf0(892 673)]
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: MSG[10] : [oseq=28]
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: MSG[11] : [from_id=cib]
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: MSG[12] : [to_id=cib]
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: MSG[13] : [client_gen=4]
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: MSG[14] : [src=test2]
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: MSG[15] : [(1)srcuuid=0x8145ac8(36 27)]
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: MSG[16] : [seq=b9]
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: MSG[17] : [hg=4710a9b4]
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: MSG[18] : [ts=47141506]
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: MSG[19] : [ld=0.52 0.60 0.26 11/107 5404]
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: MSG[20] : [ttl=4]
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: MSG[21] : [_compression_algorithm=zlib]
Oct 16 03:35:07 BadNews: Oct 16 03:34:15 test1 test1 heartbeat: [23924]: ERROR: MSG[22] : [auth=2 4cf5c02c62224a0af93d879cdc4fac231<<<]
Oct 16 03:35:23 Running test StopOnebyOne (test2) [2]
Oct 16 03:36:10 Running test Flip (test2) [3]
Oct 16 03:37:15 BadNews: Oct 16 03:37:00 test2 test2 tengine: [7612]: WARN: update_failcount: Updating failcount for R_ipaddr01 on a7d43dc9-b6ce-4a87-b24e-77dc9e54d18a after failed monitor: rc=14
ssh: connect to host test1 port 22: No route to host
Oct 16 03:37:34 Running test SimulStop (test1) [4]
ssh: connect to host test1 port 22: No route to host
ssh: connect to host test1 port 22: No route to host
ssh: connect to host test1 port 22: No route to host
Oct 16 03:40:08 BadNews: Oct 16 03:39:23 test2 test2 tengine: [7612]: WARN: update_failcount: Updating failcount for R_ipaddr01 on dc4d8031-24b5-428e-9fe7-dc0854ff8db3 after failed monitor: rc=14
Oct 16 03:40:09 Running test standby2 (test1) [5]
Oct 16 03:42:26 BadNews: Oct 16 03:40:54 test2 test2 tengine: [9890]: WARN: update_failcount: Updating failcount for R_ipaddr01 on dc4d8031-24b5-428e-9fe7-dc0854ff8db3 after failed monitor: rc=14
Oct 16 03:42:42 Stopping Cluster Manager on all nodes
Oct 16 03:43:09 BadNews: Oct 16 03:42:50 test2 test2 tengine: [9890]: WARN: update_failcount: Updating failcount for R_ipaddr01 on a7d43dc9-b6ce-4a87-b24e-77dc9e54d18a after failed monitor: rc=14
Oct 16 03:43:10 ****************
Oct 16 03:43:10 Overall Results:{'failure': 0, 'success': 5, 'BadNews': 52}
Oct 16 03:43:10 ****************
Oct 16 03:43:10 Detailed Results
Oct 16 03:43:10 Test Flip: {'elapsed_time': 64.633625984191895, 'skipped': 0, 'calls': 1, 'success': 1, 'started': 1, 'down->up': 1, 'auditfail': 0, 'failure': 0, 'max_time': 64.633625984191895, 'min_time': 64.633625984191895}
Oct 16 03:43:10 Test Restart: {'auditfail': 0, 'failure': 0, 'skipped': 0, 'success': 0, 'calls': 0}
Oct 16 03:43:10 Test Stonithd: {'auditfail': 0, 'failure': 0, 'skipped': 0, 'success': 0, 'calls': 0}
Oct 16 03:43:10 Test StartOnebyOne: {'auditfail': 0, 'failure': 0, 'skipped': 0, 'success': 0, 'calls': 0}
Oct 16 03:43:10 Test SimulStart: {'auditfail': 0, 'failure': 0, 'skipped': 0, 'success': 0, 'calls': 0}
Oct 16 03:43:10 Test SimulStop: {'elapsed_time': 152.1576931476593, 'skipped': 0, 'calls': 1, 'success': 1, 'auditfail': 0, 'failure': 0, 'max_time': 152.1576931476593, 'min_time': 152.1576931476593}
Oct 16 03:43:10 Test StopOnebyOne: {'elapsed_time': 44.699368953704834, 'skipped': 0, 'calls': 1, 'success': 1, 'auditfail': 0, 'failure': 0, 'max_time': 44.69889497756958, 'min_time': 44.69889497756958}
Oct 16 03:43:10 Test RestartOnebyOne: {'elapsed_time': 123.48592209815979, 'skipped': 0, 'calls': 1, 'success': 1, 'auditfail': 0, 'failure': 0, 'max_time': 123.48539209365845, 'min_time': 123.48539209365845}
Oct 16 03:43:10 Test PartialStart: {'auditfail': 0, 'failure': 0, 'skipped': 0, 'success': 0, 'calls': 0}
Oct 16 03:43:10 Test standby2: {'elapsed_time': 135.8998908996582, 'skipped': 0, 'calls': 1, 'success': 1, 'auditfail': 0, 'failure': 0, 'max_time': 135.8998908996582, 'min_time': 135.8998908996582}
Oct 16 03:43:10 Test ResourceRecover: {'auditfail': 0, 'failure': 0, 'skipped': 0, 'success': 0, 'calls': 0}
Oct 16 03:43:10 Test SpecialTest1: {'auditfail': 0, 'failure': 0, 'skipped': 0, 'success': 0, 'calls': 0}
Oct 16 03:43:10 Test NearQuorumPoint: {'auditfail': 0, 'failure': 0, 'skipped': 0, 'success': 0, 'calls': 0}
Oct 16 03:43:10 <<<<<<<<<<<<<<<< TESTS COMPLETED
Retrying command /usr/bin/ssh -l root -n -x test2 'true'
Retrying command /usr/bin/ssh -l root -n -x test2 'true'
Retrying command /usr/bin/ssh -l root -n -x test2 'true'
Retrying command /usr/bin/ssh -l root -n -x test2 'true'
Retrying command /usr/bin/ssh -l root -n -x test2 'true'
Retrying command /usr/bin/ssh -l root -n -x test2 'true'
Retrying command /usr/bin/ssh -f -l root -n -x test1 '/etc/init.d/heartbeat start > /dev/null 2>&1'
Retrying command /usr/bin/ssh -f -l root -n -x test1 '/etc/init.d/heartbeat start > /dev/null 2>&1'
Retrying command /usr/bin/ssh -f -l root -n -x test1 '/etc/init.d/heartbeat start > /dev/null 2>&1'
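
The "Detailed Results" lines in the log above record how often each test was called, so a test that was registered but never selected shows up with 'calls': 0. The following is a minimal sketch for listing those tests from a saved CTS output file. It assumes the "Test <name>: {...}" line format seen in this log; the function names are made up for illustration and are not part of CTS.

```python
import ast
import re

# Matches result lines as they appear in this log, for example:
#   Oct 16 03:43:10 Test Stonithd: {'auditfail': 0, ..., 'calls': 0}
# (assumption: other CTS versions may format these lines differently)
RESULT_LINE = re.compile(r"Test (?P<name>\w+): (?P<stats>\{.*\})\s*$")


def parse_detailed_results(lines):
    """Return {test_name: stats_dict} for every 'Test <name>: {...}' line."""
    results = {}
    for line in lines:
        match = RESULT_LINE.search(line)
        if match:
            # The statistics are printed as a Python dict literal,
            # so literal_eval can read them without executing code.
            results[match.group("name")] = ast.literal_eval(match.group("stats"))
    return results


def never_called(results):
    """Return the sorted names of tests whose 'calls' counter is 0."""
    return sorted(name for name, stats in results.items()
                  if stats.get("calls", 0) == 0)


# A few lines copied from the log above; in practice one would read
# the redirected output file (here /tmp/cts_output) instead.
sample = [
    "Oct 16 03:40:09 Running test standby2 (test1) [5]",
    "Oct 16 03:43:10 Test Restart: {'auditfail': 0, 'failure': 0, 'skipped': 0, 'success': 0, 'calls': 0}",
    "Oct 16 03:43:10 Test Stonithd: {'auditfail': 0, 'failure': 0, 'skipped': 0, 'success': 0, 'calls': 0}",
    "Oct 16 03:43:10 Test standby2: {'elapsed_time': 135.8998908996582, 'skipped': 0, 'calls': 1, 'success': 1, 'auditfail': 0, 'failure': 0, 'max_time': 135.8998908996582, 'min_time': 135.8998908996582}",
]

results = parse_detailed_results(sample)
print(never_called(results))
```

Run against the full log, this would list Stonithd alongside the other tests that were not picked in this 5-test run. It only reports what the log says and does not explain why a test was not selected.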