[Linux-HA] clvm/dlm/gfs2 hangs if a node crashes
William Seligman
seligman at nevis.columbia.edu
Thu Mar 22 12:43:21 MDT 2012
On 3/20/12 4:55 PM, Lars Ellenberg wrote:
> On Fri, Mar 16, 2012 at 05:06:04PM -0400, William Seligman wrote:
>> On 3/16/12 12:12 PM, William Seligman wrote:
>>> On 3/16/12 7:02 AM, Andreas Kurz wrote:
>>>> On 03/15/2012 11:50 PM, William Seligman wrote:
>>>>> On 3/15/12 6:07 PM, William Seligman wrote:
>>>>>> On 3/15/12 6:05 PM, William Seligman wrote:
>>>>>>> On 3/15/12 4:57 PM, emmanuel segura wrote:
>>>>>>>
>>>>>>>> we can try to understand what happens when clvm hangs
>>>>>>>>
>>>>>>>> edit /etc/lvm/lvm.conf, change level = 7 in the log section, and
>>>>>>>> uncomment this line
>>>>>>>>
>>>>>>>> file = "/var/log/lvm2.log"
>>>>>>>
>>>>>>> Here's the tail end of the file (the original is 1.6M). Because there are no
>>>>>>> timestamps in the log, it's hard for me to point you to the place where I
>>>>>>> crashed the other system. I think (though I'm not sure) that the crash
>>>>>>> happened after the last occurrence of
>>>>>>>
>>>>>>> cache/lvmcache.c:1484 Wiping internal VG cache
>>>>>>>
>>>>>>> Honestly, it looks like a wall of text to me. Does it suggest anything to you?
>>>>>>
>>>>>> Maybe it would help if I included the link to the pastebin where I put the
>>>>>> output: <http://pastebin.com/8pgW3Muw>
>>>>>
>>>>> Could the problem be with lvm+drbd?
>>>>>
>>>>> In lvm2.log, I see this sequence of lines pre-crash:
>>>>>
>>>>> device/dev-io.c:535 Opened /dev/md0 RO O_DIRECT
>>>>> device/dev-io.c:271 /dev/md0: size is 1027968 sectors
>>>>> device/dev-io.c:137 /dev/md0: block size is 1024 bytes
>>>>> device/dev-io.c:588 Closed /dev/md0
>>>>> device/dev-io.c:271 /dev/md0: size is 1027968 sectors
>>>>> device/dev-io.c:535 Opened /dev/md0 RO O_DIRECT
>>>>> device/dev-io.c:137 /dev/md0: block size is 1024 bytes
>>>>> device/dev-io.c:588 Closed /dev/md0
>>>>> filters/filter-composite.c:31 Using /dev/md0
>>>>> device/dev-io.c:535 Opened /dev/md0 RO O_DIRECT
>>>>> device/dev-io.c:137 /dev/md0: block size is 1024 bytes
>>>>> label/label.c:186 /dev/md0: No label detected
>>>>> device/dev-io.c:588 Closed /dev/md0
>>>>> device/dev-io.c:535 Opened /dev/drbd0 RO O_DIRECT
>>>>> device/dev-io.c:271 /dev/drbd0: size is 5611549368 sectors
>>>>> device/dev-io.c:137 /dev/drbd0: block size is 4096 bytes
>>>>> device/dev-io.c:588 Closed /dev/drbd0
>>>>> device/dev-io.c:271 /dev/drbd0: size is 5611549368 sectors
>>>>> device/dev-io.c:535 Opened /dev/drbd0 RO O_DIRECT
>>>>> device/dev-io.c:137 /dev/drbd0: block size is 4096 bytes
>>>>> device/dev-io.c:588 Closed /dev/drbd0
>>>>>
>>>>> I interpret this: Look at /dev/md0, get some info, close; look at /dev/drbd0,
>>>>> get some info, close.
>>>>>
>>>>> Post-crash, I see:
>>>>>
>>>>> device/dev-io.c:535 Opened /dev/md0 RO O_DIRECT
>>>>> device/dev-io.c:271 /dev/md0: size is 1027968 sectors
>>>>> device/dev-io.c:137 /dev/md0: block size is 1024 bytes
>>>>> device/dev-io.c:588 Closed /dev/md0
>>>>> device/dev-io.c:271 /dev/md0: size is 1027968 sectors
>>>>> device/dev-io.c:535 Opened /dev/md0 RO O_DIRECT
>>>>> device/dev-io.c:137 /dev/md0: block size is 1024 bytes
>>>>> device/dev-io.c:588 Closed /dev/md0
>>>>> filters/filter-composite.c:31 Using /dev/md0
>>>>> device/dev-io.c:535 Opened /dev/md0 RO O_DIRECT
>>>>> device/dev-io.c:137 /dev/md0: block size is 1024 bytes
>>>>> label/label.c:186 /dev/md0: No label detected
>>>>> device/dev-io.c:588 Closed /dev/md0
>>>>> device/dev-io.c:535 Opened /dev/drbd0 RO O_DIRECT
>>>>> device/dev-io.c:271 /dev/drbd0: size is 5611549368 sectors
>>>>> device/dev-io.c:137 /dev/drbd0: block size is 4096 bytes
>>>>>
>>>>> ... and then it hangs. Comparing the two, it looks like it can't close /dev/drbd0.
>>>>>
>>>>> If I look at /proc/drbd when I crash one node, I see this:
>>>>>
>>>>> # cat /proc/drbd
>>>>> version: 8.3.12 (api:88/proto:86-96)
>>>>> GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by
>>>>> root at hypatia-tb.nevis.columbia.edu, 2012-02-28 18:01:34
>>>>> 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C s-----
>>>>> ns:7000064 nr:0 dw:0 dr:7049728 al:0 bm:516 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
>>>>
>>>> s----- ... DRBD suspended I/O, most likely because of its
>>>> fencing policy. For valid dual-primary setups you have to use the
>>>> "resource-and-stonith" policy and a working "fence-peer" handler. In
>>>> this mode I/O is suspended until fencing of the peer was successful. The
>>>> question is why the peer does _not_ also suspend its I/O, because obviously
>>>> fencing was not successful .....
>>>>
>>>> So with a correct DRBD configuration one of your nodes should already
>>>> have been fenced because of connection loss between nodes (on drbd
>>>> replication link).
>>>>
>>>> You can use e.g. that nice fencing script:
>>>>
>>>> http://goo.gl/O4N8f
>>>
>>> This is the output of "drbdadm dump admin": <http://pastebin.com/kTxvHCtx>
>>>
>>> So I've got resource-and-stonith. I gather from an earlier thread that
>>> obliterate-peer.sh is more-or-less equivalent in functionality with
>>> stonith_admin_fence_peer.sh:
>>>
>>> <http://www.gossamer-threads.com/lists/linuxha/users/78504#78504>
>>>
>>> At the moment I'm pursuing the possibility that I'm returning the wrong return
>>> codes from my fencing agent:
>>>
>>> <http://www.gossamer-threads.com/lists/linuxha/users/78572>
>>
>> I cleaned up my fencing agent, making sure its return codes matched those
>> returned by other agents in /usr/sbin/fence_, and allowing for some delay issues
>> in reading the UPS status. But...
>>
>>> After that, I'll look at another suggestion with lvm.conf:
>>>
>>> <http://www.gossamer-threads.com/lists/linuxha/users/78796#78796>
>>>
>>> Then I'll try DRBD 8.4.1. Hopefully one of these is the source of the issue.
>>
>> Failure on all three counts.
>
> May I suggest you double check the permissions on your fence peer script?
> I suspect you may simply have forgotten the "chmod +x" .
>
> Test with "drbdadm fence-peer minor-0" from the command line.

I still haven't solved the problem, but this advice has gotten me further than
before.

First, Lars was correct: I did not have execute permissions set on my fence peer
scripts. (D'oh!) I turned them on, but that did not change anything: cman+clvmd
still hung on the vgdisplay command if I crashed the peer node.
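
For anyone who wants to rule out the same mistake, here is a minimal Python sketch of the check Lars suggested. It only looks at the file's mode bits and its first two bytes; the function name handler_problems is something I made up for illustration, and it says nothing about whether the script actually fences correctly.

```python
import os
import stat


def handler_problems(path):
    """Return a list of reasons why `path` cannot be run as a handler script.

    An empty list means the basic checks passed; it does NOT mean the
    script fences correctly.
    """
    problems = []
    if not os.path.isfile(path):
        problems.append("not a regular file")
        return problems

    # Look at the mode bits directly, so the answer is the same
    # whether or not the check is run as root.
    mode = os.stat(path).st_mode
    if not mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
        problems.append("no execute bit set (try chmod +x)")

    # A shell script without an interpreter line may fail to exec.
    with open(path, "rb") as handle:
        if handle.read(2) != b"#!":
            problems.append("no #! interpreter line")
    return problems
```

Calling handler_problems() on the path named in the "fence-peer" line of the DRBD handlers section should return an empty list before it is worth testing with "drbdadm fence-peer minor-0".
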

I started up both nodes again (cman+pacemaker+drbd+clvmd) and tried Lars'
suggested command. I didn't save the response for this message (d'oh again!) but
it said that the fence-peer script had failed.

Hmm. The peer was definitely shutting down, so my fencing script is working. I
went over it, comparing the return codes to those of the existing scripts, and
made some changes. Here's my current script: <http://pastebin.com/nUnYVcBK>.
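
A note on return codes, since there are two layers here: the fence agent has its own convention, and the fence-peer handler that DRBD calls has a separate one. The Python sketch below records the handler exit codes as I understand them from the comments in the crm-fence-peer.sh script shipped with DRBD; treat the table as my reading of that script, not as an authoritative reference. The names FENCE_PEER_EXIT_CODES and describe_fence_peer_exit are made up for illustration.

```python
# Exit codes a DRBD fence-peer handler is expected to return, as I
# understand them from the comments in crm-fence-peer.sh (an assumption
# on my part; check the script shipped with your DRBD version).
FENCE_PEER_EXIT_CODES = {
    3: "peer is inconsistent",
    4: "peer is outdated",
    5: "peer was down or unreachable",
    6: "peer is primary",
    7: "peer got stonithed",
}


def describe_fence_peer_exit(code):
    """Translate a fence-peer handler exit code into a short description."""
    return FENCE_PEER_EXIT_CODES.get(
        code, "not one of the expected fence-peer exit codes"
    )
```

With the "resource-and-stonith" policy I would expect a handler that really powered off the peer to exit with 7, so describe_fence_peer_exit(7) is the result to look for.
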

Up until now my fence-peer scripts had either been Lon Hohberger's
obliterate-peer.sh or Digimer's rhcs_fence. I decided to try the
stonith_admin-fence-peer.sh script that Andreas Kurz recommended; unlike the
first two scripts, which fence using fence_node, the latter script just calls
stonith_admin.

When I tried the stonith_admin-fence-peer.sh script, it worked:

# drbdadm fence-peer minor-0
stonith_admin-fence-peer.sh[10886]: stonith_admin successfully fenced peer
orestes-corosync.nevis.columbia.edu.

Power was cut on the peer; the remaining node stayed up. Then I brought up the
peer with:

stonith_admin -U orestes-corosync.nevis.columbia.edu

BUT: When the restored peer came up and started to run cman, the clvmd hung on
the main node again.

After cycling through some more tests, I found that if I brought down the peer
with drbdadm, then brought the peer up with no HA services, then started
drbd and then cman, the cluster remained intact.

If I crashed the peer, the scheme in the previous paragraph didn't work. I bring
up drbd, check that the disks are both UpToDate, then bring up cman. At that
point the vgdisplay on the main node takes so long to run that clvmd will time out:

vgdisplay Error locking on node orestes-corosync.nevis.columbia.edu: Command
timed out
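
The "check that the disks are both UpToDate" step can be scripted rather than eyeballed. Below is a rough Python sketch that parses the per-device status line of /proc/drbd in the 8.3 format quoted earlier in this thread. The regular expression is my own guess at the layout based on that one sample, and the "s" in the first position of the flags field is read as "I/O suspended" following Andreas' explanation; the function and field names are made up for illustration.

```python
import re

# Matches a line such as:
#  0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C s-----
# The pattern is inferred from the sample in this thread (DRBD 8.3.12).
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:(?P<ro>\S+)\s+ds:(?P<ds>\S+)"
    r"(?:\s+(?P<proto>[ABC])\s+(?P<flags>\S+))?"
)


def parse_drbd_status(text):
    """Return one dict per DRBD device found in the text of /proc/drbd."""
    devices = []
    for line in text.splitlines():
        match = STATUS_RE.match(line)
        if not match:
            continue  # version, GIT-hash and counter lines are skipped
        local_ds, _, peer_ds = match.group("ds").partition("/")
        flags = match.group("flags") or ""
        devices.append({
            "minor": int(match.group("minor")),
            "cs": match.group("cs"),
            "ro": match.group("ro"),
            "local_ds": local_ds,
            "peer_ds": peer_ds,
            # First flag character: "s" is taken to mean suspended I/O.
            "io_suspended": flags.startswith("s"),
            "both_uptodate": local_ds == peer_ds == "UpToDate",
        })
    return devices


# The status line from the crashed-peer case earlier in the thread.
SAMPLE = """version: 8.3.12 (api:88/proto:86-96)
 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C s-----
    ns:7000064 nr:0 dw:0 dr:7049728 al:0 bm:516 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
"""
```

Running parse_drbd_status(SAMPLE) gives a single device with io_suspended set and both_uptodate unset, which is the state I would not want to start cman in.
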

I timed how long it took vgdisplay to run. I might be able to work around this
by setting the timeout on my clvmd resource to 300s, but that seems to be a
band-aid for an underlying problem. Any suggestions on what else I could check?
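
In case it helps anyone reproduce the timing, this is roughly how a command such as vgdisplay can be timed with an upper bound, sketched in Python. The helper name time_command is made up, and the timeout only bounds how long the caller waits; if the command is blocked inside the kernel, the kill that follows the timeout may itself not take effect promptly.

```python
import subprocess
import time


def time_command(argv, timeout=None):
    """Run argv and return (elapsed_seconds, returncode).

    returncode is None if the command did not finish within `timeout`
    seconds. Output is captured and discarded; only the timing matters.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout,
        )
        returncode = proc.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run has already tried to kill the child here.
        returncode = None
    return time.monotonic() - start, returncode
```

For example, time_command(["vgdisplay"], timeout=300) would show whether the command finishes inside the 300s I am considering for the clvmd resource.
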
--
Bill Seligman | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://seligman@nevis.columbia.edu
PO Box 137 |
Irvington NY 10533 USA | http://www.nevis.columbia.edu/~seligman/