[Linux-HA] How to handle filesystem corruption

Igor D'Astolfo i.dastolfo at smart.it
Thu Sep 6 04:24:14 MDT 2007


Hi,
    I'm using Linux-HA to make MySQL highly available.
I configured two nodes, with three resources in a colocated and ordered
group:

* the IP address bound to the service
* the partition with data (on a shared storage), formatted with reiserfs
* the mysql service

The HA setup works well: I can migrate the service between the nodes
without problems.
But yesterday I had a big issue: the node that was running the resource
group went down because of a power loss and left the data partition
unclean.

After the default timeouts, the other node took over the resources and
restarted the service, BUT the partition was not clean. This wasn't
evident to me, so the server continued to work for about two hours,
after which the kernel started to report oopses on the filesystem and
MySQL stopped responding.
I had to unmount the partition, run fsck.reiserfs --rebuild-tree,
remount the partition and restore from backup some files that were lost
during the repair.
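
The manual recovery steps above can be sketched as a short Python
script. This is only an illustration: the device path and mount point
are hypothetical placeholders, the function names are made up for this
example, and by default the script only prints the commands instead of
running them.

```python
import subprocess


def recovery_commands(device, mountpoint):
    """Return the manual recovery steps as argv lists, in order."""
    return [
        # 1. unmount the damaged partition
        ["umount", mountpoint],
        # 2. rebuild the reiserfs tree
        ["fsck.reiserfs", "--rebuild-tree", device],
        # 3. mount the partition again
        ["mount", "-t", "reiserfs", device, mountpoint],
    ]


def run_recovery(device, mountpoint, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for argv in recovery_commands(device, mountpoint):
        print(" ".join(argv))
        if not dry_run:
            # check=True stops the sequence at the first failing step
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical device and mount point, for illustration only.
    run_recovery("/dev/sdb1", "/var/lib/mysql")
```

fsck.reiserfs may ask for confirmation before rebuilding the tree, so
the real run is best done from an interactive terminal. Restoring lost
files from backup remains a separate, manual step.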

My question is whether it's possible to check the partition before
mounting it on the other node, or whether there's another way to
configure the partition to avoid such problems.
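
A pre-mount check of the kind asked about might look like the following
Python sketch. It is not an existing Linux-HA feature in this form. It
assumes that fsck.reiserfs --check exits with status 0 only when it
finds no problems, and that any confirmation prompt from fsck.reiserfs
is handled by whoever runs it. The function names and the injectable
"run" parameter are hypothetical, added so the logic can be tried
without touching a real device.

```python
import subprocess


def is_clean(device, run=subprocess.run):
    """Run a read-only reiserfs check; True only if it exits with 0."""
    result = run(["fsck.reiserfs", "--check", device])
    return result.returncode == 0


def mount_if_clean(device, mountpoint, run=subprocess.run):
    """Mount the device only when the check reports no problems."""
    if not is_clean(device, run):
        # Leave the partition unmounted so it can be repaired by hand.
        return False
    run(["mount", "-t", "reiserfs", device, mountpoint], check=True)
    return True
```

With this approach a failed check keeps the service down on the second
node instead of letting MySQL run on a damaged filesystem, which trades
availability for data safety.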

Regards


