[Linux-HA] How to handle filesystem corruption

Igor D'Astolfo i.dastolfo at smart.it
Thu Sep 6 10:36:04 MDT 2007



> Hi,
>
> On Thu, Sep 06, 2007 at 12:24:14PM +0200, Igor D'Astolfo wrote:
> > Hi,
> >    I'm using linux-ha to put MySQL in high availability.
> > I configured 2 nodes with MySQL in HA, with 3 resources in a group,
> > colocated and ordered:
> >
> > * the IP bound to the service
> > * the partition with data (on shared storage), formatted with reiserfs
> > * the mysql service
> >
> > The HA works well, I can migrate the service between the nodes without
> > problems.
> > But yesterday I had a big issue: the node that was running the resource
> > group went down because of a power loss and left the data partition unclean.
> >
> > After the default timeouts, the other node took over the resources and
> > restarted the service. BUT the partition was not clean. This wasn't
> > evident to me, so the server continued to work for about two hours and
> > then the filesystem started to cause kernel oopses and mysql
> > stopped responding.
> > I had to unmount the partition, run fsck.reiserfs --rebuild-tree,
> > remount the partition and restore from backup some files that were lost
> > in the repair.
> >
> > My question is whether it's possible to run a check on the partition before
> > mounting it on the other node, or whether there's another way to configure
> > the partition to avoid such problems.
>
> This is arguably a case of software failing in an unexpected way.
> Journaled filesystems should guarantee integrity of data and
> metadata. That's why one uses them. And to avoid very time
> consuming filesystem check procedures on boot. Unfortunately,
> there is usually no quick way to find out if the filesystem is
> good.
>
> Otherwise, it is of course possible to do a filesystem check
> before mounting it. But it will cost time. And it would make the
> startup procedure heavily dependent on the filesystem size and
> its nature. Sometimes, it could even last for hours. The timeouts
> would be really tricky to estimate. At any rate, perhaps this
> could be made an option and then left to the user to decide if
> their filesystem needs extra checking on mount.
>   
I agree with you, the check shouldn't be done automatically on every
start, but the decision could depend on why the resource is switching nodes.
E.g. if the resource is switching nodes because I issued a migration, no
check is necessary; but if the switch is caused by a node failure, it might
make sense to force a check, or to keep the resource stopped until the
user intervenes.
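
To make the idea concrete, here is a rough sketch in Python of the decision I have in mind. Everything in it is hypothetical: the names SwitchCause, Action, decide_action and safe_to_mount are mine, and as far as I know Heartbeat does not pass the cause of a switch to the resource like this. The only part taken from existing documentation is the meaning of the exit status bits, which follows the fsck(8) man page.

```python
from enum import Enum


class SwitchCause(Enum):
    """Why the resource group is starting on this node (hypothetical)."""
    MANUAL_MIGRATION = "manual_migration"  # the admin asked for the move
    NODE_FAILURE = "node_failure"          # the previous node crashed or lost power


class Action(Enum):
    """What to do with the data partition before starting MySQL (hypothetical)."""
    MOUNT = "mount"
    CHECK_THEN_MOUNT = "check_then_mount"
    HOLD_FOR_ADMIN = "hold_for_admin"


def decide_action(cause, auto_check=False):
    """Pick an action from the cause of the switch.

    auto_check is an assumed user option: when False, an unclean
    takeover leaves the resource stopped until someone intervenes.
    """
    if cause is SwitchCause.MANUAL_MIGRATION:
        # The filesystem was unmounted cleanly on the other node.
        return Action.MOUNT
    if auto_check:
        return Action.CHECK_THEN_MOUNT
    return Action.HOLD_FOR_ADMIN


def safe_to_mount(fsck_status):
    """Interpret an fsck exit status as documented in fsck(8).

    0 means no errors and 1 means errors were corrected; any other
    value (reboot needed, uncorrected errors, operational error, ...)
    is treated here as "do not mount".
    """
    return fsck_status in (0, 1)
```

With this, a takeover after a power loss would give decide_action(SwitchCause.NODE_FAILURE) == Action.HOLD_FOR_ADMIN unless the user has opted in to automatic checking, and the mount would go ahead only if safe_to_mount() accepts the exit status of the check.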

So, at the moment, this problem has no solution with a reiserfs
filesystem. Is there a way to avoid it by using another filesystem
(OCFS, perhaps)?

Regards.
> Dejan
>
> > Regards
> >


-- 
SMART.it
Igor D'Astolfo
Systems and Software

i.dastolfo at smart.it
Via Roma, 85 - Viadagola
40057 Granarolo Emilia - Bologna
Tel. 051.6056850 - Fax 051.6066196
www.smart.it - info at smart.it



