[Linux-HA] How to handle filesystem corruption

Eddie C edlinuxguru at gmail.com
Thu Sep 6 11:25:24 MDT 2007


I would think all disk systems would suffer from this type of problem.
OCFS2 has this problem as well (it has its own checker, fsck.ocfs2),
plus more clustered-disk problems. Switching to OCFS2 is not going to
make your life easier.

On 9/6/07, Igor D'Astolfo <i.dastolfo at smart.it> wrote:
>
>
> > Hi,
> >
> > On Thu, Sep 06, 2007 at 12:24:14PM +0200, Igor D'Astolfo wrote:
> > > Hi,
> > >    I'm using linux-ha to put MySQL in high availability.
> > > I configured 2 nodes with MySQL in HA, with 3 resources in a group
> > > colocated and ordered:
> > >
> > > * the ip bound to the service
> > > * the partition with data (on a shared storage), formatted with reiserfs
> > > * the mysql service
> > >
> > > The HA setup works well, I can migrate the service between the nodes
> > > without problems.
> > > But yesterday I had a big issue: the node that was running the resource
> > > group went down because of a power loss and left the data partition
> > > unclean.
> > >
> > > After the default timeouts, the other node took over the resources and
> > > restarted the service. BUT the partition was not clean. This wasn't
> > > evident to me, so the server continued to work for about two hours and
> > > then the filesystem started to produce kernel oopses and mysql
> > > stopped responding.
> > > I had to unmount the partition, run fsck.reiserfs --rebuild-tree,
> > > remount the partition and restore from backup some files that were lost
> > > in the repair.
> > >
> > > My question is whether it's possible to run a check on the partition
> > > before mounting it on the other node, or whether there's another way to
> > > configure the partition to avoid such problems.
> > >
> > This is arguably a case of software failing in an unexpected way.
> > Journaled filesystems should guarantee integrity of data and
> > metadata. That's why one uses them. And to avoid very time
> > consuming filesystem check procedures on boot. Unfortunately,
> > there is usually no quick way to find out if the filesystem is
> > good.
> >
> > Otherwise, it is of course possible to do a filesystem check
> > before mounting it. But it will cost time. And it would make the
> > startup procedure heavily dependent on the filesystem size and
> > its nature. Sometimes, it could even last for hours. The timeouts
> > would be really tricky to estimate. At any rate, perhaps this
> > could be made an option and then left to the user to decide if
> > their filesystem needs extra checking on mount.
> >
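
A pre-mount check along the lines described above could be sketched
roughly as follows. This is only an illustration in Python, not an
existing Linux-HA feature as far as this thread establishes:
check_before_mount, its return values and the example command are
assumptions, and the exit-status handling follows the fsck(8)
convention (0 = no errors, 1 = errors corrected), which a particular
checker may not follow exactly. The timeout makes the problem mentioned
above explicit: a check that can run for hours needs an upper bound,
and hitting that bound has to be treated as "do not mount".

```python
import subprocess

# Exit statuses from the fsck(8) convention; a specific checker may differ.
FSCK_NO_ERRORS = 0
FSCK_ERRORS_CORRECTED = 1


def check_before_mount(check_cmd, timeout_s):
    """Run a filesystem checker and classify the outcome.

    check_cmd is the full command as a list of strings (hypothetical,
    chosen by the administrator). Returns "clean", "dirty", "timeout"
    or "error"; only "clean" should allow the mount to proceed.
    """
    try:
        result = subprocess.run(
            check_cmd,
            stdin=subprocess.DEVNULL,   # never block on an interactive prompt
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,          # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    except OSError:
        # The checker could not be started at all (missing binary, etc.).
        return "error"
    if result.returncode in (FSCK_NO_ERRORS, FSCK_ERRORS_CORRECTED):
        return "clean"
    return "dirty"


# Example call (device name and options are placeholders to adapt):
#   check_before_mount(["fsck.reiserfs", "--check", "/dev/sdX1"], 600)
```

Anything other than "clean" would leave the resource stopped, so the
timeout has to be smaller than the start timeout configured for the
resource.
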
> I agree with you, the check shouldn't be done automatically, but there
> could be a check on the cause of the switch of the resource.
> E.g. if the resource is switching nodes because I issued a migration,
> there is no need to check it, but if the switch is caused by a node
> failure it might make sense to force a check, or to keep the resource
> stopped until the user intervenes.
>
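
The policy proposed above can be written down as a small decision
function. This is a hypothetical sketch: the names Cause, Action and
plan_takeover are made up for illustration, and it assumes the cluster
software can tell the resource why it is being started, which is not
established anywhere in this thread.

```python
from enum import Enum


class Cause(Enum):
    MIGRATION = "migration"        # administrator asked for the move
    NODE_FAILURE = "node_failure"  # previous node crashed or lost power


class Action(Enum):
    MOUNT = "mount"
    CHECK_THEN_MOUNT = "check_then_mount"
    HOLD_FOR_OPERATOR = "hold_for_operator"


def plan_takeover(cause, auto_check=True):
    """Decide what to do with the data partition on takeover.

    A planned migration means the filesystem was unmounted cleanly, so it
    is mounted directly. After a node failure the filesystem is either
    checked first (auto_check=True) or left stopped for the operator.
    """
    if cause is Cause.MIGRATION:
        return Action.MOUNT
    if auto_check:
        return Action.CHECK_THEN_MOUNT
    return Action.HOLD_FOR_OPERATOR
```

For the power-loss case described earlier, plan_takeover(Cause.NODE_FAILURE)
gives Action.CHECK_THEN_MOUNT, and passing auto_check=False keeps the
resource stopped until someone has looked at the partition.
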
> So, at the moment this problem (using a reiserfs filesystem) has no
> solution. Is there a way to avoid it by using another filesystem
> (OCFS, perhaps)?
>
> Regards.
> > Dejan
> >
> > > Regards
> > >
> > > _______________________________________________
> > > Linux-HA mailing list
> > > Linux-HA at lists.linux-ha.org
> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > See also: http://linux-ha.org/ReportingProblems
> > >
>
>
> --
> *SMART.it*
> *Igor D'Astolfo*
> Systems and Software
>
> i.dastolfo at smart.it     Via Roma, 85 - Viadagola
> 40057 Granarolo Emilia - Bologna
> Tel. 051.6056850 - Fax 051.6066196
> www.smart.it - info at smart.it
>
> *Smart.it* builds web-based services for business innovation, with the
> aim of optimizing business processes, reducing costs and improving
> quality. In particular, Smart.it designs, develops and maintains
> software applications on the Internet, covering both functionality and
> communication. It also handles the graphic, editorial and technical
> production of websites, along with the related hosting and web
> marketing services.
>
> This e-mail transmission may contain legally privileged and/or
> confidential information and is intended strictly for the addressee
> named above. If you have received this e-mail in error, please notify
> the sender and delete the original transmission (attachments
> included) without reading or saving it. Any use, distribution,
> reproduction or disclosure by any other person is strictly forbidden.
>
>


