[Linux-HA] Question about open source replication solutions
Achim Stumpf
newgrp at gmx.de
Thu Feb 1 02:34:38 MST 2007
Yan Fitterer wrote:
> A little clarification:
>
> I wasn't proposing to have a separate iSCSI device. The idea was for all three nodes to present their local storage as
> an iSCSI target. Then all three nodes can create a single volume (using MD, with or without EVMS) out of the three iSCSI
> targets, one of which happens to be on a local IP address. So each node would run both iSCSI target and initiator
> software.
>
> Even a complete failure of the iSCSI IP network would allow each node to carry on (but see below for major caveat)...
>
So each node exports an iSCSI device, and each node imports the iSCSI
devices of the other two nodes. If the iSCSI storage network fails,
each node still has its own local device to keep running on. That's a
nice idea.
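
A minimal sketch of that layout in Python, purely as a model: it does
not touch real iSCSI or MD, and the node names and device labels are
made up for illustration. It lists which mirror members each node would
assemble and which remain reachable when the storage network is down.

```python
# Toy model of the proposed layout; nothing here talks to real iSCSI or MD.
NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def mirror_members(node, nodes=NODES):
    """Return the RAID1 members a node would assemble: its own disk plus
    one iSCSI import from every other node."""
    members = [f"local:{node}"]
    members += [f"iscsi:{peer}" for peer in nodes if peer != node]
    return members


def surviving_members(node, storage_network_up, nodes=NODES):
    """Members still reachable; with the storage network down only the
    local disk remains, so the mirror would run degraded."""
    return [m for m in mirror_members(node, nodes)
            if storage_network_up or m.startswith("local:")]


for n in NODES:
    print(n, mirror_members(n), surviving_members(n, storage_network_up=False))
```

In this model every node keeps exactly one member (its own disk) after
a storage network failure, which is also why the three copies can then
drift apart independently.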
> Finally, you need a cluster parallel filesystem to ensure that all nodes can write to the same logical view of the data
> simultaneously. If your application can deal with having the data available read-only, you may be able to get away with
> a non-clustered filesystem, by mounting all nodes with a read-only flag. Not all filesystems are able to do this without
> filesystem corruption though.
>
> The assumption here of course is that the application is not some kind of database, as if it is, you may have major
> problems consolidating the data changes after an iSCSI network failure where more than one node applied changes to the
> data.
>
> You can mitigate this by having a redundant dedicated IP-based storage network, using dedicated interfaces and the
> bonding driver.
>
> If you tell us more about what you're actually trying to achieve (application, constraints, hardware available,
> requirements...) we may be able to give you more focused answers...
>
> Yan
>
The servers would be set up in three different locations: one in
Frankfurt, one in Cologne, and one in Munich. The three locations are
connected to each other through an existing 1 Gbit link. Each location
has one server, which runs a web server and an FTP server. Neither the
web server nor the FTP server writes very much. Each of these servers
needs read/write access to the same data. There is considerably more
read access than write access, but the load is not heavy. We could
guarantee that solution at least 10 Mbit of traffic between the
locations, and usually more. The amount of data is just a few GB. We
have started testing this with Coda and it works somehow, but I was
wondering whether there are other solutions or ideas.
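
As a rough feel for those numbers, a back-of-the-envelope calculation in
Python. The 4 GB figure is only an assumed stand-in for "a few GB", and
protocol overhead is ignored, so treat the result as an order of
magnitude and nothing more.

```python
def full_copy_seconds(data_gb, link_mbit):
    """Seconds to push data_gb (decimal gigabytes) over a link of
    link_mbit megabits per second, ignoring protocol overhead."""
    bits = data_gb * 8 * 1000**3
    return bits / (link_mbit * 1000**2)


ASSUMED_DATA_GB = 4  # assumption: stand-in for "a few GB"

for mbit in (10, 1000):  # guaranteed 10 Mbit vs. the full 1 Gbit link
    secs = full_copy_seconds(ASSUMED_DATA_GB, mbit)
    print(f"{mbit:>5} Mbit/s: {secs / 60:.1f} minutes")
```

Under that assumption a complete copy takes a bit under an hour at the
guaranteed 10 Mbit and about half a minute at the full 1 Gbit, so a full
resync of one node looks tolerable but not something to do often.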
With your iSCSI solution, I think that if, for example, the connection
between the three locations fails, then in the end I have to choose one
server as the basis for recovering the data on all the others, which
means the changes made on the other nodes will most probably be lost.
Coda offers some conflict resolution tools for that, but I haven't
tested them yet, and so far I don't feel comfortable with Coda. The
other thing with Coda is that the Coda server and the Coda client can't
run on the same host, so one needs extra storage server nodes in each
location.
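
The recovery concern can be sketched as follows. This is a toy Python
model with made-up node names and file contents, not how MD, GFS, or
Coda actually resolve conflicts: one node is picked as the source, every
other replica is overwritten, and whatever differed locally is gone.

```python
def recover_from(winner, replicas):
    """Overwrite every replica with the winner's copy. Return, per other
    node, the files whose local copy differed from the winner's and was
    therefore overwritten or dropped."""
    source = replicas[winner]
    lost = {}
    for node, files in replicas.items():
        if node != winner:
            lost[node] = sorted(name for name, data in files.items()
                                if source.get(name) != data)
            replicas[node] = dict(source)
    return lost


replicas = {  # hypothetical state after the sites were cut off
    "node-a": {"index.html": "v2-a", "upload.txt": "v1"},
    "node-b": {"index.html": "v2-b", "upload.txt": "v1"},
    "node-c": {"index.html": "v1", "upload.txt": "v1", "new.txt": "c"},
}
print(recover_from("node-a", replicas))
```

Here node-b loses its edit to index.html and node-c loses the file it
created during the outage, which is the kind of loss a conflict
resolution tool would have to deal with.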
I am using heartbeat and DRBD here and I am a big fan of that solution,
but with DRBD 0.7 you can replicate between only two nodes, and only
one of them has read/write access. With DRBD 0.8 both nodes have
read/write access, but it is still only two. As far as I remember, DRBD
has now put the commercial DRBD+ feature into DRBD 0.8, but then you
only get asynchronous replication to a third node, and I don't think
that node has any read/write access. It would only be for disaster
recovery, in case the other two nodes were destroyed by a nuclear
bomb. :o(
So you think that iSCSI, GFS, and MD could handle it if the other two
remote iSCSI devices fail. I don't have any experience with iSCSI/GFS
yet. Would it be easy to recover all nodes to the same state of data?
If a file gets changed on an iSCSI/GFS device, is the whole changed
file transferred over the network to the other nodes, or only the
changed parts/blocks? DRBD transfers only the changed blocks.
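
To illustrate what "only the changed blocks" means, here is a small
Python sketch that compares two versions of a file block by block. It
only shows the idea; it is not how DRBD, MD, or GFS are implemented, and
the 4096-byte block size is an assumption.

```python
BLOCK_SIZE = 4096  # assumed block size, for illustration only


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE):
    """Return the indices of the blocks that differ between old and new."""
    n_blocks = (max(len(old), len(new)) + block_size - 1) // block_size
    return [i for i in range(n_blocks)
            if old[i * block_size:(i + 1) * block_size]
            != new[i * block_size:(i + 1) * block_size]]


old = bytes(10 * BLOCK_SIZE)        # a 10-block file of zeros
new = bytearray(old)
new[5 * BLOCK_SIZE + 7] = 0xFF      # change one byte in block 5
dirty = changed_blocks(old, bytes(new))
print("blocks to send:", dirty)
print("bytes sent block-wise:", len(dirty) * BLOCK_SIZE,
      "whole file:", len(new))
```

With a one-byte change, block-wise transfer sends one 4096-byte block
instead of the whole 40960-byte file; over a 10 Mbit link between sites
that difference matters for every write.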
Thanks,
Achim
>
> On Wed, Jan 31, 2007 at 3:32 PM, in message <45C0B6A8.8090103 at gmx.de>, Achim
> Stumpf <newgrp at gmx.de> wrote:
>
>> Yan Fitterer wrote:
>>
>>> You can stretch that to three nodes by using iSCSI and MD.
>>>
>>>
>> So you mean,
>>
>> Exporting an iscsi device to the fileservers in the three different
>> locations. Those fileservers take that iscsi device and create a raid1
>> mirror with a local disk. On top of that we put a GFS, so that all three
>> nodes are able to write concurrently to that.
>>
>> But what happens if the iscsi device fails? All three nodes write on
>> their local copy. GFS can handle that? And what happens when i get the
>> server with the iscsi device back online? Most probably I have to take
>> one copy of one of the local md devices for the recovery...
>>
>> Does anyone have some experience with such a setup?
>>
>> Achim
>>
>>
>>
>>> Using iSCSI target software (standard on SLES10, or SUSE 10.1 and above - not
>>> sure of other distros), you can make the local device available to all others.
>>>
>>> Then, using the MD (Raid) driver, you can create a RAID1 mirrored set, that
>>> will store the data on all devices. You can use EVMS, or straight MD devices.
>>>
>>> To make this into a filesystem available to all nodes, you can then use
>>> OpenGFS, or ocfs2 (if you're running SLES 10), or some other cluster parallel
>>> filesystem.
>>>
>>> That way, the data is actually copied to all three nodes, and you can lose
>>> any two before the data is affected. Using the MD driver, you can even add
>>> other "hot spare" so that should one or two drives be lost, the specified
>>> number of copies are regained as soon as the data can be re-copied.
>>>
>>> Not tried this, but no reason I can see it wouldn't work.
>>>
>>> Yan
>>>
>>>
>>>
>>> On Wed, Jan 31, 2007 at 11:38 AM, in message
>>> <de07811a0701310338j2c0a5185tbaede4f42facfd25 at mail.gmail.com>, "George H"
>>> <george.dma at gmail.com> wrote:
>>>
>>>
>>>> You could have a separate 2 node cluster that uses DRBD, except you
>>>> put OpenGFS on the DRBD partition. OpenGFS is a shared filesystem.
>>>> Then you make all your nodes use that shared FS. I've never done this
>>>> before but it should be possible, either that or use a SAN with a
>>>> shared filesystem.
>>>>
>>>> On 1/31/07, Achim Stumpf <newgrp at gmx.de> wrote:
>>>>
>>>>
>>>>> Hi list,
>>>>>
>>>>> I need a solution where it is possible to have three fileservers in
>>>>> three different locations. All of them have the same data. This data
>>>>> should be replicated across all three fileserver nodes. It should
>>>>> work similarly to DRBD, but with more than two nodes. It does not
>>>>> have to work at block level. Rsync is not the solution; it should
>>>>> work more like a filesystem or block device.
>>>>> I have gained some experience with Coda, but I am wondering if there
>>>>> are other solutions which provide such functionality.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Achim
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Linux-HA mailing list
>>>>> Linux-HA at lists.linux-ha.org
>>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>>> See also: http://linux-ha.org/ReportingProblems
>>>>>
>>>>>
>>>>>