Redundant Data storage solutions (Data clusters?)

Thu Oct 14 05:24:52 UTC 2010

On Thursday, October 14, 2010 03:24 AM, Lamp Zy wrote:
> Hi,
>
> What I'm looking for is suggestions, best practices, success stories or
> anything that will point me in the right direction.
>
> I realize that our requirements are the same as 90% of the companies out
> there, but the "Redundancy" part is where I got stuck.
>
> We are hosting user images that are less than 1MB in size. Images are
> small and the processing we are doing on them is not I/O intensive so
> there is no need for high performance hardware or high network speeds.
> File systems are exported to the application servers over NFS.
>
> All we need is a reliable storage solution that works over NFS (NAS).
>
> It is expected that our storage needs will grow to about 50+ TB within 5
> years.
>
> One of the requirements is redundancy. If one storage unit fails then
> another should pick up with no interruption. Also it needs to be
> scalable. At this point we cannot invest in all 50TB of storage so we need
> to be able to add more storage easily as needed.
>
> We are about to go with gluster (gluster.org). It is a Data Cluster
> solution where one can add systems with DAS in pairs. Each pair can be
> configured to simulate RAID1 (if I understand correctly). Looks good...
> on paper at least. My concerns are that they use their own glusterFS
> which probably relies on ext3 or zfs. Also you can use NFS but it's
> recommended to use their client daemon instead and so on.

glusterfs is another layer on top of whatever underlying filesystem is 
actually used.

You can use GFS or Lustre to glue the NFS servers to the storage servers,
or use glusterfs. Of these, glusterfs is the only one that provides
something similar while also letting you get at the files directly through
the underlying host filesystem.

Or you can do your own application-level redundancy. After all, you are
just using NFS, so all you have to do is maintain an extra copy of each
file on another NFS server. Either have the application write every file
on A to B as well, or schedule regular rsyncs from A to B, and have the
code look up B if A is not available.
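
A minimal sketch of that idea in Python, to make it concrete. Everything
named here is an assumption for illustration: the mount points /mnt/nfs_a
and /mnt/nfs_b and the helpers store_image() and load_image() are
hypothetical, and your application may well be written in something else.
It writes each file to both servers and reads from A first, falling back
to B:

```python
from pathlib import Path

# Hypothetical mount points for the two NFS exports (A and B).
PRIMARY = Path("/mnt/nfs_a")
SECONDARY = Path("/mnt/nfs_b")


def store_image(rel_path, data, roots=(PRIMARY, SECONDARY)):
    """Write the same bytes under every root; return the roots that succeeded."""
    written = []
    for root in roots:
        target = Path(root) / rel_path
        try:
            target.parent.mkdir(parents=True, exist_ok=True)
            # Write to a temporary name, then rename, so a reader never
            # picks up a half-written file.
            tmp = target.with_name(target.name + ".tmp")
            tmp.write_bytes(data)
            tmp.replace(target)
            written.append(Path(root))
        except OSError:
            # This server is unavailable; keep going with the others.
            continue
    if not written:
        raise OSError(f"could not store {rel_path} on any server")
    return written


def load_image(rel_path, roots=(PRIMARY, SECONDARY)):
    """Return the file's bytes from the first root that can serve it."""
    for root in roots:
        try:
            return (Path(root) / rel_path).read_bytes()
        except OSError:
            # Missing file or unreachable server; try the next copy.
            continue
    raise FileNotFoundError(rel_path)
```

Note that this only helps if a dead server produces an error. With
hard-mounted NFS, I/O against an unreachable server blocks instead of
failing, so the fallback to B would never be reached. You would also need
something (an rsync run, for example) to bring a server back in sync after
it has missed writes; store_image() returns the list of servers it
actually wrote to so the caller can record that.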