Old 11-10-2008, 10:32 AM
"Geoff Galitz"
 
Parallel/Shared/Distributed Filesystems

I'm looking at using GFS for parallel access to shared storage, most likely
an iSCSI resource. It will most likely work just fine, but I am curious
whether folks are using anything with fewer prerequisites (e.g. installing
and configuring the Cluster Suite).


Specific to our case, we have 50 nodes running in-house code (some in
Java, some in C) which (among other things) receives JPGs, processes them
and stores them for later viewing. We are looking to deploy this filesystem
specifically for this JPG storage component.

All nodes are running CentOS 5.1 x86_64.


-geoff


Geoff Galitz
Blankenheim NRW, Deutschland
http://www.galitz.org



 
Old 11-10-2008, 02:31 PM
"nate"
 
Parallel/Shared/Distributed Filesystems

Geoff Galitz wrote:

> I'm looking at using GFS for parallel access to shared storage, most likely
> an iSCSI resource. It will most likely work just fine, but I am curious
> whether folks are using anything with fewer prerequisites (e.g. installing
> and configuring the Cluster Suite).

Exporting the iSCSI resource to one box and re-exporting it over NFS
would be quite a bit simpler. It sounds like your JPG needs are very
basic, so GFS sounds like overkill.
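
Roughly, on the box doing the re-export, that would look something like
this (a minimal sketch; the target IQN, addresses and paths are made up
for illustration):

    # log in to the iSCSI target and put a local filesystem on it
    iscsiadm -m discovery -t sendtargets -p 192.168.10.5
    iscsiadm -m node -T iqn.2008-11.example:jpgstore -p 192.168.10.5 --login
    mkfs.ext3 /dev/sdb
    mkdir -p /export/jpgs && mount /dev/sdb /export/jpgs

    # /etc/exports -- re-export to the processing nodes
    /export/jpgs 192.168.10.0/24(rw,async,no_root_squash)

    # pick up the new export, then from each node:
    exportfs -ra
    # mount -t nfs nfsbox:/export/jpgs /mnt/jpgs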

Note that iSCSI isn't very fast at all. If your array supports fibre
channel, I'd highly recommend that for connectivity to the NFS servers
over iSCSI any day. If your only choice is iSCSI, then I suggest looking
into hardware HBAs, and certainly run jumbo frames on the iSCSI links and
use dedicated network connections for the iSCSI network. And if you want
even higher performance, use dedicated links for the NFS serving as well,
also with jumbo frames.
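
On CentOS that mostly amounts to setting the MTU on the dedicated iSCSI
interface, something like the below (interface name and addresses are
just examples, and the switches in between need to support jumbo frames
too):

    # /etc/sysconfig/network-scripts/ifcfg-eth1 -- dedicated iSCSI NIC
    DEVICE=eth1
    BOOTPROTO=static
    IPADDR=192.168.10.21
    NETMASK=255.255.255.0
    MTU=9000
    ONBOOT=yes

    # verify end to end after ifup eth1 (8972 + 28 header bytes = 9000)
    ping -M do -s 8972 192.168.10.5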

If you really want GFS then I would look into running NFS over
GFS with a high availability NFS cluster. Red Hat wrote this
useful doc on how to deploy such a system:

http://sources.redhat.com/cluster/doc/nfscookbook.pdf
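
Once the Cluster Suite side is configured (see the cookbook for the
cluster.conf details), the GFS piece itself boils down to roughly the
following; the cluster name, filesystem name and device are just
examples:

    # on CentOS 5
    yum groupinstall "Clustering" "Cluster Storage"

    # one journal (-j) per node that will mount the filesystem
    gfs_mkfs -p lock_dlm -t mycluster:jpgfs -j 4 /dev/vg0/jpgs
    mount -t gfs /dev/vg0/jpgs /export/jpgs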

My company does something similar: we process terabytes of data every
day for our application across a good 150 servers or so. Today we use a
520-disk BlueArc-based NAS solution the company bought about four years
ago, and we are looking to replace it with something else as the NAS
portion will be end-of-lifed soon.

For this kind of requirement I absolutely would not trust any Linux-based
NFS, or even GFS, over a well-tested and well-supported commercial
solution (most of the cost of the solution is the back-end disks anyway).

Though if the volume of data is small, and the I/O rates are small as
well, you can get by just fine with a Linux-based system.

If you're using iSCSI, your performance bottleneck will likely be the
iSCSI system itself anyway, rather than the Linux box(es).

nate

 
Old 11-10-2008, 03:07 PM
"Geoff Galitz"
 
Parallel/Shared/Distributed Filesystems

nate wrote:

> If you really want GFS then I would look into running NFS over
> GFS with a high availability NFS cluster. Red Hat wrote this
> useful doc on how to deploy such a system:


The main issue is that we feel our current solution (Linux NFS clients
-> NetApp) is not sufficient. Our team comes from a Solaris background (my
colleague) and an HPC background (me), and we are worried about running
into scalability issues as our infrastructure grows and the internal
network becomes busier and busier. We've already been wrestling with issues
such as broken mountpoints, stale mounts and unrecoverable hangs.
Fortunately those issues have all been resolved for now, but as we continue
to grow we may see them recur. Consider all that as background.

The NetApp is running out of space and we would prefer not to replace it
with another one, if possible. To that end we are exploring our options.

I played around with iSCSI, multipath and NFS and have found that the
combination works quite well so far. Queuing data for delivery when a node
becomes unavailable, using multipath, would be sufficient for our needs.
Our internal monitoring systems can take action if a server becomes
unavailable, and the data can be queued up long enough for any recovery
actions to complete (apart from the next big earthquake). We do not
necessarily require a more traditional redundant storage system (such as
an NFS cluster with dedicated NFS server nodes)... but we are not ruling
that out, either.
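
For the record, the queuing behaviour I mean is dm-multipath's "queue if
no path" mode, roughly like this in /etc/multipath.conf (the WWID and
alias are just illustrative):

    defaults {
        user_friendly_names yes
    }
    multipaths {
        multipath {
            wwid                 360a98000486e2f66426f583454253977
            alias                jpgstore
            path_grouping_policy multibus
            # queue I/O instead of failing it while no paths are up
            no_path_retry        queue
        }
    }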


Just food for thought.

-geoff




 
Old 11-10-2008, 03:59 PM
"nate"
 
Parallel/Shared/Distributed Filesystems

Geoff Galitz wrote:

> The NetApp is running out of space and we would prefer not to replace it
> with another one, if possible. To that end we are exploring our options.

NetApp, while it has a big name, has the worst space efficiency in the
industry, and its performance isn't so hot either. It does have some nice
features, though; it depends on your needs.

The solutions we are looking at here are a 3PAR T400-based back end with
an Exanet EX1500 cluster (2-node) front end, and an HDS AMS2300-based back
end with a BlueArc Titan 3100 cluster (2-node) front end. I'm not at all
satisfied with the scalability of the AMS2300, though, and the vendor is
grasping at straws trying to justify its existence. The higher-end AMS2500
would be more suitable (still not scalable), but the vendor refuses to
quote it because it's not due until late this year/early next.

Both NAS front ends scale to 8 nodes (Exanet claims unlimited nodes,
though 8 is their currently supported maximum). 8 nodes is enough
performance to drive 1,000 SATA disks or more. The 3PAR T400 back end
scales linearly to 1,152 SATA disks (1.2PB); the AMS2300 goes up to
240 disks (248TB).

Both NAS front-end clusters can each address a minimum of 500TB of
storage (per pair) and support millions of files per directory without
a performance hit.

I talked with NetApp on a couple of occasions and finally nailed down
that their competitive solution would be their GX product line, but I
don't think they can get the price down to where the competition is: they
promised pricing three weeks ago and I haven't heard a peep since.

The idea is to be able to start small (in our case ~100TB usable) and
grow much larger as the company needs, within a system that can
automatically re-balance the I/O as it expands for maximum performance,
while keeping the price tag within our budget. Our current 520-disk
system is horribly unbalanced, and it's not possible to re-balance it
without massive downtime; the result is probably at least a 50% loss in
performance overall. Of course there are lots of other goals, but that's
the biggie.

The 3PAR/Exanet solution can scale within a single system to
approximately 630,000 SPECsfs IOPS on a single file system; the
HDS/BlueArc solution can scale to about 150,000 SPECsfs IOPS across a
couple of file systems. At their peak the 3PAR would have 4 controllers
and the Exanet 8 controllers, while the HDS would have 2 controllers and
the BlueArc 2 controllers. In both cases the performance "limit" is the
back-end storage, not the front-end units.

Of course nothing stops the NAS units from addressing storage beyond a
single array, but in that event you lose the ability to effectively
balance the I/O across multiple storage systems, which leads to the
problem we have with our current system. Perhaps if you're willing to
spend a couple million, an HDS USP-based system might be effective at
balancing across multiple systems with their virtualized thingamabob.
Our budget is a fraction of that, though.

NetApp's (non-GX) limitations prevent it from competing effectively in
this area. (They do have some ability to re-balance, but it pales in
comparison.)

nate

 
Old 11-10-2008, 04:51 PM
"Marcelo M. Garcia"
 
Parallel/Shared/Distributed Filesystems

Geoff Galitz wrote:

> I'm looking at using GFS for parallel access to shared storage, most likely
> an iSCSI resource. It will most likely work just fine, but I am curious
> whether folks are using anything with fewer prerequisites (e.g. installing
> and configuring the Cluster Suite).
>
> Specific to our case, we have 50 nodes running in-house code (some in
> Java, some in C) which (among other things) receives JPGs, processes them
> and stores them for later viewing. We are looking to deploy this filesystem
> specifically for this JPG storage component.
>
> All nodes are running CentOS 5.1 x86_64.


Hi

Maybe you can consider pNFS, parallel NFS:
http://www.pnfs.com/
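
Note that pNFS is part of NFSv4.1 and the Linux bits are still
experimental (nothing in the CentOS 5 kernel yet), but on a kernel that
supports it the mount should presumably look like an ordinary NFSv4.1
mount, roughly (server and paths are placeholders):

    mount -t nfs4 -o minorversion=1 server:/export /mnt/jpgs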

Regards

Marcelo

 
