 
06-22-2008, 01:27 AM
Erek Dyskant

recommendations for copying large filesystems

On Sat, 2008-06-21 at 09:33 -0400, Mag Gam wrote:
> I need to copy over 100TB of data from one server to another via
> network. What is the best option to do this? I am planning to use
> rsync but is there a better tool or better way of doing this?

At gigabit speeds, you're looking at over a week of transfer time: 100 TB at 1 gigabit/sec (125 MB/sec) is 800,000 seconds, about 9.25 days, not counting protocol overhead. You could speed this up with link bonding, which from previous threads sounds like something you're working on already.
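[The arithmetic above can be checked with a quick back-of-the-envelope script, using the same decimal units (1 TB = 10^12 bytes):]

```shell
# 100 TB over a 1 Gbit/s link (125 MB/s), ignoring protocol overhead
BYTES=$((100 * 1000 * 1000 * 1000 * 1000))   # 100 TB
RATE=$((125 * 1000 * 1000))                  # 125 MB/s
SECS=$((BYTES / RATE))
echo "${SECS} seconds, about $((SECS / 86400)) days"
# prints: 800000 seconds, about 9 days
```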

If it's a one-off transfer and you can afford downtime while you're
fiddling with hardware, you might consider directly attaching both sets of
storage to the same machine and doing a local copy.

_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
06-22-2008, 05:28 PM
"Raja Subramanian"

recommendations for copying large filesystems

On Sun, Jun 22, 2008 at 3:36 AM, Rainer Duffner <rainer@ultra-secure.de> wrote:
> Now that I know the details - I don't think this is going to work. Not with
> 100 TB of data. It kind of works with 1 TB.
> Can anybody comment on the feasibility of rsync on 1 million files?

rsync always broke on my filesystems with 200-300k files
due to out of memory errors (my box had 2GB RAM).

- Raja
 
06-22-2008, 07:34 PM
nightduke

recommendations for copying large filesystems

stops sync?

2008/6/22 Raja Subramanian <rajasuperman@gmail.com>:
> On Sun, Jun 22, 2008 at 3:36 AM, Rainer Duffner <rainer@ultra-secure.de> wrote:
>> Now that I know the details - I don't think this is going to work. Not with
>> 100 TB of data. It kind of works with 1 TB.
>> Can anybody comment on the feasibility of rsync on 1 million files?
>
> rsync always broke on my filesystems with 200-300k files
> due to out of memory errors (my box had 2GB RAM).
>
> - Raja
 
06-22-2008, 08:32 PM
Dag Wieers

recommendations for copying large filesystems

On Sun, 22 Jun 2008, Raja Subramanian wrote:

> On Sun, Jun 22, 2008 at 3:36 AM, Rainer Duffner <rainer@ultra-secure.de> wrote:
>> Now that I know the details - I don't think this is going to work. Not with
>> 100 TB of data. It kind of works with 1 TB.
>> Can anybody comment on the feasibility of rsync on 1 million files?
>
> rsync always broke on my filesystems with 200-300k files
> due to out of memory errors (my box had 2GB RAM).


I have done 700k and 800k file transfers (including hardlinks), but
indeed it can take a while to compute the transfer list. Newer rsync
versions bring down the amount of memory needed drastically. That is one
of the reasons I offer a recent rsync in RPMforge. There is almost never a
good reason to use a dated rsync.
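[A common workaround for the memory growth discussed above, not mentioned in the thread, is to split the job so no single rsync invocation has to hold the whole file list. A rough sketch, with made-up paths and host name:]

```shell
# one rsync per top-level directory bounds the size of each transfer list
# ("/largefs" and "desthost" are hypothetical)
for d in /largefs/*/; do
    rsync -a "$d" "desthost:/targetfs/$(basename "$d")/"
done
```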


--
-- dag wieers, dag@centos.org, http://dag.wieers.com/ --
[Any errors in spelling, tact or fact are transmission errors]
 
06-23-2008, 03:12 AM
"Michael Semcheski"

recommendations for copying large filesystems

On Sun, Jun 22, 2008 at 4:32 PM, Dag Wieers <dag@centos.org> wrote:

> I have done 700k and 800k files transfers (including hardlinks), but indeed it could take a while to compute the transferlist. Newer rsync versions bring down the amount of memory needed drastically. That is one of the reasons I offer a recent rsync in RPMforge. There is almost never a good reason to use a dated rsync.



I just thought I'd de-lurk and chime in that there are some patches for ssh to allow better performance:

http://www.psc.edu/networking/projects/hpn-ssh/


If you do end up using rsync for something like this via ssh, you might want to look at some of the Pittsburgh Supercomputing Center's patches. The high-performance patches can allow you to see dramatic increases in throughput.

 
06-23-2008, 03:24 AM
Erek Dyskant

recommendations for copying large filesystems

> If you do end up using rsync for something like this via ssh, you
> might want to look at some of the Pittsburgh Supercomputing Center's
> patches. The high-performance patches can allow you to see dramatic
> increases in throughput.

Or, if it's over a secure network, drop ssh entirely and use the rsync
protocol.
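[Running rsync in daemon mode might look roughly like this; the module name "bigfs" and the paths are made up for illustration:]

```shell
# /etc/rsyncd.conf on the source host (hypothetical module):
#   [bigfs]
#       path = /largefs
#       read only = yes
#
# on the source host, start the daemon:
rsync --daemon
# on the destination host, pull over the plain rsync protocol (no ssh):
rsync -a rsync://sourcehost/bigfs/ /targetfs/
```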

--Erek

 
06-23-2008, 05:46 AM
Mogens Kjaer

recommendations for copying large filesystems

Rainer Duffner wrote:
> ...
> Can anybody comment on the feasibility of rsync on 1 million files?

I rsync 2.6M files daily. No problem.

It takes 15 minutes if there are only a few changes.

For fast transfer of files between two machines
I usually use ttcp:

From machine:

tar cf - .|ttcp -l5120 -t to_machine

To machine

cd /whatever
ttcp -l5120 -r | tar xf -

I get ~100Mbytes/sec on a gigabit connection.

Note this is insecure, with no way of restarting, etc.
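[If ttcp isn't packaged on your distribution, nc can carry the same kind of unencrypted stream; this is a sketch, not from the thread, and the listen flag varies between netcat implementations (port 9999 is arbitrary):]

```shell
# on the receiving machine (start the listener first):
cd /whatever && nc -l 9999 | tar xf -

# on the sending machine:
tar cf - . | nc to_machine 9999
```

Like ttcp, this has no encryption, authentication, or restart capability.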

Mogens

--
Mogens Kjaer, Carlsberg A/S, Computer Department
Gamle Carlsberg Vej 10, DK-2500 Valby, Denmark
Phone: +45 33 27 53 25, Fax: +45 33 27 47 08
Email: mk@crc.dk Homepage: http://www.crc.dk
 
06-23-2008, 06:13 PM
Tom Brown

recommendations for copying large filesystems

> I have done 700k and 800k files transfers (including hardlinks), but
> indeed it could take a while to compute the transferlist. Newer rsync
> versions bring down the amount of memory needed drastically. That is
> one of the reasons I offer a recent rsync in RPMforge. There is almost
> never a good reason to use a dated rsync.


I have used rsync on ~16 million files with a filesystem size of about
1.5TB - it worked fine but took a while.


 
06-24-2008, 05:17 PM
Jerry Franz

recommendations for copying large filesystems

Mag Gam wrote:
> I need to copy over 100TB of data from one server to another via
> network. What is the best option to do this? I am planning to use
> rsync but is there a better tool or better way of doing this?
>
> For example, I plan on doing
> rsync -azv /largefs /targetfs
>
> /targetfs is a NFS mounted filesystem.
>
> Any thoughts?

You are going to pay a large performance penalty for the simplicity of
running rsync locally against an NFS mount. Between the substantial
overheads of rsync itself and NFS, you are not going to come anywhere near
your maximum possible speed, and you will probably need a lot of memory if
you have a lot of files (rsync uses a lot of memory to track all the
files). When I'm serious about moving large amounts of data at the highest
speed, I use tar tunneled through ssh. The rough invocation to pull from a
remote machine looks like this:


ssh -2 -c arcfour -T -x sourcemachine.com 'tar --directory=/data -Scpf - .' | tar --directory=/local-data-dir -Spxf -


That should pull the contents of the source machine's /data directory into
an already existing local /local-data-dir. On reasonably fast machines
(better than 3 GHz CPUs) it tends to approach the limit of either your
hard drives' speed or your network capacity.


If you don't like the ssh tunnel, you can strip it down to just the two
tars (one to throw and one to catch) and copy it over NFS. It will still
be faster than what you are proposing. Or you can use cpio.
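[Stripped of the ssh tunnel, that two-tar pipeline over an NFS mount might look like this (paths are illustrative; /mnt/target stands in for the mounted target filesystem):]

```shell
# one tar throws, one tar catches, through a local pipe;
# -S handles sparse files, -p preserves permissions
tar --directory=/data -Scpf - . | tar --directory=/mnt/target -Spxf -
```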


Rsync is best at synchronizing two already nearly identical trees. It's
not so good as a bulk copier.


--
Benjamin Franz

 
