On Sat, 2008-06-21 at 09:33 -0400, Mag Gam wrote:
> I need to copy over 100TB of data from one server to another via
> network. What is the best option to do this? I am planning to use
> rsync but is there a better tool or better way of doing this?
At gigabit speeds, you're looking at over a week of transfer time: 1
gigabit/sec = 125MB/sec, so 100TB takes 800,000 seconds, or about 9.26
days, not counting protocol overhead. You could speed this up with link
bonding, which from previous threads sounds like something you're
working on already.
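As a rough check on the arithmetic above, here is a small shell sketch. It assumes decimal units (1TB = 10^12 bytes), a fully saturated link, and no protocol overhead. The function name transfer_days and its arguments are made up for illustration, and the bonded figure is the ideal case only.

```shell
#!/bin/sh
# transfer_days SIZE_TB LINK_GBPS
# Prints the ideal transfer time in days, assuming decimal units,
# a fully saturated link, and no protocol overhead.
transfer_days() {
    awk -v tb="$1" -v gbps="$2" 'BEGIN {
        bytes = tb * 1e12             # 1 TB = 10^12 bytes
        rate  = gbps * 1e9 / 8        # 1 gigabit/sec = 125 MB/sec
        printf "%.2f\n", bytes / rate / 86400
    }'
}

transfer_days 100 1    # single gigabit link: 9.26
transfer_days 100 2    # two bonded gigabit links, ideal case: 4.63
```

Real throughput will be lower once protocol, disk, and filesystem overheads are included, so treat these numbers as a floor on the transfer time.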
If it's a one-off transfer and you can afford downtime while you're
fiddling with hardware, you may consider directly attaching both sets of
storage to the same machine and doing a local copy.
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
06-22-2008, 05:28 PM
"Raja Subramanian"
recommendations for copying large filesystems
On Sun, Jun 22, 2008 at 3:36 AM, Rainer Duffner <rainer@ultra-secure.de> wrote:
> Now that I know the details - I don't think this is going to work. Not with
> 100 TB of data. It kind-of-works with 1 TB.
> Can anybody comment on the feasibility of rsync on 1 million files?
rsync always broke on my filesystems with 200-300k files
due to out of memory errors (my box had 2GB RAM).
- Raja
06-22-2008, 07:34 PM
nightduke
recommendations for copying large filesystems
stops sync?
2008/6/22 Raja Subramanian <rajasuperman@gmail.com>:
> On Sun, Jun 22, 2008 at 3:36 AM, Rainer Duffner <rainer@ultra-secure.de> wrote:
>> Now that I know the details - I don't think this is going to work. Not with
>> 100 TB of data. It kind-of-works with 1 TB.
>> Can anybody comment on the feasibility of rsync on 1 million files?
>
> rsync always broke on my filesystems with 200-300k files
> due to out of memory errors (my box had 2GB RAM).
>
> - Raja
06-22-2008, 08:32 PM
Dag Wieers
recommendations for copying large filesystems
On Sun, 22 Jun 2008, Raja Subramanian wrote:
> On Sun, Jun 22, 2008 at 3:36 AM, Rainer Duffner <rainer@ultra-secure.de> wrote:
>> Now that I know the details - I don't think this is going to work. Not with
>> 100 TB of data. It kind-of-works with 1 TB.
>> Can anybody comment on the feasibility of rsync on 1 million files?
>
> rsync always broke on my filesystems with 200-300k files
> due to out of memory errors (my box had 2GB RAM).
I have done 700k and 800k files transfers (including hardlinks), but
indeed it could take a while to compute the transferlist. Newer rsync
versions bring down the amount of memory needed drastically. That is one
of the reasons I offer a recent rsync in RPMforge. There is almost never a
good reason to use a dated rsync.
--
-- dag wieers, dag@centos.org, http://dag.wieers.com/ --
[Any errors in spelling, tact or fact are transmission errors]
06-23-2008, 03:12 AM
"Michael Semcheski"
recommendations for copying large filesystems
On Sun, Jun 22, 2008 at 4:32 PM, Dag Wieers <dag@centos.org> wrote:
> I have done 700k and 800k files transfers (including hardlinks), but indeed it could take a while to compute the transferlist. Newer rsync versions bring down the amount of memory needed drastically. That is one of the reasons I offer a recent rsync in RPMforge. There is almost never a good reason to use a dated rsync.
I just thought I'd de-lurk and chime in that there are some patches for ssh to allow better performance:
http://www.psc.edu/networking/projects/hpn-ssh/
If you do end up using rsync for something like this via ssh, you might want to look at some of the Pittsburgh Supercomputing Center's patches. The high-performance patches can allow you to see dramatic increases in throughput.
06-23-2008, 03:24 AM
Erek Dyskant
recommendations for copying large filesystems
> If you do end up using rsync for something like this via ssh, you
> might want to look at some of the Pittsburgh Supercomputing Center's
> patches. The high-performance patches can allow you to see dramatic
> increases in throughput.
Or, if it's over a secure network, drop ssh entirely and use the rsync
protocol.
--Erek
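A minimal sketch of the rsync-protocol setup suggested above, assuming a trusted network, since the daemon protocol sends everything unencrypted. The module name "data", the paths, the host name sourcemachine, and the helper name write_rsyncd_conf are placeholders for illustration, not anything from this thread.

```shell
#!/bin/sh
# write_rsyncd_conf FILE PATH
# Writes a minimal read-only rsync daemon config that exports PATH
# as the module "data" (the module name is a placeholder).
write_rsyncd_conf() {
    cat > "$1" <<EOF
[data]
path = $2
read only = yes
uid = root
gid = root
EOF
}

# On the source machine (as root, so that all files are readable):
#   write_rsyncd_conf /etc/rsyncd.conf /data
#   rsync --daemon
#
# On the destination machine, pull without ssh in the path:
#   rsync -a rsync://sourcemachine/data/ /local-data-dir/
```

Running the module as root is a choice made here so that every file can be read; on anything but an isolated network you would want to restrict access further.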
06-23-2008, 05:46 AM
Mogens Kjaer
recommendations for copying large filesystems
Rainer Duffner wrote:
> ...
> Can anybody comment on the feasibility of rsync on 1 million files?
I rsync 2.6M files daily. No problem.
It takes 15 minutes, if there's only a few changes.
For fast transfer of files between two machines
I usually use ttcp:
> I have done 700k and 800k files transfers (including hardlinks), but
> indeed it could take a while to compute the transferlist. Newer rsync
> versions bring down the amount of memory needed drastically. That is
> one of the reasons I offer a recent rsync in RPMforge. There is almost
> never a good reason to use a dated rsync.
I have used rsync on ~16 million files with a filesystem size of about
1.5TB - it worked fine but took a while.
06-24-2008, 05:17 PM
Jerry Franz
recommendations for copying large filesystems
Mag Gam wrote:
> I need to copy over 100TB of data from one server to another via
> network. What is the best option to do this? I am planning to use
> rsync but is there a better tool or better way of doing this?
> For example, I plan on doing
> rsync -azv /largefs /targetfs
> /targetfs is a NFS mounted filesystem.
> Any thoughts
You are going to pay a large performance penalty for the simplicity of
using a local form of rsync. Between the substantial overheads of rsync
itself and NFS, you are not going to come anywhere near your maximum
possible speed, and you will probably need a lot of memory if you have a
lot of files (rsync uses a lot of memory to track all the files). When
I'm serious about moving large amounts of data at the highest speed, I
use tar tunneled through ssh. The rough invocation to pull from a remote
machine looks like this:
That should pull the contents of the sourcemachine's /data directory to
an already existing local /local-data-dir. On reasonably fast machines
(better than 3 GHz CPUs) it tends to approach the limit of either your
hard drives' speed or your network capacity.
If you don't like the ssh tunnel, you can strip it down to just the two
tars (one to throw and one to catch) and copy it over NFS. It will still
be faster than what you are proposing. Or you can use cpio.
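The command from the message above did not survive in this archive, so what follows is only a sketch of the general two-tar technique, not the original invocation. The function name copy_tree is hypothetical; sourcemachine, /data and /local-data-dir are the names used in the text above.

```shell
#!/bin/sh
# copy_tree SRC DST
# Streams the contents of SRC into the existing directory DST with
# one tar to throw and one tar to catch. -p preserves permissions.
copy_tree() {
    tar -C "$1" -cf - . | tar -C "$2" -xpf -
}

# Local or NFS-mounted target:
#   copy_tree /largefs /targetfs
#
# Pull over ssh instead (the throwing tar runs on the remote side):
#   ssh sourcemachine 'tar -C /data -cf - .' | tar -C /local-data-dir -xpf -
```

Unlike rsync, this does no comparison against the destination: it sends everything once, which is why it suits an initial bulk copy rather than a resync.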
Rsync is best at synchronizing two already nearly identical trees. Not so
good as a bulk copier.