Linux Archive: Gentoo User mailing list
Thread archive: http://www.linux-archive.org/gentoo-user/632263-rfc-fast-copying-whole-directory-tree.html

Helmut Jarausch 02-13-2012 09:49 AM

RFC : fast copying of a whole directory tree
 
Hi,

when copying a whole directory tree with standard tools, e.g.
tar cf - . | ( cd "$DEST" && tar xf - )
or cpio -p ...

the source disk is busy seeking. That's noisy and particularly slow.

I've written a small Python program which outputs the file names in
i-node order. If this is fed into tar or cpio nearly no seeks are
required during copying.
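Helmut's program itself is not included in the thread. A minimal sketch of the idea, assuming the goal is simply to list every path under a tree sorted by inode number, could look like the following (the function name paths_in_inode_order is hypothetical). It sorts by (st_dev, st_ino) so that a tree spanning several filesystems keeps each device's entries together, uses lstat so symlinks are listed but not followed, and writes NUL-terminated names so file names containing newlines survive the pipe.

```python
import os
import sys


def paths_in_inode_order(root):
    """Return root and every path below it, sorted by (device, inode).

    lstat() is used throughout, so symbolic links are listed but never followed.
    """
    st = os.lstat(root)
    entries = [(st.st_dev, st.st_ino, root)]
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk does not descend into symlinked directories by default,
        # but they still appear in dirnames, so they are listed here.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            entries.append((st.st_dev, st.st_ino, path))
    entries.sort()
    return [path for _dev, _ino, path in entries]


def main(root):
    out = sys.stdout.buffer
    for path in paths_in_inode_order(root):
        # NUL-terminate each name so names containing newlines survive the pipe.
        out.write(os.fsencode(path) + b"\0")


# Expects exactly one argument: the directory to list.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Because the list contains directories as well as files, a consumer that recurses into directories would archive each subtree more than once. With GNU tar the list would be read using --null and -T - together with --no-recursion; GNU cpio reads such a list with its -0 option in pass-through (-p) mode.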

I've tested it by comparing the resulting copied tree to one created by
tar | tar.
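One way to automate that kind of comparison is sketched below. It assumes that file type, permission bits, size, symlink targets and a SHA-256 of each regular file's contents are what matters; it deliberately ignores timestamps, ownership, extended attributes, ACLs and hard-link structure, all of which a real backup check would also need to cover. The names tree_signature and diff_trees are hypothetical.

```python
import hashlib
import os
import stat


def _sha256(path, chunk=1 << 20):
    """SHA-256 of a file's contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def tree_signature(root):
    """Map each path below root (relative to root) to a comparable description."""
    sig = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            st = os.lstat(path)
            perms = stat.S_IMODE(st.st_mode)
            if stat.S_ISREG(st.st_mode):
                sig[rel] = ("file", perms, st.st_size, _sha256(path))
            elif stat.S_ISLNK(st.st_mode):
                sig[rel] = ("symlink", os.readlink(path))
            elif stat.S_ISDIR(st.st_mode):
                sig[rel] = ("dir", perms)
            else:  # FIFOs, device nodes, sockets: compare type and permissions
                sig[rel] = ("other", stat.S_IFMT(st.st_mode), perms)
    return sig


def diff_trees(a, b):
    """Sorted relative paths that are missing from, or differ between, a and b."""
    sa, sb = tree_signature(a), tree_signature(b)
    return sorted(p for p in sa.keys() | sb.keys() if sa.get(p) != sb.get(p))
```

An empty result from diff_trees(original, copy) means the two trees agree on everything this sketch looks at, not that the copy is a perfect backup.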

But its correctness for backing up data is critical.
Therefore I'd like to ask for comments.

Thanks for any comments,
Helmut.

Michael Orlitzky 02-13-2012 02:17 PM

On 02/13/12 05:49, Helmut Jarausch wrote:
>
> I've written a small Python program which outputs the file names in
> i-node order. If this is fed into tar or cpio nearly no seeks are
> required during copying.

What makes you think the inodes are sequential on-disk?
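For context on this question: on ext2/3/4 the inodes are stored in per-block-group inode tables, so consecutive inode numbers sit next to each other only within one group's table. Whether different groups' tables are close together depends on the layout (ext4's flex_bg packs several of them together). The group and the slot in that group's table follow from the inode number and the filesystem's inodes-per-group value, which dumpe2fs -h reports as "Inodes per group". The helper name below is hypothetical and 8192 is only an illustrative value.

```python
def ext_inode_location(ino, inodes_per_group):
    """Return (block_group, index) of inode `ino` on an ext2/3/4 filesystem.

    Inode numbers start at 1, and each block group has its own inode table
    holding `inodes_per_group` entries.
    """
    if ino < 1:
        raise ValueError("ext inode numbers start at 1")
    return divmod(ino - 1, inodes_per_group)


# Illustrative value only; read the real one from `dumpe2fs -h` ("Inodes per group").
IPG = 8192
for ino in (12, 8192, 8193, 40000):
    print(ino, ext_inode_location(ino, IPG))
```

With this value, inodes 8192 and 8193 fall into groups 0 and 1 respectively, so sorting by inode number does not by itself guarantee that the inode reads are contiguous.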


> But its correctness for backing up data is critical.
> Therefore I'd like to ask for comments.

You're nuts =)

Seriously though, use cp, tar, or rsync. They've seen years of use by
millions of people. All of the remaining bugs are sufficiently insidious
that you'll never hit them. The same probably isn't true for your script!

Grant Edwards 02-13-2012 02:31 PM

On 2012-02-13, Michael Orlitzky <michael@orlitzky.com> wrote:
> On 02/13/12 05:49, Helmut Jarausch wrote:
>>
>> I've written a small Python program which outputs the file names in
>> i-node order. If this is fed into tar or cpio nearly no seeks are
>> required during copying.
>
> What makes you think the inodes are sequential on-disk?

Even if the i-nodes are sequential on-disk, there's no reason to think
that the data blocks associated with the inodes are in any particular
order with respect to the i-nodes themselves.
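This suggests ordering files by where their data actually lives rather than by inode number. A sketch of that alternative (a swapped-in technique, not what Helmut's script does) asks the Linux FS_IOC_FIEMAP ioctl for each regular file's first extent and sorts by its physical offset. Assumptions: Linux, the generic ioctl encoding used on x86 and ARM (the request constant differs on some other architectures), and a filesystem that implements FIEMAP. Files for which the ioctl fails, or which have no mapped extents, fall back to inode order. Only the first extent is considered, so fragmented files still cause seeks. The function names are hypothetical.

```python
import fcntl
import os
import stat
import struct

# FS_IOC_FIEMAP = _IOWR('f', 11, struct fiemap) with the generic Linux ioctl
# encoding (x86, ARM); other architectures may use a different value.
FS_IOC_FIEMAP = 0xC020660B
FIEMAP_MAX_OFFSET = 0xFFFFFFFFFFFFFFFF
HEADER = struct.Struct("=QQIIII")  # struct fiemap: start, length, flags,
                                   # mapped_extents, extent_count, reserved
EXTENT_SIZE = 56                   # sizeof(struct fiemap_extent)


def first_physical_offset(path):
    """Physical byte offset of a regular file's first extent, or None.

    None means FIEMAP is unsupported, the file could not be opened, or it has
    no mapped extents (e.g. it is empty). Freshly written data that is still
    delayed-allocated may be reported with offset 0.
    """
    # Room for the header plus exactly one extent record.
    buf = bytearray(HEADER.pack(0, FIEMAP_MAX_OFFSET, 0, 0, 1, 0))
    buf += bytes(EXTENT_SIZE)
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError:
        return None
    try:
        fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)  # kernel fills buf in place
    except OSError:
        return None
    finally:
        os.close(fd)
    if HEADER.unpack_from(buf)[3] == 0:      # fm_mapped_extents
        return None
    # fe_physical is the second 64-bit field of the first extent record.
    return struct.unpack_from("=Q", buf, HEADER.size + 8)[0]


def files_in_physical_order(root):
    """Regular files under root, ordered by first physical offset per device.

    Files without a known offset are placed after the others, in inode order.
    """
    keyed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # never open FIFOs, devices or symlinks
            off = first_physical_offset(path)
            if off is None:
                key = (st.st_dev, 1, st.st_ino)
            else:
                key = (st.st_dev, 0, off)
            keyed.append((key, path))
    keyed.sort()
    return [path for _key, path in keyed]
```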

>> But its correctness for backing up data is critical.
>> Therefore I'd like to ask for comments.
>
> You're nuts =)
>
> Seriously though, use cp, tar, or rsync. They've seen years of use by
> millions of people. All of the remaining bugs are sufficiently
> insidious that you'll never hit them. The same probably isn't true
> for your script!

--
Grant Edwards, grant.b.edwards at gmail.com
Yow! All this time I've been VIEWING a RUSSIAN MIDGET SODOMIZE a HOUSECAT!

Jörg Schilling 02-13-2012 03:11 PM

Grant Edwards <grant.b.edwards@gmail.com> wrote:

> On 2012-02-13, Michael Orlitzky <michael@orlitzky.com> wrote:
> > On 02/13/12 05:49, Helmut Jarausch wrote:
> >>
> >> I've written a small Python program which outputs the file names in
> >> i-node order. If this is fed into tar or cpio nearly no seeks are
> >> required during copying.
> >
> > What makes you think the inodes are sequential on-disk?
>
> Even if the i-nodes are sequential on-disk, there's no reason to think
> that the data blocks associated with the inodes are in any particular
> order with respect to the i-nodes themselves.

Correct. There is, however, a really fast method: "star -copy".

This works because star runs two decoupled processes with shared memory
between them, and because it reads the names in a directory in one big chunk.
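The speed claim rests on a reader and a writer decoupled by a buffer. The sketch below illustrates that producer/consumer idea only; it is not star's implementation. It uses two threads and a bounded queue.Queue in place of star's two processes and shared memory, copies only directories and regular files (no permissions, owners, timestamps or symlinks), and the function names and the 1 MiB chunk size are assumptions. os.walk gathers each directory's complete list of names before yielding it, which loosely corresponds to reading the names in one chunk.

```python
import os
import queue
import threading

CHUNK = 1 << 20      # bytes read per queue slot (illustrative value)
DONE = object()      # sentinel marking the end of the stream


def _reader(src_root, q, errors):
    """Producer: walk the source and push directory/file events into q."""
    try:
        # os.walk collects each directory's complete name list before yielding it.
        for dirpath, _dirnames, filenames in os.walk(src_root):
            rel_dir = os.path.relpath(dirpath, src_root)
            q.put(("dir", rel_dir, None))
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                if os.path.islink(path) or not os.path.isfile(path):
                    continue  # this sketch copies regular files only
                rel = os.path.join(rel_dir, name)
                q.put(("open", rel, None))
                with open(path, "rb") as f:
                    while block := f.read(CHUNK):
                        q.put(("data", rel, block))
                q.put(("close", rel, None))
    except OSError as exc:
        errors.append(exc)
    finally:
        q.put(DONE)  # always unblock the consumer


def copy_tree_decoupled(src_root, dst_root, max_slots=64):
    """Copy directories and regular files from src_root to dst_root.

    The bounded queue plays the role of the FIFO: the reader runs ahead of
    the writer until max_slots chunks are waiting.
    """
    q = queue.Queue(maxsize=max_slots)
    errors = []
    reader = threading.Thread(target=_reader, args=(src_root, q, errors), daemon=True)
    reader.start()
    out = None
    # Consumer: runs in the calling thread and only ever writes.
    while (item := q.get()) is not DONE:
        kind, rel, block = item
        target = os.path.normpath(os.path.join(dst_root, rel))
        if kind == "dir":
            os.makedirs(target, exist_ok=True)
        elif kind == "open":
            out = open(target, "wb")
        elif kind == "data":
            out.write(block)
        else:  # "close"
            out.close()
            out = None
    reader.join()
    if out is not None:
        out.close()
    if errors:
        raise errors[0]
```

Raising max_slots lets the reader run further ahead of the writer, the same kind of trade-off as choosing star's FIFO size.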

Jörg

--
Jörg Schilling, D-13353 Berlin
EMail: joerg@schily.isdn.cs.tu-berlin.de (home)
       js@cs.tu-berlin.de (uni)
       joerg.schilling@fokus.fraunhofer.de (work)
Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily

Pandu Poluan 02-13-2012 03:29 PM

On Feb 13, 2012 11:15 PM, "Joerg Schilling" <Joerg.Schilling@fokus.fraunhofer.de> wrote:
>
> Grant Edwards <grant.b.edwards@gmail.com> wrote:
>
> > On 2012-02-13, Michael Orlitzky <michael@orlitzky.com> wrote:
> > > On 02/13/12 05:49, Helmut Jarausch wrote:
> > >>
> > >> I've written a small Python program which outputs the file names in
> > >> i-node order. If this is fed into tar or cpio nearly no seeks are
> > >> required during copying.
> > >
> > > What makes you think the inodes are sequential on-disk?
> >
> > Even if the i-nodes are sequential on-disk, there's no reason to think
> > that the data blocks associated with the inodes are in any particular
> > order with respect to the i-nodes themselves.
>
> Correct, there is however a really fast method using "star -copy".
>
> This works because there are two decoupled processes, shared memory between
> them and the fact that star reads names from directories in one big chunk.
>


Honestly, that's news to me. Which package has star?


Rgds,

Nikos Chantziaras 02-13-2012 03:37 PM

On 13/02/12 18:29, Pandu Poluan wrote:
>
> On Feb 13, 2012 11:15 PM, "Joerg Schilling"
> <Joerg.Schilling@fokus.fraunhofer.de> wrote:
> > Correct, there is however a really fast method using "star -copy".
> >
> > This works because there are two decoupled processes, shared memory between
> > them and the fact that star reads names from directories in one big chunk.
> >
>
> Honestly, that's news to me. Which package has star?

eix -e star

:-/

Pandu Poluan 02-13-2012 04:42 PM

On Feb 13, 2012 11:41 PM, "Nikos Chantziaras" <realnc@arcor.de> wrote:
>
> On 13/02/12 18:29, Pandu Poluan wrote:
>>
>> On Feb 13, 2012 11:15 PM, "Joerg Schilling"
>> <Joerg.Schilling@fokus.fraunhofer.de> wrote:
>> > Correct, there is however a really fast method using "star -copy".
>> >
>> > This works because there are two decoupled processes, shared memory between
>> > them and the fact that star reads names from directories in one big chunk.
>> >
>>
>> Honestly, that's news to me. Which package has star?
>
> eix -e star
>
> :-/
>


Hehhe... sorry, I'm on the road and don't have Gentoo on my smartphone :-P


Rgds,

Jörg Schilling 02-13-2012 05:20 PM

Nikos Chantziaras <realnc@arcor.de> wrote:

> > > This works because there are two decoupled processes, shared memory between
> > > them and the fact that star reads names from directories in one big chunk.
> > >
> >
> > Honestly, that's news to me. Which package has star?
>
> eix -e star

To help star buffer, give it a large FIFO size, up to half of the RAM in
your machine, e.g. fs=1000m.
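To turn "up to half of the RAM" into a concrete fs= value, a small helper can read the physical memory size through os.sysconf (the SC_PAGE_SIZE and SC_PHYS_PAGES names are available on Linux). The helper name is hypothetical, and it assumes that star's m suffix means units of 1024 * 1024 bytes.

```python
import os


def star_fifo_arg(fraction=0.5):
    """Build a star fs= option sized to `fraction` of physical RAM.

    Assumes Linux-style os.sysconf names and that star's "m" suffix means
    units of 1024 * 1024 bytes.
    """
    # The 0.5 upper bound follows the "up to half of the RAM" advice above.
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    mib = int(ram_bytes * fraction) // (1024 * 1024)
    return f"fs={mib}m"


print(star_fifo_arg())       # half of RAM
print(star_fifo_arg(0.25))   # a more conservative quarter
```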

To make sure that star gives fast file creation (unpacking of archives) on
filesystems that do not support fast verified transactions, you need to make
star as "insecure" as other software to get comparable results, so add:

-no-fsync
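The fsync trade-off can be seen in a plain copy routine. With sync=True each file is forced to stable storage with os.fsync before the function returns; with sync=False, the behaviour the thread associates with -no-fsync and with other software, the function returns as soon as the data has been handed to the kernel's page cache. The function name copy_file is hypothetical.

```python
import os
import shutil


def copy_file(src, dst, sync=True):
    """Copy one file; with sync=True, wait until its data is on stable storage."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, 1 << 20)  # 1 MiB copy buffer
        if sync:
            fout.flush()             # push Python's buffer to the kernel
            os.fsync(fout.fileno())  # block until the kernel has written it out
    # Copy permission bits and timestamps, as cp -p would.
    shutil.copystat(src, dst)
```

On a tree with many small files the per-file fsync wait adds up, which is the slowdown that -no-fsync avoids at the cost of crash safety.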

Jörg


Dale 02-13-2012 09:11 PM

Joerg Schilling wrote:
> Nikos Chantziaras <realnc@arcor.de> wrote:
>
>>> > This works because there are two decoupled processes, shared memory between
>>> > them and the fact that star reads names from directories in one big chunk.
>>> >
>>>
>>> Honestly, that's news to me. Which package has star?
>>
>> eix -e star
>
> To help star to buffer, give star a large fifo size that is up to half of the
> RAM in your machine, e.g. fs=1000m
>
> To make sure that star gives fast file creation (unpacking of archives) on
> filesystems that do not support fast verified transactions, you need to make
> star as "insecure" as other software to get comparable results, so add:
>
> -no-fsync
>
> Jörg
>


The problem with star is that when I need to copy a large number of
files, it isn't on the DVD I boot from. That's why most people use cp:
it is on every bootable medium I have ever booted from, including the
Gentoo boot media.

Since star is so good, why not get them to include it on the bootable
media? Is it too large a package or what?

Dale

:-) :-)

--
I am only responsible for what I said ... Not for what you understood or
how you interpreted my words!

Miss the compile output? Hint:
EMERGE_DEFAULT_OPTS="--quiet-build=n"

Neil Bothwick 02-13-2012 09:58 PM

On Tue, 14 Feb 2012 00:42:56 +0700, Pandu Poluan wrote:

> Hehhe... sorry, I'm on the road and don't have Gentoo on my
> smartphone :-P

Not even via SSH? :P


--
Neil Bothwick

If at first you don't succeed you'll get lots of advice.


All times are GMT.