Old 02-13-2012, 11:37 PM
Pandu Poluan
 
RFC : fast copying of a whole directory tree

On Feb 14, 2012 6:00 AM, "Neil Bothwick" <neil@digimed.co.uk> wrote:
> On Tue, 14 Feb 2012 00:42:56 +0700, Pandu Poluan wrote:
>
> > Hehhe... sorry, I'm on the road and don't have Gentoo on my
> > smartphone :-P
>
> Not even via SSH? :P

It's a new phone and I forgot the port-knocking sequence to open the ssh port >.<

Rgds,
 
Old 02-14-2012, 08:05 AM
Florian Philipp
 

On 13.02.2012 16:31, Grant Edwards wrote:
> On 2012-02-13, Michael Orlitzky <michael@orlitzky.com> wrote:
>> On 02/13/12 05:49, Helmut Jarausch wrote:
>>>
>>> I've written a small Python program which outputs the file names in
>>> i-node order. If this is fed into tar or cpio nearly no seeks are
>>> required during copying.
>>
>> What makes you think the inodes are sequential on-disk?
>
> Even if the i-nodes are sequential on-disk, there's no reason to think
> that the data blocks associated with the inodes are in any particular
> order with respect to the i-nodes themselves.
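
For reference, an inode-order listing like the one described in the quote could be sketched in shell roughly as follows. This is not Helmut's Python program, just an illustration: the function name `list_in_inode_order` is made up, GNU find's `-printf` is assumed, and file names containing newlines are not handled.

```shell
# Hypothetical helper: print regular files below "$1" sorted by inode number.
# Assumes GNU find (-printf) and file names without newlines.
list_in_inode_order() {
    find "$1" -type f -printf '%i\t%p\n' |   # "<inode><TAB><path>" per file
        sort -n |                            # numeric sort on the leading inode
        cut -f 2-                            # drop the inode column again
}

# Example (paths are placeholders):
#   list_in_inode_order /var/db/portage | cpio -p --make-directories /tmp/portage
```

As the quoted objections point out, inode order says nothing definite about where the data blocks are.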

You could probably find the intended order by using debugfs (at least
for ext*). The following command should output the first physical block
of every file:
find /var/db/portage/ -type f -printf 'bmap <%i> 0\n' |
  sudo debugfs /dev/mapper/vg-portage

Todo left as an exercise to the reader:
- Clean debugfs output
- Map inodes back to file names (hint: don't use debugfs's 'ncheck')
- sort | cut | xargs cp

Possible further improvement: Sort the files so that the first block of
the next file is close to the last block of the previous file.

Regards,
Florian Philipp
 
Old 02-14-2012, 08:57 AM
Joerg Schilling
 

Florian Philipp <lists@binarywings.net> wrote:

> > Even if the i-nodes are sequential on-disk, there's no reason to think
> > that the data blocks associated with the inodes are in any particular
> > order with respect to the i-nodes themselves.
>
> You could probably find the intended order by using debugfs (at least
> for ext*). The following command should output the first physical block
> of every file:
> find /var/db/portage/ -type f -printf 'bmap <%i> 0\n' |
>   sudo debugfs /dev/mapper/vg-portage

This kind of order is not important for copy speed.

Copy speed is dominated by write speed and write speed is dominated by seeks
that are a result of keeping meta data up to date.

Jörg

--
EMail:joerg@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
js@cs.tu-berlin.de (uni)
joerg.schilling@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
 
Old 02-14-2012, 11:50 AM
Mick
 

On 13 February 2012 22:11, Dale <rdalek1967@gmail.com> wrote:
> Joerg Schilling wrote:
>> Nikos Chantziaras <realnc@arcor.de> wrote:
>>
>>>> > This works because there are two decoupled processes, shared memory
>>>> > between them and the fact that star reads names from directories in
>>>> > one big chunk.
>>>> >
>>>>
>>>> Honestly, that's news to me. Which package has star?
>>>
>>> eix -e star
>>
>> To help star to buffer, give star a large fifo size that is up to half of the
>> RAM in your machine, e.g. fs=1000m
>>
>> To make sure that star gives fast file creation (unpacking of archives) on
>> filesystems that do not support fast verified transactions, you need to make
>> star as "insecure" as other software to get comparable results, so add:
>>
>>     -no-fsync
>>
>> Jörg
>>
>
>
> The problem with star is that when I need to copy a large number of
> files, it isn't on the DVD I boot from. That's why most people use cp
> since it is on every bootable media I have ever booted. That includes
> the Gentoo bootable media.
>
> Since star is so good, why not get them to include it on the bootable
> media? Is it too large a package or what?

It used to be in the Knoppix package list, but alas it is no more. :-(

I still keep my old copy somewhere just for this reason.
--
Regards,
Mick
 
Old 02-14-2012, 04:45 PM
Florian Philipp
 

On 14.02.2012 10:57, Joerg Schilling wrote:
> Florian Philipp <lists@binarywings.net> wrote:
>
>>> Even if the i-nodes are sequential on-disk, there's no reason to think
>>> that the data blocks associated with the inodes are in any particular
>>> order with respect to the i-nodes themselves.
>>
>> You could probably find the intended order by using debugfs (at least
>> for ext*). The following command should output the first physical block
>> of every file:
>> find /var/db/portage/ -type f -printf 'bmap <%i> 0\n' |
>>   sudo debugfs /dev/mapper/vg-portage
>
> This kind of order is not important for copy speed.
>
> Copy speed is dominated by write speed and write speed is dominated by seeks
> that are a result of keeping meta data up to date.
>
> Jörg
>

I cannot verify that hypothesis.

Test setup:
1x 7200rpm 2.5" HDD
/var/db/portage is my portage tree, ext4
/dev/mapper/vg-portage is its block device
/tmp is ext4

First test --- copy whole tree just with `cpio` (performance tested and
similar to `cp -a`):
$ echo 1 >/proc/sys/vm/drop_caches
$ time find /var/db/portage/ -type f -print0 |
    cpio -p0 --make-directories /tmp/portage/

real 11m52.657s
user 0m1.848s
sys 0m19.802s

Second test --- Sort by starting physical block number:
$ echo 1 >/proc/sys/vm/drop_caches
$ FIFO=/tmp/$(uuidgen).fifo
$ mkfifo "$FIFO"
$ time find /var/db/portage/ -type f \
    -fprintf "$FIFO" 'bmap <%i> 0\n' -print0 |
    tr '\n\0' '\0\n' | paste <(
      debugfs -f "$FIFO" /dev/mapper/vg-portage |
      grep -E '^[[:digit:]]+') - |
    sort -k 1,1n | cut -f 2- | tr '\n\0' '\0\n' |
    cpio -p0 --make-directories /tmp/portage/
$ unlink "$FIFO"

real 2m8.400s
user 0m1.888s
sys 0m15.417s

Using `xargs -0 cat >/dev/null` instead of `cpio` yields 9m27.745s and
1m11.087s, respectively.
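
To put the timings above in proportion, the speedup can be worked out from the `real` values. A small sketch; the helper names `to_seconds` and `speedup` are invented here and are not part of the test runs:

```shell
# Hypothetical helper: convert a `time` value such as 11m52.657s to seconds.
to_seconds() {
    printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Hypothetical helper: ratio of two `time` values, unsorted run first.
speedup() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", a / b }'
}

speedup 11m52.657s 2m8.400s    # copy with cpio
speedup 9m27.745s 1m11.087s    # read-only run with cat
```

This prints 5.6 for the cpio copy and 8.0 for the read-only run.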

Some comments to the sorting script:
- Using a fifo instead of a pipe for issuing commands to debugfs is faster.
- If it is not obvious, the two `tr` commands are there because `paste`
and `cut` cannot handle zero-terminated lines but file names might
contain line breaks.
- `grep` is there because `debugfs` echoes all commands. Filtering every
odd numbered line should also work.
- A production-ready script should probably use `join` instead of
`paste` to deal with read errors of `debugfs` (for example if files are
removed between `find` and `debugfs`). Currently, this leads to
misaligned output.
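
A rough sketch of what the `join` variant from the last comment might look like. This is untested against a real debugfs run and rests on assumptions: each command is echoed on a line containing `bmap <INODE>`, a successful result is a line of digits only, and file names contain no newlines. The function names `blocks_by_inode` and `order_by_first_block` are made up for illustration.

```shell
TAB=$(printf '\t')

# Hypothetical helper: read debugfs output on stdin and print
# "<inode><TAB><first block>" for every bmap command that succeeded.
# A command followed by an error message produces no output line.
blocks_by_inode() {
    awk '
        /bmap <[0-9]+>/ {
            match($0, /<[0-9]+>/)
            ino = substr($0, RSTART + 1, RLENGTH - 2)   # digits between < and >
            next
        }
        /^[0-9]+$/ && ino != "" { print ino "\t" $0; ino = "" }
    '
}

# Hypothetical helper: $1 holds "<inode><TAB><block>" lines, $2 holds
# "<inode><TAB><path>" lines. Prints the paths ordered by first block.
# Inodes that appear in only one of the two files are dropped by join.
order_by_first_block() {
    tmp=$(mktemp -d) || return 1
    LC_ALL=C sort -u -t "$TAB" -k 1,1 "$1" >"$tmp/blocks"   # one block per inode
    LC_ALL=C sort -t "$TAB" -k 1,1 "$2" >"$tmp/names"
    LC_ALL=C join -t "$TAB" "$tmp/blocks" "$tmp/names" |
        sort -t "$TAB" -k 2,2n |                            # order by block number
        cut -f 3-                                           # keep only the path
    rm -rf "$tmp"
}
```

The names file could come from something like `find ... -printf '%i\t%p\n'`. Because the pairing is done by inode rather than by line position, a file that vanishes between `find` and `debugfs` just drops out instead of shifting every later line.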

BTW: I wanted to test it with `star -copy` but this resulted in buffer
overflows similar to these:
http://permalink.gmane.org/gmane.comp.archivers.star.user/752

Regards,
Florian Philipp
 
