11-26-2011, 10:37 AM
"Paweł Hajdan, Jr."

proj/portage:master commit in: pym/portage/dbapi/

On 11/26/11 12:26 PM, Nirbheek Chauhan wrote:
> If it should be sorted[1], it should really be sorted in the reverse
> order of distfile-download size. That would be extremely useful on
> systems with slow internet connections. [...]
>
> 1. I'm obviously assuming that dep nodes that do not depend on each
> other would be sorted

Seconded. I think practical reasons are more important than an arbitrary
order, and I'd also benefit from this download-oriented order.
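
The download-oriented order suggested above could look roughly like the sketch below. It is only an illustration: the package names and sizes are hypothetical, the helper `largest_download_first` is not part of Portage, and it assumes the packages do not depend on each other, as the footnote in the quoted mail does.

```python
# Hypothetical (package, distfile size in bytes) pairs; not real Portage data.
pending = [
    ("app-example/small", 120_000),
    ("app-example/huge", 250_000_000),
    ("app-example/medium", 8_500_000),
]


def largest_download_first(packages):
    """Order independent packages so the biggest download starts first."""
    return sorted(packages, key=lambda item: item[1], reverse=True)


for name, size in largest_download_first(pending):
    print(name, size)
```

Starting the largest fetch first gives it the most time to finish in the background while smaller packages are fetched and built.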
 
11-26-2011, 10:38 AM
Fabian Groffen

proj/portage:master commit in: pym/portage/dbapi/

On 26-11-2011 16:56:41 +0530, Nirbheek Chauhan wrote:
> On Sat, Nov 26, 2011 at 4:28 PM, Fabian Groffen <grobian@gentoo.org> wrote:
> > On 26-11-2011 01:54:35 +0000, Arfrever Frehtes Taifersar Arahesis wrote:
> >> commit:     1d4ac47c28706094230cb2c4e6ee1c1c71629aa0
> >> T> Org>
> >> AuthorDate: Sat Nov 26 01:52:49 2011 +0000
> >> Commit:     Arfrever Frehtes Taifersar Arahesis <arfrever <AT> gentoo <DOT> org>
> >> CommitDate: Sat Nov 26 01:52:49 2011 +0000
> >> URL:        http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=1d4ac47c
> >>
> >> dblink.mergeme(): Merge files in alphabetic order.
> >
> > What's the advantage of this?  I don't really like to pay for sorting a
> > potentially huge list just for some eye-candy.  (That's omitted by
> > default these days anyway...)
> > Any other opinions on this one?
> >
>
> If it should be sorted[1], it should really be sorted in the reverse
> order of distfile-download size. That would be extremely useful on
> systems with slow internet connections. Too many times have I sat
> waiting for libreoffice-bin to download while a webkit-gtk recompile
> waits in the queue.
>
> We already have the information during dependency resolution with
> --verbose, and it costs very little. Besides, sorting even 30,000
> entries (if you're merging every ebuild in portage) should not take
> more than a few secs.

A Linux kernel has around that many files, and I really wonder if
it's worth waiting a couple of seconds (probably more on sparc and arm
systems) just so that the files are in sorted order.

> 1. I'm obviously assuming that dep nodes that do not depend on each
> other would be sorted

I think this is per package.

I didn't watch the reply-to headers closely enough; the
gentoo-portage-dev list was my original target, which obviously makes
more sense for this context.


--
Fabian Groffen
Gentoo on a different level
 
11-26-2011, 11:50 AM
Nirbheek Chauhan

proj/portage:master commit in: pym/portage/dbapi/

On Sat, Nov 26, 2011 at 5:08 PM, Fabian Groffen <grobian@gentoo.org> wrote:
> On 26-11-2011 16:56:41 +0530, Nirbheek Chauhan wrote:
>> [...] Besides, sorting even 30,000
>> entries (if you're merging every ebuild in portage) should not take
>> more than a few secs.
>
> A Linux kernel has around that many files, and I really wonder if
> it's worth waiting a couple of seconds (probably more on sparc and arm
> systems) just so that the files are in sorted order.
>

I'm not sure the two are really comparable. However, looking at a
simple string sort on 30,000 strings, I don't see it taking a
significant amount of time at all:

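import random
import time

t1 = time.time()
# list() is needed on Python 3, where range() cannot be shuffled in place
a = list(range(100000, 130000))
random.shuffle(a)
b = [str(i) for i in a]
t2 = time.time()
b.sort()  # plain string sort of 30,000 entries
t3 = time.time()
print(t2 - t1)  # time to build the shuffled list of strings
print(t3 - t2)  # time to sort it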

----
0.0682320594788
0.0464689731598


>> 1. I'm obviously assuming that dep nodes that do not depend on each
>> other would be sorted
>
> I think this is per package.
>

Actually, reading the code it seems that it's about the file merge
order of a single package. My participation in this entire discussion
is m00t. Never mind.

--
~Nirbheek Chauhan

Gentoo GNOME+Mozilla Team
 
11-26-2011, 11:59 AM
Ciaran McCreesh

proj/portage:master commit in: pym/portage/dbapi/

On Sat, 26 Nov 2011 18:20:27 +0530
Nirbheek Chauhan <nirbheek@gentoo.org> wrote:
> Actually, reading the code it seems that it's about the file merge
> order of a single package. My participation in this entire discussion
> is m00t. Never mind.

...in which case it's often an awful lot faster to sort by inode, not by
filename. Try it when installing a kernel sources package.
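
For illustration, sorting by inode rather than by filename could be sketched as below. This is a minimal sketch, not the code Portage uses; `sort_by_inode` is a hypothetical helper, and whether inode order actually helps depends on the filesystem.

```python
import os


def sort_by_inode(paths):
    """Return paths ordered by inode number instead of by name.

    os.lstat() is used so that symlinks are ordered by their own
    inode rather than by the inode of their target.
    """
    return sorted(paths, key=lambda p: os.lstat(p).st_ino)


if __name__ == "__main__":
    # Example: order the entries of the current directory by inode.
    entries = [os.path.join(".", name) for name in os.listdir(".")]
    for path in sort_by_inode(entries):
        print(os.lstat(path).st_ino, path)
```

Note that this costs one lstat() call per file on top of the sort itself.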

--
Ciaran McCreesh
 
11-26-2011, 12:44 PM
Rich Freeman

proj/portage:master commit in: pym/portage/dbapi/

On Sat, Nov 26, 2011 at 7:59 AM, Ciaran McCreesh
<ciaran.mccreesh@googlemail.com> wrote:
> On Sat, 26 Nov 2011 18:20:27 +0530
> Nirbheek Chauhan <nirbheek@gentoo.org> wrote:
>> Actually, reading the code it seems that it's about the file merge
>> order of a single package. My participation in this entire discussion
>> is m00t. Never mind.
>
> ...in which case it's often an awful lot faster to sort by inode, not by
> filename. Try it when installing a kernel sources package.

I can believe it. Btrfs added inode-order directory indexes precisely
for this reason. I'd have to look up the details but I think it was
designed to return the directories in this order to function calls so
that anything that iterates through the tree would get this
optimization by default. Of course, if you then re-sort the list first
you lose that. (It also has the ext3 dir_index-style indexes for
named file lookups.)

Oh, on the topic of btrfs, if any emerge operations do file copies,
adding --reflink=auto to the cp command will GREATLY improve
performance. That does a copy-on-write copy - it behaves like a
hard-link as far as time to create goes, but it behaves like a full
copy as far as modifications not being shared goes. It also uses
almost no additional disk space until the content starts to diverge
between the copies. Setting --reflink=auto should be safe on non-COW
filesystems as it will fall back to a normal copy if the operation
isn't supported. It is available in stable coreutils. Some speculate
that this option could increase fragmentation (both copies will share
extents from the original file, and have some extents of their own),
but btrfs doesn't overwrite anything in-place so fragmentation is a
potential issue with any file modification (change one byte in the
middle of a file and you get a new record somewhere with one byte in
it and a bunch of pointers in the metadata saying "stick this byte
here" - though for one byte I'm guessing it would end up in the
metadata tree much as ext3 stores small files in their inodes so the
one byte would be in ram when the pointer to it is loaded).
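
The "try a copy-on-write clone, fall back to a normal copy" behaviour described above can be sketched in Python without calling cp. This is only a sketch under assumptions: `reflink_or_copy` is a hypothetical helper, the ioctl request value is the Linux clone request as encoded on x86 and ARM (it differs on some other architectures), and it is not necessarily how cp implements the option.

```python
import fcntl
import shutil

# Linux clone ioctl request number (assumption: x86/ARM encoding).
FICLONE = 0x40049409


def reflink_or_copy(src, dst):
    """Clone src to dst copy-on-write if possible, else copy the bytes.

    Returns "reflink" or "copy" to say which path was taken.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Ask the filesystem to share extents between the two files.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return "reflink"
        except OSError:
            # Not supported here (e.g. ext3, tmpfs, or different
            # filesystems): fall back to an ordinary data copy.
            shutil.copyfileobj(fsrc, fdst)
            return "copy"
```

On a filesystem without clone support the function simply behaves like a plain copy, which is the same safety property claimed for --reflink=auto.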

Rich
 
11-26-2011, 02:09 PM
Michał Górny

proj/portage:master commit in: pym/portage/dbapi/

On Sat, 26 Nov 2011 08:44:28 -0500
Rich Freeman <rich0@gentoo.org> wrote:

> Oh, on the topic of btrfs, if any emerge operations do file copies,
> adding --reflink=auto to the cp command will GREATLY improve
> performance. That does a copy-on-write copy - it behaves like a
> hard-link as far as time to create goes, but it behaves like a full
> copy as far as modifications not being shared goes. [...]

We don't rely on external tools to do the copying. AFAIR it uses
Python's shutil module, which is rather poor. I'm slowly working on
creating an atomic-install tool for doing this merge more optimally [1].

But in this particular case, I don't think COW is particularly useful.
If it works only on filesystem bounds, we could move the file directly
anyway.

[1]:https://github.com/mgorny/atomic-install

--
Best regards,
Michał Górny
 
11-26-2011, 02:25 PM
Rich Freeman

proj/portage:master commit in: pym/portage/dbapi/

On Sat, Nov 26, 2011 at 10:09 AM, Michał Górny <mgorny@gentoo.org> wrote:
> But in this particular case, I don't think COW is particularly useful.
> If it works only on filesystem bounds, we could move the file directly
> anyway.

Yup - I would only use it if you really are doing a copy and not a
move (neglecting the fact that the implementation of a
cross-filesystem move does a copy first). I imagine many ebuilds do
copy operations internally, but probably not to an extent where it
would make much difference. I'm not sure how doins/dobin/etc are
implemented - I think they're copies and so allowing for the fact that
not everybody uses a tmpfs it might make sense to fix those.

Rich
 
11-26-2011, 02:50 PM
Nirbheek Chauhan

proj/portage:master commit in: pym/portage/dbapi/

On Sat, Nov 26, 2011 at 7:14 PM, Rich Freeman <rich0@gentoo.org> wrote:
> isn't supported.  It is available in stable coreutils.  Some speculate
> that this option could increase fragmentation (both copies will share
> extents from the original file, and have some extents of their own),
> but btrfs doesn't overwrite anything in-place so fragmentation is a
> potential issue with any file modification (change one byte in the

Adding to your comments on this:

To mitigate such issues, newer versions of the btrfs fs driver have
automatic online defragmentation as well. Works quite well for
moderate fragmentation.

A particularly ghastly case, where fragmentation becomes pathological,
is files that are fsync()ed very frequently. A typical example is the
*.sqlite files in ~/.mozilla, which easily get hundreds or even
thousands of fragments after a few hours' worth of firefox usage (can
be verified with filefrag).

To fix such things, regular online defragmentation of those specific
files can be done using `btrfs fi defrag <file>`.
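
To decide which files are worth defragmenting, the extent count reported by filefrag can be read programmatically. The sketch below only parses a line of filefrag's summary output; `parse_extent_count` is a hypothetical helper, and the sample path and number are made up for illustration.

```python
import re


def parse_extent_count(filefrag_line):
    """Return the extent count from one filefrag summary line, or None.

    filefrag prints lines of the form "<path>: <N> extents found"
    (or "1 extent found" for an unfragmented file).
    """
    match = re.search(r":\s*(\d+) extents? found", filefrag_line)
    return int(match.group(1)) if match else None


# Hypothetical sample line; the path and count are illustrative only.
sample = "/tmp/example.sqlite: 1843 extents found"
print(parse_extent_count(sample))
```

A caller could compare the result against a threshold of its own choosing before running the defragmentation command mentioned above.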

--
~Nirbheek Chauhan

Gentoo GNOME+Mozilla Team
 
11-26-2011, 02:58 PM
Nirbheek Chauhan

proj/portage:master commit in: pym/portage/dbapi/

On Sat, Nov 26, 2011 at 8:39 PM, Michał Górny <mgorny@gentoo.org> wrote:
> But in this particular case, I don't think COW is particularly useful.
> If it works only on filesystem bounds, we could move the file directly
> anyway.
>

There are still a few specific cases in which CoW would indeed be
useful. IIRC, reflinking of files works across btrfs *subvolumes*, and
such a copy would normally be detected as a cross-device move. Another
use would be a patch-merge which makes use of *ranged reflinks* to
CoW-copy only those parts of the file that were changed[1]. rsync has
support for this, but only while appending to files (--append-verify
--no-whole-file).


1. Somewhat like rope data structures, with the caveat that ranges
must be block-size aligned.

--
~Nirbheek Chauhan

Gentoo GNOME+Mozilla Team
 
11-26-2011, 03:00 PM
Michał Górny

proj/portage:master commit in: pym/portage/dbapi/

On Sat, 26 Nov 2011 10:25:15 -0500
Rich Freeman <rich0@gentoo.org> wrote:

> On Sat, Nov 26, 2011 at 10:09 AM, Michał Górny <mgorny@gentoo.org>
> wrote:
> > But in this particular case, I don't think COW is particularly
> > useful. If it works only on filesystem bounds, we could move the
> > file directly anyway.
>
> Yup - I would only use it if you really are doing a copy and not a
> move (neglecting the fact that the implementation of a
> cross-filesystem move does a copy first). I imagine many ebuilds do
> copy operations internally, but probably not to an extent where it
> would make much difference. I'm not sure how doins/dobin/etc are
> implemented - I think they're copies and so allowing for the fact that
> not everybody uses a tmpfs it might make sense to fix those.

AFAICS doins uses 'install' mostly, and sometimes 'cp' (with symlinks).
I don't see any variant of '--reflink' option for 'install'.

--
Best regards,
Michał Górny
 

All times are GMT.