Linux Archive > Gentoo > Gentoo Development
 
01-14-2010, 11:07 PM
Paul Arthur

proxy maintainership and gentoo-x86 scm

On 2010-01-14, Nirbheek Chauhan <nirbheek@gentoo.org> wrote:
> On Wed, Jan 13, 2010 at 9:24 PM, Mike Frysinger <vapier@gentoo.org> wrote:
>> i think our current work flows also significantly impede the smooth running of
>> this. if we were using a dscm (git) on gentoo-x86, i feel like it'd be a
>> much smoother ride for Gentoo devs to pull from a proxy maintainer and push on
>> their behalf.
>
> In theory, yes. In practice, git is too slow to handle 30,000 files.
> Even simple operations like git add become painful even if you put the
> whole of portage on tmpfs since git does a stat() on every single file
> in the repository with every operation.
>
> Simple test: do a git init followed by git add && git commit -m
> "Initial commit" in your portage dir (.gitignore packages/ and
> distfiles/)
>
> Once this is done, you can test how it'll feel to use a DSCM on
> portage (without history). Unless you have a really fast SSD and
> processor, you'll want to go back to the good old days of CVS with its
> network-bound latencies on just 5-6 files in the current dir.

Ouch. I wanted to test this in a fairly bad scenario, so I gave it a
try on my old, low-spec fileserver.

aravis-root /usr/portage # time git add .
real 19m1.333s
user 0m53.230s
sys 1m9.350s

aravis-root /usr/portage # time git commit -m "Initial commit"
real 19m44.700s
user 0m23.740s
sys 0m49.320s

Then, with no changes whatsoever:
aravis-root /usr/portage # time git status
real 4m26.454s
user 0m2.090s
sys 0m6.380s

Finally, a ray of hope:
aravis-root /usr/portage/app-emulation # time git add xen-tools-gdbserver/
real 0m0.978s
user 0m0.400s
sys 0m0.180s

But no:
aravis-root /usr/portage/app-emulation # time git commit -m "Commit 2"
real 3m18.502s
user 0m1.830s
sys 0m7.530s

Now, this is fairly close to a worst-case scenario, being an old
computer with a slow drive. A new computer with faster drives (in
RAID 1+0) is much more reasonable.

lasaraleen portage # time git add .
real 0m26.002s
user 0m6.256s
sys 0m6.572s

lasaraleen portage # time git commit -m "Initial commit"
real 0m27.371s
user 0m3.704s
sys 0m3.856s

lasaraleen app-emulation # time git commit -m "Commit 2"
real 0m1.374s
user 0m0.468s
sys 0m0.904s
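
For anyone who wants to repeat the initial-import measurement, the steps can be sketched roughly as below. This is only a sketch: bench_import and timed are helper names made up here, the "bench" identity is a placeholder, and GNU date is assumed for the nanosecond format. Point it at a scratch copy of the tree, since it creates a .git directory and appends to .gitignore. It reports wall-clock time only, not the user/sys split shown above.

```shell
#!/bin/sh
# Sketch only: bench_import and timed are hypothetical helper names.

# Run a command and report its wall-clock time in milliseconds on stderr.
# Assumes GNU date (for %N, nanoseconds).
timed() {
    label=$1; shift
    start=$(date +%s%N)
    "$@"
    status=$?
    end=$(date +%s%N)
    echo "$label: $(( (end - start) / 1000000 )) ms" >&2
    return "$status"
}

# Import the tree at $1 into a fresh repository, timing add and commit.
# The body runs in a subshell so the caller's working directory is untouched.
bench_import() (
    cd "$1" || exit 1
    git init -q . || exit 1
    # Keep binary packages and distfiles out of the repository.
    printf 'packages/\ndistfiles/\n' >> .gitignore
    timed "git add" git add . || exit 1
    # Placeholder identity so the commit works without any git configuration.
    timed "git commit" git -c user.name=bench -c user.email=bench@example.invalid \
        commit -q -m "Initial commit"
)

# Usage (on a scratch copy, not the live tree):
#   bench_import /scratch/tmp/portage
```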


--
Having to infer what Unix is solely from a copy of the GNU Manifesto is
not really an exercise you want to undertake.
--AdB
 
01-14-2010, 11:47 PM
"Robin H. Johnson"

On Thu, Jan 14, 2010 at 07:07:01PM -0500, Paul Arthur wrote:
> On 2010-01-14, Nirbheek Chauhan <nirbheek@gentoo.org> wrote:
> > On Wed, Jan 13, 2010 at 9:24 PM, Mike Frysinger <vapier@gentoo.org> wrote:
> >> i think our current work flows also significantly impede the smooth running of
> >> this. if we were using a dscm (git) on gentoo-x86, i feel like it'd be a
> >> much smoother ride for Gentoo devs to pull from a proxy maintainer and push on
> >> their behalf.
> >
> > In theory, yes. In practice, git is too slow to handle 30,000 files.
> > Even simple operations like git add become painful even if you put the
> > whole of portage on tmpfs since git does a stat() on every single file
> > in the repository with every operation.
> >
> > Simple test: do a git init followed by git add && git commit -m
> > "Initial commit" in your portage dir (.gitignore packages/ and
> > distfiles/)
> >
> > Once this is done, you can test how it'll feel to use a DSCM on
> > portage (without history). Unless you have a really fast SSD and
> > processor, you'll want to go back to the good old days of CVS with its
> > network-bound latencies on just 5-6 files in the current dir.
>
> Ouch. I wanted to test this in a fairly bad scenario, so I gave it a
> try on my old, low-spec fileserver.
You didn't repack or at least run git-gc between the huge add and your
everyday operations. Do that, and then measure the ops with both cold
and hot cache.
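
A rough sketch of that measurement follows. It assumes Linux and GNU date; the cold-cache step writes to /proc/sys/vm/drop_caches, which needs root, so it only runs when asked for. The names bench_status, timed and the DROP_CACHES switch are invented for this example and are not git options.

```shell
#!/bin/sh
# Sketch only: bench_status, timed and DROP_CACHES are hypothetical names.

# Run a command and report its wall-clock time in milliseconds on stderr.
# Assumes GNU date (for %N, nanoseconds).
timed() {
    label=$1; shift
    start=$(date +%s%N)
    "$@"
    status=$?
    end=$(date +%s%N)
    echo "$label: $(( (end - start) / 1000000 )) ms" >&2
    return "$status"
}

# Repack the repository at $1, then time "git status" with a cold cache
# (only when DROP_CACHES=1) and with a hot cache.
bench_status() {
    repo=$1
    git -C "$repo" gc --quiet || return 1
    if [ "${DROP_CACHES:-0}" = 1 ]; then
        # Flush dirty pages, then drop page cache, dentries and inodes.
        sync
        echo 3 > /proc/sys/vm/drop_caches || return 1
        timed "git status (cold cache)" git -C "$repo" status >/dev/null
    else
        # No cold run requested: warm the cache with one untimed pass.
        git -C "$repo" status >/dev/null
    fi
    timed "git status (hot cache)" git -C "$repo" status >/dev/null
}

# Usage:
#   bench_status /usr/portage                  # hot cache only
#   DROP_CACHES=1 bench_status /usr/portage    # as root: cold, then hot
```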

The initial packing and adding are very intensive, even on fast
machines, but that's because they are dealing with a lot of small pieces
of data. Packing has benefited immensely from being fully multi-threaded
in Git.

I'd love somebody to do the SoC stats again:
http://www.gentoo.org/proj/en/infrastructure/cvs-migration.xml?style=printable

Using the git repo conversion I did:
http://git-exp.overlays.gentoo.org/gitweb/?p=exp/gentoo-x86.git;a=summary

--
Robin Hugh Johnson
Gentoo Linux: Developer, Trustee & Infrastructure Lead
E-Mail : robbat2@gentoo.org
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85
 
01-15-2010, 06:53 AM
Max Arnold

On Thu, Jan 14, 2010 at 07:07:01PM -0500, Paul Arthur wrote:
> Ouch. I wanted to test this in a fairly bad scenario, so I gave it a
> try on my old, low-spec fileserver.

Just out of curiosity, I ran several tests with Mercurial:

$ mkdir /scratch/tmp
$ time tar --use-compress-program=lzma -xf portage-20100114.tar.lzma -C /scratch/tmp
real 1m3.696s
user 0m25.082s
sys 0m12.549s

$ cd /scratch/tmp/portage
$ hg init --time
Time: real 0.070 secs (user 0.040+0.000 sys 0.000+0.000)

$ hg status --time | wc -l
Time: real 9.920 secs (user 6.290+0.000 sys 1.970+0.000)
113272

$ hg add --time | wc -l
Time: real 23.050 secs (user 20.450+0.000 sys 2.300+0.000)
113272

$ hg commit --time -m "Initial commit"
Time: real 758.010 secs (user 354.250+0.000 sys 93.400+0.000)

$ cd ..
$ hg clone --noupdate --time portage portage-work
Time: real 34.530 secs (user 9.160+0.000 sys 9.750+0.000)

$ cd portage-work
$ hg update --time
113272 files updated, 0 files merged, 0 files removed, 0 files unresolved
Time: real 538.330 secs (user 218.140+0.000 sys 74.520+0.000)

$ mkdir dev-util/hg-test
$ touch dev-util/hg-test/Manifest
$ hg status --time
? dev-util/hg-test/Manifest
Time: real 6.350 secs (user 4.520+0.000 sys 1.310+0.000)

$ hg add --time
adding dev-util/hg-test/Manifest
Time: real 10.250 secs (user 8.610+0.000 sys 1.390+0.000)

$ hg commit --time -m "added hg-test"
Time: real 17.370 secs (user 15.400+0.000 sys 1.430+0.000)

$ hg out --time ../portage | grep changeset | wc -l
Time: real 0.930 secs (user 0.690+0.000 sys 0.070+0.000)
1

$ hg push --time ../portage
pushing to ../portage
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files
Time: real 6.100 secs (user 5.450+0.000 sys 0.330+0.000)

$ cd ../portage
$ hg update --time
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
Time: real 96.390 secs (user 14.950+0.000 sys 5.420+0.000)

$ hg log --time | grep changeset | wc -l
Time: real 0.530 secs (user 0.460+0.000 sys 0.060+0.000)
2

This is on a rather slow box (nettop with a 1200 MHz VIA C7 CPU, 1 GB RAM and a 5400 RPM 2.5" drive).
 
01-15-2010, 07:32 AM
Mike Frysinger

On Thursday 14 January 2010 18:25:35 Petteri Räty wrote:
> On 01/15/2010 12:54 AM, Robin H. Johnson wrote:
> > That top item is the largest blocker. The actual conversion time is down
> > to 9 hours, but with more than that again in setting it up. I'd like to
> > get the conversion time down to UNDER 4 hours. It's mostly
> > single-threaded, and we've got lots of cores available, it just needs
> > parallelization. We're basically dead in the water during the
> > conversion, there is NO incremental support at all.
>
> Is a day really an issue? If there's something extremely urgent while
> the conversion is going on, you can just turn CVS back on and then let's
> try again some other day.

indeed ... we've already tolerated long downtimes in different infrastructure
pieces in our history. devs can live with advertised downtimes since the
timing is known. it's the unknown/indefinite things that annoy people.
-mike
 
01-15-2010, 12:17 PM
Nguyen Thai Ngoc Duy

On 1/15/10, Nirbheek Chauhan <nirbheek@gentoo.org> wrote:
> > "git commit <dir>" and "git status <dir>" still do full tree lstat().
> > I can try to make a patch or two to reduce lstat() in such cases.
> >
>
>
> That would definitely complement the --stat option to git diff et al,
> making git more usable on repos with a huge no. of files. Now that I
> think about it, why does git <command> <dir> need to do a full tree
> stat at all? Doesn't the added specification of <dir> mean "I'm only
> interested in this dir for this command, other stuff doesn't matter"?

Probably because the difference is too small to notice on smaller-size
projects, or because people tend to do whole-tree operations so "git
<command> <dir>"'s performance does not catch the developers' eyes.

Anyway, stat()ing 80k files takes about 1 second on my machine, which is
still tolerable. "git status" also does a whole-tree open() to check for
untracked files, and that contributes more to its slowness. How
long on average did a Git operation take on your tmpfs?
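
To get a ballpark figure for the bare stat cost on another machine, something like the sketch below may help. It walks the tree with GNU find rather than going through git's own index code, so it only approximates what git does. The helper names count_stat and timed are made up for this example, and GNU find and GNU date are assumed.

```shell
#!/bin/sh
# Sketch only: count_stat and timed are hypothetical helper names.

# Run a command and report its wall-clock time in milliseconds on stderr.
# Assumes GNU date (for %N, nanoseconds).
timed() {
    label=$1; shift
    start=$(date +%s%N)
    "$@"
    status=$?
    end=$(date +%s%N)
    echo "$label: $(( (end - start) / 1000000 )) ms" >&2
    return "$status"
}

# Count regular files under $1, forcing one lstat() per file: printing the
# size (%s) requires the file's metadata. -printf is a GNU find extension.
count_stat() {
    find "$1" -type f -printf '%s\n' | wc -l
}

# Usage: prints the file count on stdout and the elapsed time on stderr.
#   timed "lstat walk" count_stat /usr/portage
```

Running it twice in a row gives a rough cold-versus-hot comparison, since the second pass is served from the kernel's dentry and inode caches.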
--
Duy
 
01-19-2010, 09:29 PM
Arun Raghavan

2010/1/15 Robin H. Johnson <robbat2@gentoo.org>:
[...]
> pre-upload-hook: ford_prefect, can I have something by Jan 21st please?
> Even just the core C infrastructure for the hook.

Sorry about dropping the ball on this for so long - 21st would be
hard, but I should be able to get this to you by the weekend.

Cheers,
--
Arun Raghavan
http://arunraghavan.net/
(Ford_Prefect | Gentoo) & (arunsr | GNOME)
 
