01-14-2010, 06:46 PM
Pacho Ramos

proxy maintainership and gentoo-x86 scm

On Thu, 14-01-2010 at 18:04 +0100, "Paweł Hajdan, Jr." wrote:
>
> It would be nice to post that info to a webpage. That could increase a
> chance of a volunteer contributing some help.

I agree. That way other people (from the forums, for example) could
help if they know git (or whichever system is chosen).
 
01-14-2010, 07:31 PM
Daniel Bradshaw

Excuse me butting in... I'm just a little confused.
Not that this is anything new, I'm just ... well, confused.

On 01/14/2010 12:49 PM, Nirbheek Chauhan wrote:
> In theory, yes. In practice, git is too slow to handle 30,000 files.
> Even simple operations like git add become painful even if you put the
> whole of portage on tmpfs since git does a stat() on every single file
> in the repository with every operation.

My understanding is that git was developed as the SCM for the kernel
project.
A quick check in an arbitrary untouched kernel in /usr/src/ suggests a
file [1] count of 25300.


Assuming that my figure isn't out by an order of magnitude, how does the
kernel team get along with git and 25k files but it is deathly slow for
our 30k?
Or, to phrase the question better... what are they doing that allows
them to manage?


Regards,
Daniel

[1] `find -type f | wc -l`, so all regular files
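
For anyone who wants to repeat this comparison on another tree, here is a rough Python counterpart to the `find` count in [1]. It is only a sketch: the name `count_entries` is made up for illustration, and pruning `.git` is an assumption so that git's own object files do not inflate the figure.

```python
import os

def count_entries(root, skip=(".git",)):
    """Return (regular_files, directories) found below root.

    Directories named in `skip` are pruned. Symlinks are neither followed
    nor counted, and root itself is not counted as a directory.
    """
    files = 0
    dirs = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped dirs.
        dirnames[:] = [d for d in dirnames if d not in skip]
        for name in dirnames:
            if not os.path.islink(os.path.join(dirpath, name)):
                dirs += 1
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                files += 1
    return files, dirs
```

Calling `count_entries("/usr/src/linux")` and `count_entries("/usr/portage")` (both paths are just examples) gives two pairs of numbers that can be compared directly.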
 
01-14-2010, 09:10 PM
Nirbheek Chauhan

On Thu, Jan 14, 2010 at 7:17 PM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> What you need is "git update-index --assume-unchanged". That feature
> was introduced exactly to reduce stat().
>
> BTW, if you know you only work in certain directories, doing "git diff
> --stat <dir>", "git diff --cached --stat <dir>" instead of "git
> status" would also help. Make aliases for them ("git dis" and "git
> dics" in my ~/.gitconfig) so you don't have to type full command every
> time.
>

This is very interesting; I did not know about this feature! Thanks
for pointing it out.

I'll try this stuff out and report back once I have my portage tmpfs
created again.
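
A small sketch of the two suggestions quoted above, for anyone who wants to script them. The helper names (`assume_unchanged_cmd`, `scoped_diff_cmds`, `run_in_repo`) are invented here; only the git options themselves come from the quoted message, plus `--no-assume-unchanged`, which clears the bit again. Note that `git update-index` works on file paths in the index, and that git will stop noticing real edits to any file marked this way until the bit is cleared.

```python
import subprocess

def assume_unchanged_cmd(paths, undo=False):
    """Build the git command that sets (or clears) the assume-unchanged bit."""
    flag = "--no-assume-unchanged" if undo else "--assume-unchanged"
    # "--" keeps unusual file names from being parsed as options.
    return ["git", "update-index", flag, "--", *paths]

def scoped_diff_cmds(directory):
    """Build the two per-directory diffs suggested as a lighter 'git status'."""
    return [
        ["git", "diff", "--stat", "--", directory],              # unstaged changes
        ["git", "diff", "--cached", "--stat", "--", directory],  # staged changes
    ]

def run_in_repo(cmd, repo):
    """Run one of the commands above inside `repo` and return its stdout."""
    result = subprocess.run(cmd, cwd=repo, check=True,
                            capture_output=True, text=True)
    return result.stdout
```

For example, `run_in_repo(scoped_diff_cmds("app-editors/vim")[0], "/path/to/tree")` would show unstaged changes for one package directory only; both paths are placeholders.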

> "git commit <dir>" and "git status <dir>" still do full tree lstat().
> I can try to make a patch or two to reduce lstat() in such cases.
>

That would definitely complement the --stat option to git diff et al,
making git more usable on repos with a huge number of files. Now that I
think about it, why does git <command> <dir> need to do a full tree
stat at all? Doesn't the added specification of <dir> mean "I'm only
interested in this dir for this command, other stuff doesn't matter"?

> Does that help?

Quite helpful indeed; now if only someone would implement recursive
timestamps for directories...

--
~Nirbheek Chauhan

Gentoo GNOME+Mozilla Team
 
01-14-2010, 09:21 PM
Nirbheek Chauhan

On Fri, Jan 15, 2010 at 2:01 AM, Daniel Bradshaw <daniel@the-cell.co.uk> wrote:
> On 01/14/2010 12:49 PM, Nirbheek Chauhan wrote:
>>
>> In theory, yes. In practice, git is too slow to handle 30,000 files.
>> Even simple operations like git add become painful even if you put the
>> whole of portage on tmpfs since git does a stat() on every single file
>> in the repository with every operation.
>>
>
> My understanding is that git was developed as the SCM for the kernel
> project.
> A quick check in an arbitrary untouched kernel in /usr/src/ suggests a file
> [1] count of 25300.
>
> Assuming that my figure isn't out by an order of magnitude, how does the
> kernel team get along with git and 25k files but it is deathly slow for our
> 30k?
> Or, to phrase the question better... what are they doing that allows them to
> manage?
>

My bad. I did the tests a while back, and the number "30,000" is
actually for the no. of ebuilds in portage. The no. of files is
actually ~113,000 (difference comes because every package has a
manifest+changelog+metadata.xml+patches). OTOH, the no. of directories
is "just" ~20,000, so if git would only do a stat() on directories,
it would get into the "usable" circle.

Also, since git does a stat on directories as well as files, you can
say that every command has to do ~133,000 stats, which is damn slow
even when cached.
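
To get a feel for what that many stats cost on a given machine, something like the following can be pointed at a checkout. It is a rough sketch, not a model of what git does internally: the function name is made up, and the timing includes Python's own overhead and the cost of listing directories.

```python
import os
import time

def time_lstat_walk(root):
    """lstat() every file and directory below root; return (count, seconds)."""
    count = 0
    start = time.perf_counter()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            os.lstat(os.path.join(dirpath, name))
            count += 1
    elapsed = time.perf_counter() - start
    return count, elapsed
```

Running it twice in a row on the same tree gives a cold-cache and a warm-cache figure to compare.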

--
~Nirbheek Chauhan

Gentoo GNOME+Mozilla Team
 
01-14-2010, 09:29 PM
Nirbheek Chauhan

On Fri, Jan 15, 2010 at 3:51 AM, Nirbheek Chauhan <nirbheek@gentoo.org> wrote:
> My bad. I did the tests a while back, and the number "30,000" is
> actually for the no. of ebuilds in portage. The no. of files is
> actually ~113,000 (difference comes because every package has a
> manifest+changelog+metadata.xml+patches).

Further refinement: ~92,000

Removed metadata/ (-28,000 which won't be around in the git tree), and
ChangeLog (-13,000 which would be redundant, and should be
auto-generated along with metadata prior to distribution via rsync.
Hey, this is something that needs to be done too!)
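
The message does not say which baseline the ~92,000 is measured from. The arithmetic below just replays the rounded figures quoted in this thread against both candidates; the constant and function names are made up for illustration.

```python
FILES = 113_000        # files, from the earlier correction
DIRECTORIES = 20_000   # directories, from the same message
METADATA = 28_000      # metadata/ entries that will not be in the git tree
CHANGELOGS = 13_000    # ChangeLog files that could be auto-generated

def remaining(baseline):
    """Subtract the two removed groups from the chosen baseline."""
    return baseline - METADATA - CHANGELOGS

files_only = remaining(FILES)                     # 72000
files_and_dirs = remaining(FILES + DIRECTORIES)   # 92000
```

With these rounded inputs, ~92,000 matches the files-plus-directories total of ~133,000 rather than the ~113,000 file count. Since all the figures are approximate, that reading is an inference, not something the post states.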

--
~Nirbheek Chauhan

Gentoo GNOME+Mozilla Team
 
01-14-2010, 09:32 PM
Daniel Bradshaw

On 01/14/2010 10:21 PM, Nirbheek Chauhan wrote:
> On Fri, Jan 15, 2010 at 2:01 AM, Daniel Bradshaw <daniel@the-cell.co.uk> wrote:
>> On 01/14/2010 12:49 PM, Nirbheek Chauhan wrote:
>>> In theory, yes. In practice, git is too slow to handle 30,000 files.
>>> Even simple operations like git add become painful even if you put the
>>> whole of portage on tmpfs since git does a stat() on every single file
>>> in the repository with every operation.
>>
>> My understanding is that git was developed as the SCM for the kernel
>> project.
>> A quick check in an arbitrary untouched kernel in /usr/src/ suggests a file
>> [1] count of 25300.
>>
>> Assuming that my figure isn't out by an order of magnitude, how does the
>> kernel team get along with git and 25k files but it is deathly slow for our
>> 30k?
>> Or, to phrase the question better... what are they doing that allows them to
>> manage?
>
> My bad. I did the tests a while back, and the number "30,000" is
> actually for the no. of ebuilds in portage. The no. of files is
> actually ~113,000 (difference comes because every package has a
> manifest+changelog+metadata.xml+patches). OTOH, the no. of directories
> is "just" ~20,000, so if git would only do a stat() on directories,
> it would get into the "usable" circle.
>
> Also, since git does a stat on directories as well as files, you can
> say that every command has to do ~133,000 stats, which is damn slow
> even when cached.

Ah, so one side is off by a fair bit.
Thanks for the clarification.

Regards,
Daniel
 
01-14-2010, 09:53 PM
Nirbheek Chauhan

On Thu, Jan 14, 2010 at 10:34 PM, "Paweł Hajdan, Jr."
<phajdan.jr@gentoo.org> wrote:
> It would be nice to post that info to a webpage. That could increase a
> chance of a volunteer contributing some help.
>

That list is incomplete; a more complete todo list can be found by
looking at the archives at archives.gentoo.org/gentoo-scm/

I'll compile a proper list and put it up somewhere, thanks for the suggestion.


--
~Nirbheek Chauhan

Gentoo GNOME+Mozilla Team
 
01-14-2010, 09:54 PM
"Robin H. Johnson"

On Fri, Jan 15, 2010 at 03:59:00AM +0530, Nirbheek Chauhan wrote:
> On Fri, Jan 15, 2010 at 3:51 AM, Nirbheek Chauhan <nirbheek@gentoo.org> wrote:
> > My bad. I did the tests a while back, and the number "30,000" is
> > actually for the no. of ebuilds in portage. The no. of files is
> > actually ~113,000 (difference comes because every package has a
> > manifest+changelog+metadata.xml+patches).
> Further refinement: ~92,000
>
> Removed metadata/ (-28,000 which won't be around in the git tree), and
> ChangeLog (-13,000 which would be redundant, and should be
> auto-generated along with metadata prior to distribution via rsync.
> Hey, this is something that needs to be done too!)
ChangeLog != commit logs.

There is frequently additional information in the CVS commit messages
that isn't in the ChangeLogs. ChangeLogs aren't always updated reliably
either, esp for ebuild cleanups (hi vapier).

The actual performance of git itself isn't the largest problem.
The migration issues, esp. the speed of the conversion are.

My status of the migration side itself hasn't changed since the end of
October:
http://archives.gentoo.org/gentoo-scm/msg_e0a0a41200c1fc6a0fda68b4ff9d2c61.xml

That top item is the largest blocker. The actual conversion time is down
to 9 hours, but with more than that again in setting it up. I'd like to
get the conversion time down to UNDER 4 hours. It's mostly
single-threaded, and we've got lots of cores available, it just needs
parallelization. We're basically dead in the water during the
conversion, there is NO incremental support at all.
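
As a back-of-the-envelope check on the "9 hours down to under 4" goal, Amdahl's law gives a rough idea of how much of the conversion would have to run in parallel. Only the 9-hour and 4-hour figures come from this message; the core counts and fractions used with the functions below are assumptions, and the function names are made up.

```python
def projected_hours(serial_hours, parallel_fraction, workers):
    """Amdahl's-law estimate of wall-clock time after parallelizing.

    serial_hours: current single-threaded run time
    parallel_fraction: share of that time that can be split up (0..1)
    workers: number of cores working on the parallel part
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_part = serial_hours * (1.0 - parallel_fraction)
    parallel_part = serial_hours * parallel_fraction / workers
    return serial_part + parallel_part

def min_parallel_fraction(serial_hours, target_hours, workers):
    """Smallest parallel fraction that reaches target_hours with `workers` cores.

    A result above 1.0 means the target is out of reach with that many cores.
    """
    if workers < 2:
        raise ValueError("need at least 2 workers to gain anything")
    share_to_save = 1.0 - target_hours / serial_hours
    return share_to_save / (1.0 - 1.0 / workers)
```

With a hypothetical 8 cores, `min_parallel_fraction(9, 4, 8)` comes out at 40/63, so roughly 63% of the run would need to parallelize cleanly; even with unlimited cores the share has to exceed 5/9, about 56%.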

Side-project to the above: Is there anything like Psyco for Python
acceleration that works on 64-bit machines? Psyco itself has a warning
on the frontpage of no 64-bit support.

pre-upload-hook: ford_prefect, can I have something by Jan 21st please?
Even just the core C infrastructure for the hook.

For the partial tree users, the support is actually IN the 1.6.6 series.
I think it needs a little more cooking, however; there are some patches
queued for it that will probably end up in 1.6.6.1 or similar.

We DO still need somebody that cares about CVS access to test with
git-cvsserver against the existing conversion.

--
Robin Hugh Johnson
Gentoo Linux: Developer, Trustee & Infrastructure Lead
E-Mail : robbat2@gentoo.org
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85
 
01-14-2010, 10:25 PM
Petteri Räty

On 01/15/2010 12:54 AM, Robin H. Johnson wrote:
>
> That top item is the largest blocker. The actual conversion time is down
> to 9 hours, but with more than that again in setting it up. I'd like to
> get the conversion time down to UNDER 4 hours. It's mostly
> single-threaded, and we've got lots of cores available, it just needs
> parallelization. We're basically dead in the water during the
> conversion, there is NO incremental support at all.
>

Is a day really an issue? If something extremely urgent comes up while
the conversion is going on, you can just turn CVS back on and we can
try again some other day.

Regards,
Petteri
 
01-14-2010, 10:28 PM
Nirbheek Chauhan

On Fri, Jan 15, 2010 at 4:24 AM, Robin H. Johnson <robbat2@gentoo.org> wrote:
> On Fri, Jan 15, 2010 at 03:59:00AM +0530, Nirbheek Chauhan wrote:
>> ChangeLog (-13,000 which would be redundant, and should be
>> auto-generated along with metadata prior to distribution via rsync.
>> Hey, this is something that needs to be done too!)
> ChangeLog != commit logs.
>
> There is frequently additional information in the CVS commit messages
> that isn't in the ChangeLogs. ChangeLogs aren't always updated reliably
> either, esp for ebuild cleanups (hi vapier).
>

All the more reason to just chuck manual ChangeLogs in favour of
auto-generated ones. At least, that's what GNOME projects do since they
moved to git [ http://live.gnome.org/Git/ChangeLog ].

However, this will entail a change in how commit messages are
formatted; git commit messages need to be very different from CVS/svn
ones.
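
A minimal sketch of what auto-generating a ChangeLog from commit history could look like. This is not the GNOME tooling linked above, and the output layout (GNU-style date/author header with tab-indented entries) is an assumption rather than the format Gentoo would settle on. The function name is made up; the assumed input is described in the comment.

```python
# Assumed input: the output of
#   git log --date=short --pretty=format:%ad%x09%an%x09%ae%x09%s
# i.e. one commit per line, with date, author name, email and subject
# separated by tabs.

def format_changelog(log_output):
    """Group commits by (date, author) and render GNU-style ChangeLog text."""
    groups = {}  # insertion-ordered, so git's newest-first order is kept
    for line in log_output.splitlines():
        if not line.strip():
            continue
        # maxsplit=3 keeps any tabs inside the subject intact.
        date, name, email, subject = line.split("\t", 3)
        groups.setdefault((date, name, email), []).append(subject)

    blocks = []
    for (date, name, email), subjects in groups.items():
        header = f"{date}  {name}  <{email}>"
        entries = "\n".join(f"\t* {s}" for s in subjects)
        blocks.append(f"{header}\n\n{entries}\n")
    return "\n".join(blocks)
```

Because every entry is just the commit subject, this only produces a useful ChangeLog if commit messages are written with that in mind, which is the formatting change mentioned above.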

> The actual performance of git itself isn't the largest problem.
> The migration issues, esp. the speed of the conversion are.
>
> My status of the migration side itself hasn't changed since the end of
> October:
> http://archives.gentoo.org/gentoo-scm/msg_e0a0a41200c1fc6a0fda68b4ff9d2c61.xml
>
> That top item is the largest blocker. The actual conversion time is down
> to 9 hours, but with more than that again in setting it up. I'd like to
> get the conversion time down to UNDER 4 hours. It's mostly
> single-threaded, and we've got lots of cores available, it just needs
> parallelization. We're basically dead in the water during the
> conversion, there is NO incremental support at all.
>

Actually, this has confused me for a while. Sorry if this is a dumb
question, but why do we care about the conversion speed if we can just
convert it once, make the old cvs repo read-only and be done with it?
Are we concerned about the window during which cvs access will have to
be blocked and devs will sit around twiddling thumbs?

> Side-project to the above: Is there anything like Psyco for Python
> acceleration that works on 64-bit machines? Psyco itself has a warning
> on the frontpage of no 64-bit support.
>

I had written Psyco off as dead and started looking forward to Unladen
Swallow:

http://code.google.com/p/unladen-swallow/wiki/ProjectPlan

> We DO still need somebody that cares about CVS access to test with
> git-cvsserver against the existing conversion.
>

This will fit in quite badly with the proposed changes to make
Manifest have only distfile manifests when using git, and to not have
ChangeLogs for the simple reason that they (Manifests and ChangeLogs)
invariably cause merge conflicts. And then there was also the plan to
not edit headers for files to prevent extra commits (which are even
more useless in git).


--
~Nirbheek Chauhan

Gentoo GNOME+Mozilla Team
 
