11-03-2009, 09:55 AM
"Daniel Isenmann"

Cronjob for regular git garbage collection

> When I broke our projects.archlinux.org vhost, I noticed that cloning
> git via http:// takes ages. This could be vastly improved by running a
> regular cronjob to 'git gc' all /srv/projects/git repositories. It would
> also speed up cloning/pulling via git://, as the "remote: compressing
> objects" stage will be much less work on the server. Are there any
> objections against setting this up?

I'm not so familiar with git internals on the git server, but what you suggest sounds reasonable to me. Even the documentation of "git gc" says:

"Users are encouraged to run this task on a regular basis within each repository to maintain good disk space utilization and good operating performance."

So, I say +1 from my (not-so-familiar-git) side.

Daniel
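
For illustration, a minimal sketch of the kind of cronjob being proposed here, assuming the bare repositories live directly under /srv/projects/git as mentioned above (the weekly schedule, the cron.weekly location and the *.git naming are assumptions, not a tested setup):

$ cat /etc/cron.weekly/git-gc
#!/bin/sh
# Sketch only: run 'git gc' in every bare repository under /srv/projects/git.
for repo in /srv/projects/git/*.git; do
        [ -d "$repo" ] || continue
        git --git-dir="$repo" gc --quiet
done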
 
11-03-2009, 11:59 AM
Dan McGee

Cronjob for regular git garbage collection

On Tue, Nov 3, 2009 at 3:49 AM, Thomas Bächler <thomas@archlinux.org> wrote:
> When I broke our projects.archlinux.org vhost, I noticed that cloning git
> via http:// takes ages. This could be vastly improved by running a regular
> cronjob to 'git gc' all /srv/projects/git repositories. It would also speed
> up cloning/pulling via git://, as the "remote: compressing objects" stage
> will be much less work on the server. Are there any objections against
> setting this up?

I used to do this fairly often on the pacman.git repo; I did a few of
the others as well. No objections here, just make sure running the
cronjob doesn't make the repository unwritable for the people that
need it.

Realize that this has drawbacks; someone that is fetching (not
cloning) over HTTP will have to redownload the whole pack again and
not just the incremental changeset. You may want something more like
the included script as it gives you the benefits of compressing
objects but not creating one huge pack.

-Dan

$ cat bin/prunerepos
#!/bin/sh

cwd=$(pwd)

for dir in $(ls | grep -F '.git'); do
        cd $cwd/$dir
        echo "pruning and packing $cwd/$dir..."
        git prune
        git repack -d
done
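
As a usage sketch only, an entry like the following could run the script above from cron; the schedule, the idea that it runs from the crontab of the account owning /srv/projects/git, and the $HOME/bin/prunerepos location are all assumptions:

# weekly, Sunday 04:00 (illustrative): prune and repack every repo under /srv/projects/git
0 4 * * 0  cd /srv/projects/git && $HOME/bin/prunerepos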
 
11-03-2009, 12:23 PM
Thomas Bächler

Cronjob for regular git garbage collection

Dan McGee wrote:

> Realize that this has drawbacks; someone that is fetching (not
> cloning) over HTTP will have to redownload the whole pack again and
> not just the incremental changeset. You may want something more like
> the included script as it gives you the benefits of compressing
> objects but not creating one huge pack.
>
> -Dan
>
> $ cat bin/prunerepos
> #!/bin/sh
>
> cwd=$(pwd)
>
> for dir in $(ls | grep -F '.git'); do
>         cd $cwd/$dir
>         echo "pruning and packing $cwd/$dir..."
>         git prune
>         git repack -d
> done


I realize that, but is it something we should really be concerned about?
With our small repositories, the overhead of downloading a bunch of
small files might even outweigh the size of a big pack.


pacman.git is our biggest and currently has a 5.4MB pack when you gc it.

Or maybe we should prune && repack them weekly, but gc them monthly or
every 2 months?
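
Purely as an illustration of that kind of split schedule (the times, the inline loops and the /srv/projects/git path are assumptions; this just combines the two sketches above, it is not a deployed setup):

# weekly: prune + incremental repack, keeps new packs small for people fetching over http://
0 4 * * 0   cd /srv/projects/git && for d in *.git; do git --git-dir="$d" prune && git --git-dir="$d" repack -d; done
# monthly: full 'git gc' to consolidate everything
30 4 1 * *  cd /srv/projects/git && for d in *.git; do git --git-dir="$d" gc --quiet; done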


Last week, we had http access to http://projects.archlinux.org/git/ (not
counting 403s and 404s) from 12 different IPs, 66 the week before that,
then 63 and 84. I hope most people use git://.
 
11-03-2009, 12:30 PM
Dan McGee

Cronjob for regular git garbage collection

On Tue, Nov 3, 2009 at 7:23 AM, Thomas Bächler <thomas@archlinux.org> wrote:
> Dan McGee wrote:
>>
>> Realize that this has drawbacks; someone that is fetching (not
>> cloning) over HTTP will have to redownload the whole pack again and
>> not just the incremental changeset. You may want something more like
>> the included script as it gives you the benefits of compressing
>> objects but not creating one huge pack.
>>
>> -Dan
>>
>> $ cat bin/prunerepos
>> #!/bin/sh
>>
>> cwd=$(pwd)
>>
>> for dir in $(ls | grep -F '.git'); do
>>         cd $cwd/$dir
>>         echo "pruning and packing $cwd/$dir..."
>>         git prune
>>         git repack -d
>> done
>
> I realize that, but is it something we should really be concerned about?
> With our small repositories, the overhead of downloading a bunch of small
> files might even outweigh the size of a big pack.

That is the whole point: repack doesn't create small files, it bundles
them up for you. Downloading 3 packs is still quicker than downloading
1 big one if we do it once a week. The AUR pack is quite huge and
under active development, so I would feel bad gc-ing that when a
simple repack (I just did one) will do, creating only a 230K pack:
$ ll objects/pack/
total 8.7M
-r--r--r-- 1 simo aur-git  22K 2009-11-03 08:28 pack-2def16dc5d8361b8a7c11e60e10c503ba9874fdb.idx
-r--r--r-- 1 simo aur-git 230K 2009-11-03 08:28 pack-2def16dc5d8361b8a7c11e60e10c503ba9874fdb.pack
-r--r--r-- 1 simo aur-git 139K 2009-01-22 21:38 pack-c7bd96b6fc392799991ad88824f935c09d470efa.idx
-r--r--r-- 1 simo aur-git 8.3M 2009-01-22 21:38 pack-c7bd96b6fc392799991ad88824f935c09d470efa.pack

And if it is still a problem we can always just switch to git-gc
later- we don't need to skip this intermediate step.

> pacman.git is our biggest and currently has a 5.4MB pack when you gc it.

Note that this is an incredibly compacted initial pack- the repository
will weigh in around 9 MB if you packed it locally; I had to pull some
tricks to get it that small.

> Or maybe we should prune && repack them weekly, but gc them monthly or every
> 2 months?
>
> Last week, we had http access to http://projects.archlinux.org/git/ (not
> counting 403s and 404s) from 12 different IPs, 66 the week before that, then
> 63 and 84. I hope most people use git://.

I also hope most people use git; but I don't want to leave those in
the dust that can't. They are also likely the ones with the worst
internet connections so watching out for them might be the nice thing
to do.
 
11-03-2009, 01:28 PM
Thomas Bächler

Cronjob for regular git garbage collection

Dan McGee wrote:

> That is the whole point: repack doesn't create small files, it bundles
> them up for you. Downloading 3 packs is still quicker than downloading
> 1 big one if we do it once a week.

I just read the help of repack -d and it totally makes sense to use it
this way. We could generate weekly packs then. Is there also an option
to repack these weekly packs into one big pack once they're older than 6
months or so?
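
For reference, "git repack -a -d" packs everything referenced into a single pack and deletes the packs made redundant, so an occasional consolidation pass could look roughly like the sketch below (the /srv/projects/git path and the idea of running it every few months are assumptions, not something decided in this thread):

# e.g. every few months: consolidate the accumulated weekly packs into one
cd /srv/projects/git && for d in *.git; do git --git-dir="$d" repack -a -d; done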



>> pacman.git is our biggest and currently has a 5.4MB pack when you gc it.
>
> Note that this is an incredibly compacted initial pack- the repository
> will weigh in around 9 MB if you packed it locally; I had to pull some
> tricks to get it that small.

I don't understand. What did you do to it? I just ran "git gc" locally
on it and it had that size.

> I also hope most people use git; but I don't want to leave those in
> the dust that can't. They are also likely the ones with the worst
> internet connections so watching out for them might be the nice thing
> to do.


Agreed.
 
