Cronjob for regular git garbage collection
On Tue, Nov 3, 2009 at 7:23 AM, Thomas Bächler <email@example.com> wrote:
> Dan McGee schrieb:
>> Realize that this has drawbacks; someone that is fetching (not
>> cloning) over HTTP will have to redownload the whole pack again and
>> not just the incremental changeset. You may want something more like
>> the included script as it gives you the benefits of compressing
>> objects but not creating one huge pack.
>> $ cat bin/prunerepos
>> #!/bin/bash
>> cwd=$(pwd)
>> for dir in $(ls | grep -F '.git'); do
>>     cd "$cwd/$dir"
>>     echo "pruning and packing $cwd/$dir..."
>>     git prune      # drop unreachable loose objects
>>     git repack -d  # pack loose objects; -d removes the now-redundant ones
>> done
> I realize that, but is it something we should really be concerned about?
> With our small repositories, the overhead of downloading a bunch of small
> files might even outweigh the size of a big pack.
That is the whole point: repack doesn't create small files, it bundles
them up for you. Downloading 3 small packs is still quicker than
re-downloading 1 big one every time, if we repack once a week. The AUR
repository is quite huge and under active development, so I would feel
bad gc-ing it when a simple repack (I just did one) will do, creating
only a 230K pack:
$ ll objects/pack/
-r--r--r-- 1 simo aur-git 22K 2009-11-03 08:28
-r--r--r-- 1 simo aur-git 230K 2009-11-03 08:28
-r--r--r-- 1 simo aur-git 139K 2009-01-22 21:38
-r--r--r-- 1 simo aur-git 8.3M 2009-01-22 21:38
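
To spell out the difference: over dumb HTTP a client grabs whole pack
files listed in objects/info/packs, so anyone who needs objects from a
rewritten pack has to fetch that pack again in full. A rough sketch of
the two approaches (standard git commands; the repository path is made
up):

$ cd /srv/git/somerepo.git   # hypothetical path
$ git repack -d              # incremental: only loose objects go into a
                             # new small pack, existing packs are untouched
$ git repack -a -d           # full: everything is rewritten into one big
                             # pack, which HTTP fetchers must redownload
$ git update-server-info     # refresh objects/info/packs for dumb HTTP

git gc effectively does the full variant (plus pruning and more), which
is why running it weekly would hurt the HTTP users.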
And if it is still a problem, we can always just switch to git-gc
later; we don't need to skip this intermediate step.
> pacman.git is our biggest and currently has a 5.4MB pack when you gc it.
Note that this is an incredibly compacted initial pack; the repository
would weigh in at around 9 MB if you packed it locally. I had to pull
some tricks to get it that small.
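
For the curious, the usual knobs for squeezing a pack down are along
these lines (a sketch; not necessarily the exact invocation I used):

$ git repack -a -d -f --window=250 --depth=250
# -f recomputes all deltas instead of reusing existing ones; larger
# --window/--depth values let git find better delta chains, trading
# CPU time for a smaller pack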
> Or maybe we should prune && repack them weekly, but gc them monthly or every
> 2 months?
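That schedule would be easy to set up; as crontab entries it could look
something like this (paths and times are made up, and it assumes the
prunerepos script above lives under /srv/git):

# weekly (Sunday 04:30): prune and incrementally repack all repos
30 4 * * 0  cd /srv/git && ./bin/prunerepos
# monthly (1st, 04:45): full gc to consolidate the packs
45 4 1 * *  cd /srv/git && for d in *.git; do (cd "$d" && git gc --quiet); done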
> Last week, we had http access to http://projects.archlinux.org/git/ (not
> counting 403s and 404s) from 12 different IPs, 66 the week before that, then
> 63 and 84. I hope most people use git://.
I also hope most people use git, but I don't want to leave those who
can't in the dust. They are also likely the ones with the worst
internet connections, so watching out for them might be the nice thing
to do.