Old 03-27-2011, 05:07 PM
Joerg Jaspert
 
Default GIT for pdiff generation

>> As we are no git gurus ourselves: does anyone out there see any trouble
>> doing it this way? It means storing something around 7GB of
>> uncompressed text files in git, plus the daily changes happening to
>> them, then diffing them in the way described above. However, the
>> archive will only need to go back a couple of weeks, so we should be
>> able to apply git gc --prune (coupled with whatever way to actually
>> tell git that everything before $DATE can be removed) to keep the
>> size down.
> AFAIK, there can be trouble. It all depends on how you're structuring
> the data in git, and the size of the largest data object you will want
> to commit to the repository.

Right now the source Contents file of unstable is, unpacked, 220MB.
(Packed with gzip it's 28MB, while the binary Contents files each have
18MB packed.)

Let's add a safety margin: 350MB is a good guess for the largest object.
A Packages file hardly counts compared to them; unpacked it's just some
34MB.
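
For illustration only, a rough sketch of the loop we have in mind
(paths and names are made up, and real pdiffs are ed-style diffs; the
unified diff below just shows the shape of it):

    # Hypothetical daily run: commit the current index file, then
    # emit the change against the previous state as the new diff.
    cd /srv/index-history          # a plain git work tree (made-up path)
    cp /srv/ftp/dists/unstable/main/binary-amd64/Packages Packages
    git add Packages
    git commit -m "unstable/main amd64 $(date -I)"
    # the change since the previous commit is the new pdiff
    git diff HEAD^ HEAD -- Packages > "Packages.$(date -I).diff"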

> There is an alternative: git can rewrite the entire history
> (invalidating all commit IDs from the start pointing up to all the
> branch heads in the process). You can use that facility to drop old
> commits. Given the intended use, where you don't seem to need the
> commit ids to be constant across runs and you will rewrite the history
> of the entire repo at once and drop everything that was not rewritten,
> this is likely the less ugly way of doing what you want. Refer to git
> filter-branch.

It's the one and only case I've ever seen where "history rewrite" is
actually something one wants to do.
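
Something like this rough sketch, assuming the old graft trick works
the way I remember (untested):

    # Sketch: drop all history before $DATE while keeping the current
    # contents. Take the newest commit older than the cutoff and graft
    # it parentless, then make the graft permanent with filter-branch.
    cutoff=$(git rev-list -1 --before="$DATE" master)
    echo "$cutoff" > .git/info/grafts
    git filter-branch -- --all
    rm .git/info/grafts
    # expire everything the rewrite left behind
    git reflog expire --expire=now --all
    git gc --prune=now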

> Other than that, git loads entire objects to memory to manipulate them,
> which AFAIK CAN cause problems in datasets with very large files (the
> problem is not usually the size of the repository, but rather the size
> of the largest object). You probably want to test your use case with
> several worst-case files AND a large safety margin to ensure it won't
> break on us anytime soon, using something to track git memory usage.

Well, yes.
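
One simple way to run such a test, as a sketch (assuming GNU time is
installed as /usr/bin/time):

    # Sketch: watch git's peak memory while adding a worst-case file.
    dd if=/dev/urandom of=worst-case bs=1M count=700  # ~2x the 350MB guess
    /usr/bin/time -v git add worst-case 2>&1 | grep "Maximum resident"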

--
bye, Joerg
Some NM:
> FTBFS=Fails to Build from Start
Err, yes? How do you start in the middle?


Archive: http://lists.debian.org/87lj00a3vq.fsf@gkar.ganneff.de
 
Old 03-27-2011, 07:27 PM
Henrique de Moraes Holschuh
 
Default GIT for pdiff generation

On Sun, 27 Mar 2011, Joerg Jaspert wrote:
> Right now the source Contents file of unstable is, unpacked, 220MB.
> (Packed with gzip it's 28MB, while the binary Contents files each have
> 18MB packed.)

That should not be a problem on any non-joke box, unless you run it
in a memory-constrained VM or something.

> Let's add a safety margin: 350MB is a good guess for the largest object.
> A Packages file hardly counts compared to them; unpacked it's just some
> 34MB.

I.e. something very easy to keep in RAM on a "server class" or "desktop
class" box.

> > There is an alternative: git can rewrite the entire history
> > (invalidating all commit IDs from the start pointing up to all the
> > branch heads in the process). You can use that facility to drop old
> > commits. Given the intended use, where you don't seem to need the
> > commit ids to be constant across runs and you will rewrite the history
> > of the entire repo at once and drop everything that was not rewritten,
> > this is likely the less ugly way of doing what you want. Refer to git
> > filter-branch.
>
> It's the one and only case I've ever seen where "history rewrite" is
> actually something one wants to do.

Indeed.

> > Other than that, git loads entire objects to memory to manipulate them,
> > which AFAIK CAN cause problems in datasets with very large files (the
> > problem is not usually the size of the repository, but rather the size
> > of the largest object). You probably want to test your use case with
> > several worst-case files AND a large safety margin to ensure it won't
> > break on us anytime soon, using something to track git memory usage.
>
> Well, yes.

At the sizes you explained now (I thought it would deal with objects 7GB
in size, not 7GB worth of objects at most 0.5GB in size), it should not
be a problem on any box with a reasonable amount of free RAM and VM
space (say, 1GB).
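
If you ever want to check what the largest object actually is, a
sketch like this should do it (untested):

    # Sketch: list the five biggest blobs in the pack, largest first
    # (column 3 of verify-pack -v output is the object size).
    git gc                                  # make sure objects are packed
    git verify-pack -v .git/objects/pack/pack-*.idx \
        | grep blob | sort -k3 -n -r | head -5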

> Some NM:
> > FTBFS=Fails to Build from Start
> Err, yes? How do you start in the middle?

You screw up debian/rules clean, and try two builds in sequence ;-)

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh


Archive: http://lists.debian.org/20110327192759.GA4173@khazad-dum.debian.net
 
Old 03-28-2011, 06:45 AM
Joerg Jaspert
 
Default GIT for pdiff generation

>> Right now the source Contents file of unstable is, unpacked, 220MB.
>> (Packed with gzip it's 28MB, while the binary Contents files each have
>> 18MB packed.)
> That should not be a problem on any non-joke box, unless you run it
> in a memory-constrained VM or something.

Well, for our archives it is turned on for main and for backports. I
don't think main will ever run into trouble there:

             total       used       free     shared    buffers     cached
Mem:      33006584   29241780    3764804          0    2343936   20783680

while backports isn't as big, but still large enough:

             total       used       free     shared    buffers     cached
Mem:       8198084    7352164     845920          0    1063012    5650672

>> Let's add a safety margin: 350MB is a good guess for the largest object.
>> A Packages file hardly counts compared to them; unpacked it's just some
>> 34MB.
> I.e. something very easy to keep in RAM on a "server class" or "desktop
> class" box.

Yes.

>> > Other than that, git loads entire objects to memory to manipulate them,
>> > which AFAIK CAN cause problems in datasets with very large files (the
>> > problem is not usually the size of the repository, but rather the size
>> > of the largest object). You probably want to test your use case with
>> > several worst-case files AND a large safety margin to ensure it won't
>> > break on us anytime soon, using something to track git memory usage.
>> Well, yes.
> At the sizes you explained now (I thought it would deal with objects 7GB
> in size, not 7GB worth of objects at most 0.5GB in size), it should not
> be a problem on any box with a reasonable amount of free RAM and VM
> space (say, 1GB).

Right, could have written that better.

--
bye, Joerg
<liw> I'm a blabbermouth


Archive: http://lists.debian.org/87fwq73fqh.fsf@gkar.ganneff.de
 
