Andreas Barth 04-30-2011 11:36 PM

wanna-build / how to sort packages on buildds?
 
Hi,

I have a problem I need to solve in perl within wanna-build:

Sometimes we have a few packages we don't want to build on certain
buildds. Sometimes this is because the package needs lots of RAM. Or
it takes quite long and would waste the parallel building a machine
supports. Or whatever else. Of course a package could be in more than
one category.

Now, what I would like to do is to write that down in a central file
with categories.

That is, to mark packages as "builds only with more than one gigabyte
of ram". And to mark buildds as "has 6 cores", "only ... ram" - so
that I don't need to copy entries from buildd to buildd, but just say
"that new machine is the same class as ...", and that's it.

Now my question is just: how to do that efficiently? I.e. what would
such a configuration file look like, and what would the code look like
that distributes packages to the best-fitting buildd(s)? (E.g. it's
better to waste 5 out of 6 cores than to not build a package at all,
but a package needing at least 1 GB of RAM can't build on a buildd with
only 512 MB - and no package should starve in the end.)
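
A minimal sketch of the class idea described above, written in Python
rather than Perl purely for illustration. Every class name, buildd name
and attribute key is hypothetical; only gcc-4.5 needing at least 1 GB
of RAM comes from later in this thread.

```python
# Hypothetical buildd classes: each class lists the resources it offers.
BUILDD_CLASSES = {
    "small": {"ram_mb": 512, "cores": 1},
    "big": {"ram_mb": 4096, "cores": 6},
}

# A new machine is added by naming its class, not by copying entries.
BUILDDS = {
    "buildd-a": "small",
    "buildd-b": "big",
    "buildd-c": "big",  # "same class as buildd-b"
}

# Hypothetical package categories, expressed as minimum requirements.
PACKAGE_CATEGORIES = {
    "needs-1g-ram": {"ram_mb": 1024},
}

# A package may be in more than one category.
PACKAGES = {
    "gcc-4.5": ["needs-1g-ram"],
}


def buildd_resources(buildd):
    """Resolve a buildd name to the resources of its class."""
    return BUILDD_CLASSES[BUILDDS[buildd]]


def package_minimums(package):
    """Merge all categories of a package, keeping the largest minimum."""
    merged = {}
    for category in PACKAGES.get(package, []):
        for key, value in PACKAGE_CATEGORIES[category].items():
            merged[key] = max(merged.get(key, 0), value)
    return merged


def can_build(buildd, package):
    """True if the buildd meets every minimum the package asks for."""
    have = buildd_resources(buildd)
    need = package_minimums(package)
    return all(have.get(key, 0) >= value for key, value in need.items())
```

Packages without any category have no minimums in this sketch, so they
may build anywhere.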

Ideas? Suggestions? Code?



Andi


--
To UNSUBSCRIBE, email to debian-devel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/20110430233638.GZ15003@mails.so.argh.org

Tollef Fog Heen 05-01-2011 06:40 AM

wanna-build / how to sort packages on buildds?
 
]] Andreas Barth

Hi,

| Now my question is just: How to do that efficient? I.e. how would such
| a configuration file look like, and how the code to distribute the
| package on the most fitting buildd(s)? (I.e. it's better to waste 5
| out of 6 cores than to not build a package at all, but a package
| needing at least 1g ram can't build on a buildd with only 512mb - but
| no package should starve in the end.)
|
| Ideas? Suggestions? Code?

Sounds like a variant of the knapsack problem.

I'd suggest something like:

- Have a mapping for buildds from resources to a value (this can just be
a perl hash), this defines cores, amount of memory, etc.

- Each package has a minimum requirement for cpu, memory, etc, stored in a
hash. Store all the packages in a list.

- Sort the list, either according to a score which is a mix of cpu and
memory and whatever other factors you want or first along the cpu
axis, then along the memory axis, etc. I suspect CPU and memory
requirements are correlated, but not perfectly.

- Assign packages to buildds on a first-match basis. That means you get
the hardest packages done first. The match has to make sure the
buildd can actually build the package in question, of course.
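
The steps above could be sketched roughly as follows, in Python for
illustration. This is only one reading of the suggestion: the resource
keys, the names, and the rule that each buildd takes one package per
round are assumptions.

```python
def difficulty(requirements):
    """Sort key: CPU axis first, then memory axis."""
    return (requirements.get("cores", 1), requirements.get("ram_mb", 0))


def fits(resources, requirements):
    """True if the buildd offers at least every required minimum."""
    return all(resources.get(key, 0) >= value
               for key, value in requirements.items())


def assign(packages, buildds):
    """Assign the hardest packages first, on a first-match basis.

    packages: name -> minimum requirements (hash of resource -> value)
    buildds:  name -> available resources (hash of resource -> value)
    Returns package -> buildd, or None if no free buildd can build it.
    Assumption: each buildd takes at most one package per round.
    """
    free = dict(buildds)
    result = {}
    hardest_first = sorted(packages,
                           key=lambda name: difficulty(packages[name]),
                           reverse=True)
    for name in hardest_first:
        match = next((b for b in free if fits(free[b], packages[name])), None)
        result[name] = match
        if match is not None:
            del free[match]
    return result
```

A package mapped to None simply waits for the next round. Avoiding
starvation would need something extra, for example a score that also
grows with waiting time.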

Regards,
--
Tollef Fog Heen
UNIX is user friendly, it's just picky about who its friends are


Archive: http://lists.debian.org/87d3k2c49x.fsf@qurzaw.varnish-software.com

Ingo Jürgensmann 05-01-2011 09:24 AM

wanna-build / how to sort packages on buildds?
 
On Sun, 1 May 2011 01:36:38 +0200, Andreas Barth wrote:


> Sometimes we have a few packages we don't want to build on a certain
> buildds. Sometimes this is because this package needs lots of ram. Or
> it takes quite long and would waste the parallel building a machine
> supports. Or whatever else. Of course a package could be in more than
> one category.


Yes, you're facing basically the same problem I tried to address in
2000/2001 when doing my renderserver and later for what Multibuild was
intended to do as well. ;-)



> Now, what I would like to do is to write that down in a central file
> with categories.


I would recommend using a database, really.


> That is, to mark packages as "builds only with more than one gigabyte
> of ram". And to mark buildds as "has 6 cores", "only ... ram" - so
> that I don't need to copy entries from buildd to buildd, but just say
> "that new machine is the same class as ...", and that's it.


Another category would be "fast disk/raid". There are some packages
with lots of disk accesses. When you can schedule those packages to a
buildd that has faster disk access like in having multiple spindles for
faster seeks, you can minimize build times as well. We faced that
problem on m68k particularly on IDE vs SCSI disks on Amigas, as IDE was
dog slow. Another example there was the faster disks on Amigas vs slower
SCSI disks in Apple machines.


> Now my question is just: How to do that efficient? I.e. how would such
> a configuration file look like, and how the code to distribute the
> package on the most fitting buildd(s)? (I.e. it's better to waste 5
> out of 6 cores than to not build a package at all, but a package
> needing at least 1g ram can't build on a buildd with only 512mb - but
> no package should starve in the end.)
>
> Ideas? Suggestions? Code?


Look at my update-buildd.net from Buildd.net, which I used to collect
data from the buildds such as RAM, kernel, uptime, used swap and such
(http://buildd.net/cgi/hostpackages.cgi?unstable_arch=m68k&searchtype=arrakis).
I store this information in the database, along with the build times of
the packages. With this dataset it should be possible to have
wanna-build schedule packages in a way that minimizes build times,
because you can decide which buildd is the most suitable one for the
next package.


--
Ciao... // Fon: 0381-2744150
Ingo X/ http://blog.windfluechter.net

gpg pubkey: http://www.juergensmann.de/ij_public_key.


Archive: http://lists.debian.org/f423f24d7f17a3abe30510a870357b18@muaddib.hro.localnet

Roger Leigh 05-01-2011 10:02 AM

wanna-build / how to sort packages on buildds?
 
On Sun, May 01, 2011 at 01:36:38AM +0200, Andreas Barth wrote:
> I have a problem I need to solve in perl within wanna-build:
>
> Sometimes we have a few packages we don't want to build on a certain
> buildds. Sometimes this is because this package needs lots of ram. Or
> it takes quite long and would waste the parallel building a machine
> supports. Or whatever else. Of course a package could be in more than
> one category.
>
> Now, what I would like to do is to write that down in a central file
> with categories.

I would have to echo the sentiment that storing this information in
the database is probably a better idea.

I just wanted to add that if you would like more statistics reporting
for this purpose, I'll be happy to add that to sbuild. Currently we
only really report build time and disc space. If you want additional
data such as number of cores used, memory/swap usage and other resource
usage, I'll be happy to add them to the sbuild summary stats. Actually
measuring those might be a bit trickier though, especially on machines
running parallel builds.


Regards,
Roger

--
.'`. Roger Leigh
: :' : Debian GNU/Linux http://people.debian.org/~rleigh/
`. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/
`- GPG Public Key: 0x25BFB848 Please GPG sign your mail.

Andreas Barth 05-01-2011 10:36 AM

wanna-build / how to sort packages on buildds?
 
* Ingo Jürgensmann (ij@2011.bluespice.org) [110501 11:55]:
> On Sun, 1 May 2011 01:36:38 +0200, Andreas Barth wrote:

>> Now, what I would like to do is to write that down in a central file
>> with categories.
>
> I would recommend to use a database, really.

Sorry, but that's not at all the answer to *this* part of the
question. This question is "what would a normalized view of the data
look like?". (How to store attributes is another question, but
that's for later.)


Andi


Archive: http://lists.debian.org/20110501103636.GM2657@mails.so.argh.org

Andreas Barth 05-01-2011 10:46 AM

wanna-build / how to sort packages on buildds?
 
* Roger Leigh (rleigh@codelibre.net) [110501 12:02]:
> I just wanted to add that if you would like more statistics reporting
> for this purpose, I'll be happy to add that to sbuild.

I only worry about the ~20-40 packages that are currently sitting in
some no_auto_build on the buildds. Not more but also not less.

I could easily write a file with

  buildd-name: "gcc-4.5", "gcc-snapshot", "gmic", "imagemagick",
               "qt4-x11", "ghc", # at least 1g
               "more packages", # fpu-emulation is too slow

but I consider that too ugly.


> Currently we
> only really report build time and disc space. If you want additional
> data such as number of cores used, memory/swap usage and other resource
> usage, I'll be happy to add them to the sbuild summary stats. Actually
> measuring those might be a bit trickier though, especially on machines
> running parallel builds.

Thanks for the offer.



Andi


Archive: http://lists.debian.org/20110501104631.GC15003@mails.so.argh.org

Goswin von Brederlow 05-02-2011 05:01 PM

wanna-build / how to sort packages on buildds?
 
Ingo Jürgensmann <ij@2011.bluespice.org> writes:

> On Sun, 1 May 2011 01:36:38 +0200, Andreas Barth wrote:
>
>> Sometimes we have a few packages we don't want to build on a certain
>> buildds. Sometimes this is because this package needs lots of ram. Or
>> it takes quite long and would waste the parallel building a machine
>> supports. Or whatever else. Of course a package could be in more than
>> one category.
>
> Yes, you're facing basically the same problem I tried to address in
> 2000/2001 when doing my renderserver and later for what Multibuild was
> intended to do as well. ;-)
>
>> Now, what I would like to do is to write that down in a central file
>> with categories.
>
> I would recommend to use a database, really.
>
>> That is, to mark packages as "builds only with more than one gigabyte
>> of ram". And to mark buildds as "has 6 cores", "only ... ram" - so
>> that I don't need to copy entries from buildd to buildd, but just say
>> "that new machine is the same class as ...", and that's it.
>
> Another category would be "fast disk/raid". There are some packages
> with lots of disk accesses. When you can schedule those packages to a
> buildd that has faster disk access like in having multiple spindles
> for faster seeks, you can minimize build times as well. We faced that
> problem on m68k particularly on IDE vs SCSI disks on Amigas, as IDE
> was dog slow. Another example there was the faster disks on Amigas vs
> slower SCSI disks in Apple machines.
>
>> Now my question is just: How to do that efficient? I.e. how would
>> such
>> a configuration file look like, and how the code to distribute the
>> package on the most fitting buildd(s)? (I.e. it's better to waste 5
>> out of 6 cores than to not build a package at all, but a package
>> needing at least 1g ram can't build on a buildd with only 512mb - but
>> no package should starve in the end.)
>> Ideas? Suggestions? Code?
>
> Look at my update-buildd.net from Buildd.net, which I used to collect
> data from the buildds such as RAM, kernel, uptime, used swap and such
> (http://buildd.net/cgi/hostpackages.cgi?unstable_arch=m68k&searchtype=arr akis). I
> store this information into the database and also the build times of
> the packages. With this dataset it should be possible to have the
> wanna-buildd schedule packages in such a way to minimize the build
> times because you can decide which buildd is the most suitable buildd
> for the next package.

I think different groups of factors have to be considered:

1) absolute requirements

I think there are only 2 absolute requirements:
- ram size
- disk size
And all buildds currently have enough disk space I think.

In the past we also had some sources that would crash one buildd but not
the other. No way to track that ahead of time though. But it should be
possible to report this to wanna-build.

Absolute requirements are absolute. If a buildd doesn't meet the
requirement then wanna-build must never schedule the package to build
there. (Note: With the current setup the buildd will just give it back,
so no biggy if wanna-build gets it wrong.)

2) important features

The most relevant feature I think is multiple cores and support of
DEB_BUILD_OPTIONS=parallel=x. This would be an attribute of both the
buildd and the source and one should try to match them. Build sources
which support parallel building preferably on systems with multiple
cores.

The I/O speed and the source's need for it could be another such
feature. But I'm not sure (other than the m68k special case) this is
relevant to such a degree that it makes sense tracking this
specifically.

Important features would be anything we can figure out and point to as
having a major influence on the build speed. And imho this should be
like "N times faster" to warrant the effort to track this for sources.

3) general performance

Buildds are different and build times will differ accordingly. I don't
think this can be properly quantified ahead of time and there are many
hidden factors interacting that would be impossible to quantify with
reasonable effort. But I think this can be measured and extrapolated
just fine. Keep a database of build times and do some statistical
analysis to rate the buildd speed in general and for specific
sources. With that you have a good approximation of the time a source
will need to build on each buildd. Use that as weight when deciding
where to build a source.
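
One possible shape for that statistical analysis, sketched in Python.
The record layout (package, buildd, seconds) and the use of medians are
assumptions made for illustration, not something wanna-build or sbuild
is known to provide.

```python
from collections import defaultdict
from statistics import median


def speed_factors(history):
    """Rate each buildd relative to the typical build time of its packages.

    history: list of (package, buildd, seconds) records, seconds > 0.
    Returns buildd -> factor; above 1.0 means slower than typical.
    """
    by_package = defaultdict(list)
    for package, _buildd, seconds in history:
        by_package[package].append(seconds)
    typical = {p: median(times) for p, times in by_package.items()}

    ratios = defaultdict(list)
    for package, buildd, seconds in history:
        ratios[buildd].append(seconds / typical[package])
    return {b: median(r) for b, r in ratios.items()}


def estimate(package, buildd, history):
    """Estimated seconds for package on buildd, or None without data."""
    same = [s for p, b, s in history if p == package and b == buildd]
    if same:
        # Source-specific rating: this package was built here before.
        return median(same)
    times = [s for p, _b, s in history if p == package]
    factor = speed_factors(history).get(buildd)
    if not times or factor is None:
        return None
    # General rating: scale the package's typical time by buildd speed.
    return median(times) * factor
```

With history for a package on one buildd only, the estimate for another
buildd falls back to the general speed factor, which is the
extrapolation step.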

Unlike items in 2), which would have to be manually tracked, this would
encompass any and all factors including unknown ones in
approximation. Some care would have to be taken that factors aren't
weighted twice, once from 2) and once here.

The build times for a parallel building source will differ greatly for
single and multi core systems. The difference in weight this produces
might already be sufficient so that those sources prefer the multi core
systems (after a few versions). So tracking important features manually
might be wasted effort altogether.



My suggestion would be to implement something for 1) and 3) and see how
that goes. Actually 1) could be implemented by setting the build time to
infinity. So the scheduler only has to consider 3). If you then find
that sources have widely different build times on different buildds and
the weight isn't enough to schedule right, only then try to figure out
what makes the difference there.
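
A minimal sketch of folding 1) into 3) via an infinite build time, in
Python. The resource keys and the idea of passing in precomputed
estimates are assumptions. A buildd without any estimate is also treated
as infinitely slow here, which a real scheduler would probably want to
handle differently.

```python
import math


def weight(package_req, buildd_res, estimated_seconds):
    """Estimated build time, or infinity if an absolute requirement fails."""
    for key, minimum in package_req.items():
        if buildd_res.get(key, 0) < minimum:
            return math.inf
    return estimated_seconds


def best_buildd(package_req, buildds, estimates):
    """Pick the buildd with the lowest finite weight.

    package_req: resource -> minimum needed by the source
    buildds:     name -> resources offered
    estimates:   name -> estimated build time in seconds on that buildd
    Returns a buildd name, or None if every weight is infinite.
    """
    best, best_weight = None, math.inf
    for name, resources in buildds.items():
        w = weight(package_req, resources, estimates.get(name, math.inf))
        if w < best_weight:
            best, best_weight = name, w
    return best
```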

MfG
Goswin


Archive: http://lists.debian.org/87zkn5ujd8.fsf@frosties.localnet

Goswin von Brederlow 05-02-2011 05:04 PM

wanna-build / how to sort packages on buildds?
 
Roger Leigh <rleigh@codelibre.net> writes:

> I just wanted to add that if you would like more statistics reporting
> for this purpose, I'll be happy to add that to sbuild. Currently we
> only really report build time and disc space. If you want additional
> data such as number of cores used, memory/swap usage and other resource
> usage, I'll be happy to add them to the sbuild summary stats. Actually
> measuring those might be a bit trickier though, especially on machines
> running parallel builds.

Reporting the cumulative CPU time used and the wall clock time should be
helpful in detecting parallel builds. The closer the CPU time gets to
wall clock * num cores, the better it would be to build the package on
multi core systems.
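
As a small illustration of that ratio, in Python. The function name and
the example numbers are made up.

```python
def parallel_utilisation(cpu_seconds, wall_seconds, cores):
    """Fraction of the available core time the build actually used.

    Values near 1.0 suggest the build keeps all cores busy; values near
    1 / cores suggest an essentially serial build.
    """
    if wall_seconds <= 0 or cores <= 0:
        raise ValueError("wall_seconds and cores must be positive")
    return cpu_seconds / (wall_seconds * cores)
```

For example, 540 s of CPU time over 100 s of wall clock time on a 6 core
buildd gives 540 / (100 * 6), i.e. 90% utilisation.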

MfG
Goswin


Archive: http://lists.debian.org/87vcxtuj8b.fsf@frosties.localnet

Scott Kitterman 05-02-2011 05:32 PM

wanna-build / how to sort packages on buildds?
 
On Saturday, April 30, 2011 07:36:38 PM Andreas Barth wrote:
> Hi,
>
> I have a problem I need to solve in perl within wanna-build:
>
> Sometimes we have a few packages we don't want to build on a certain
> buildds. Sometimes this is because this package needs lots of ram. Or
> it takes quite long and would waste the parallel building a machine
> supports. Or whatever else. Of course a package could be in more than
> one category.
>
> Now, what I would like to do is to write that down in a central file
> with categories.
>
> That is, to mark packages as "builds only with more than one gigabyte
> of ram". And to mark buildds as "has 6 cores", "only ... ram" - so
> that I don't need to copy entries from buildd to buildd, but just say
> "that new machine is the same class as ...", and that's it.
>
> Now my question is just: How to do that efficient? I.e. how would such
> a configuration file look like, and how the code to distribute the
> package on the most fitting buildd(s)? (I.e. it's better to waste 5
> out of 6 cores than to not build a package at all, but a package
> needing at least 1g ram can't build on a buildd with only 512mb - but
> no package should starve in the end.)
>
> Ideas? Suggestions? Code?

If one could do something like:

wb gb libieee1284 mod-wsgi nflog-bindings zinnia . ia64 . !caballero

that would be a HUGE win. My suggestion would be to start with something
simple and declarative like that and then build the back end to automatically
sort out the list of candidate buildds after that.
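
A hedged sketch of how the back end might turn such a token into a list
of candidates, in Python. Reading "!caballero" as "any buildd except
caballero" is an interpretation of the proposed syntax, which does not
exist yet; the other buildd names are hypothetical.

```python
def candidate_buildds(selector, buildds):
    """Filter buildds by a selector token.

    "!name" keeps every buildd except name (assumed meaning);
    "name"  keeps only that buildd.
    """
    if selector.startswith("!"):
        excluded = selector[1:]
        return [b for b in buildds if b != excluded]
    return [b for b in buildds if b == selector]


# Hypothetical ia64 buildd list, apart from caballero.
print(candidate_buildds("!caballero", ["caballero", "buildd-x", "buildd-y"]))
```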

Scott K


Archive: http://lists.debian.org/201105021332.04065.debian@kitterman.com

Andreas Barth 05-02-2011 05:36 PM

wanna-build / how to sort packages on buildds?
 
* Scott Kitterman (debian@kitterman.com) [110502 19:32]:
> If one could do something like:
>
> wb gb libieee1284 mod-wsgi nflog-bindings zinnia . ia64 . !caballero

good idea. I'll consider how to do that.


Andi


Archive: http://lists.debian.org/20110502173620.GP15003@mails.so.argh.org

