09-12-2012, 03:18 PM
"vivo75@gmail.com"

EJOBS variable for EAPI 5?

On 11/09/2012 18:43, Zac Medico wrote:

> On 09/11/2012 09:36 AM, vivo75@gmail.com wrote:
>
>> Dunno where to place this request, but if we go for something like EJOBS
>> could we also make it phase specific?
>> So compile, install and test could have a different number of jobs running.
>> Possibly three different variables that override a predefined EJOBS.
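
A minimal sketch of the per-phase override described above, in Python since that is what Portage itself is written in. The variable names EJOBS_COMPILE, EJOBS_INSTALL and EJOBS_TEST are hypothetical; neither this thread nor any EAPI defines them, and EJOBS itself is only a proposal here.

```python
def jobs_for_phase(phase, env, default=1):
    """Return the job count to use for one ebuild phase.

    Looks for a hypothetical per-phase variable such as EJOBS_COMPILE first,
    then falls back to the proposed global EJOBS, then to `default`.
    """
    for name in ("EJOBS_" + phase.upper(), "EJOBS"):
        value = env.get(name, "").strip()
        if value:
            jobs = int(value)
            if jobs < 1:
                raise ValueError(f"{name} must be >= 1, got {jobs}")
            return jobs
    return default


env = {"EJOBS": "4", "EJOBS_TEST": "1"}
print(jobs_for_phase("compile", env))  # no EJOBS_COMPILE set, so EJOBS applies
print(jobs_for_phase("test", env))     # EJOBS_TEST overrides EJOBS
```

The lookup order is the whole design: a phase-specific value wins when present, so a user who sets only EJOBS sees no change in behaviour.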

> Per-phase sounds a little too fine-grained. Instead, I'd suggest adding
> an ELOADAVG variable that's analogous to make's --load-average option.
> That should be enough to compensate for any differences between phases.
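
For readers unfamiliar with make's --load-average, the rule it applies can be sketched roughly as follows. This is an illustration in Python, not Portage code; ELOADAVG is only a proposed name, and the function and parameter names below are made up for the example.

```python
import os


def load_allows_new_job(max_load, running_jobs, current_load=None):
    """Apply a make-style load limit to the decision to start another job.

    With no job running, one is always allowed to start, so the build cannot
    stall. Otherwise a new job starts only while the 1-minute load average
    is below max_load.
    """
    if running_jobs == 0:
        return True
    if current_load is None:
        # os.getloadavg() returns the 1-, 5- and 15-minute averages.
        current_load = os.getloadavg()[0]
    return current_load < max_load


print(load_allows_new_job(4.0, running_jobs=2, current_load=3.2))
print(load_allows_new_job(4.0, running_jobs=2, current_load=5.1))
```

Because the load average is sampled over a minute, a limit like this reacts slowly, which is part of the objection raised in the reply that follows.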
OK, but in my experience load average alone is too limited, so I'll raise the
stakes and ask for the ability to control the following:

- disk io
- network
- memory
- cpu
- jobs

I just thought that being able to control only jobs in a phase-specific
manner might have sufficed ;-)
Also, this seems like a good job for containers, already implemented in the
Linux kernel, but I will let someone with experience with them comment on
the matter.
 
09-12-2012, 04:33 PM
Hans de Graaff

On Wed, 2012-09-12 at 08:58 -0400, Ian Stakenvicius wrote:

> So essentially what you're saying here is that it might be worthwhile
> to look into parallelism as a whole and possibly come up with a
> solution that combines 'emerge --jobs' and build-system parallelism
> together to maximum benefit?

Forget about jobs and load average, and just keep starting jobs all
around until there is only 20% (or whatever tuneable amount) free memory
left. As far as I can tell this is always the real bottleneck in the
end. Once you hit swap overall throughput has to go down quite a bit.
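
A rough sketch of the free-memory check proposed above, assuming a Linux system with /proc/meminfo. The function names and the fallback estimate (MemFree + Buffers + Cached on kernels that do not report MemAvailable) are choices made for this example, not anything emerge does.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            fields[name.strip()] = int(rest.split()[0])
    return fields


def free_fraction(meminfo):
    """Fraction of RAM that is free or cheaply reclaimable."""
    if "MemAvailable" in meminfo:
        available = meminfo["MemAvailable"]
    else:
        # Assumption: approximate it on kernels without MemAvailable.
        available = (meminfo["MemFree"] + meminfo.get("Buffers", 0)
                     + meminfo.get("Cached", 0))
    return available / meminfo["MemTotal"]


def memory_allows_new_job(meminfo, min_free=0.20):
    """True while more than min_free of total RAM is still available."""
    return free_fraction(meminfo) > min_free


sample = """MemTotal:       8000000 kB
MemFree:         500000 kB
Buffers:         100000 kB
Cached:         1400000 kB
"""
info = parse_meminfo(sample)
print(free_fraction(info), memory_allows_new_job(info))
```

On a live system the text would come from reading /proc/meminfo just before each spawn decision. The check only sees memory in use at that moment, not what a job will need later.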

Hans
 
09-12-2012, 04:48 PM
Michael Mol

On Wed, Sep 12, 2012 at 12:33 PM, Hans de Graaff <graaff@gentoo.org> wrote:

> On Wed, 2012-09-12 at 08:58 -0400, Ian Stakenvicius wrote:
>
>> So essentially what you're saying here is that it might be worthwhile
>> to look into parallelism as a whole and possibly come up with a
>> solution that combines 'emerge --jobs' and build-system parallelism
>> together to maximum benefit?
>
> Forget about jobs and load average, and just keep starting jobs all
> around until there is only 20% (or whatever tuneable amount) free memory
> left. As far as I can tell this is always the real bottleneck in the
> end. Once you hit swap overall throughput has to go down quite a bit.


I've been thinking about this, but that only works until you get to the huge link step of, e.g. chromium, firefox, libreoffice.

I've had programs with memory leaks in the past, but I've never seen a program validly consume as much memory as ld during those builds.
To cover something like that, you would need to be able to freeze and swap out an entire process (such as ld) to allow something else to complete quickly...but there's no good way I can think of to prioritize sanely between the one big process and the few dozen smaller ones which might be allowed to spawn and complete first.

--
:wq
 
09-12-2012, 04:58 PM
Zac Medico

On 09/12/2012 09:33 AM, Hans de Graaff wrote:
> On Wed, 2012-09-12 at 08:58 -0400, Ian Stakenvicius wrote:
>
>> So essentially what you're saying here is that it might be worthwhile
>> to look into parallelism as a whole and possibly come up with a
>> solution that combines 'emerge --jobs' and build-system parallelism
>> together to maximum benefit?
>
> Forget about jobs and load average, and just keep starting jobs all
> around until there is only 20% (or whatever tuneable amount) free memory
> left. As far as I can tell this is always the real bottleneck in the
> end. Once you hit swap overall throughput has to go down quite a bit.

Well, I think it's still good to limit the number of jobs at least,
since otherwise you could become overloaded with processes that don't
consume a lot of memory at first but by the time they complete they have
consumed much more memory than desired (using swap).
--
Thanks,
Zac
 
09-12-2012, 05:03 PM
Ian Stakenvicius


On 12/09/12 12:58 PM, Zac Medico wrote:
> On 09/12/2012 09:33 AM, Hans de Graaff wrote:
>> On Wed, 2012-09-12 at 08:58 -0400, Ian Stakenvicius wrote:
>>
>>> So essentially what you're saying here is that it might be
>>> worthwhile to look into parallelism as a whole and possibly
>>> come up with a solution that combines 'emerge --jobs' and
>>> build-system parallelism together to maximum benefit?
>>
>> Forget about jobs and load average, and just keep starting jobs
>> all around until there is only 20% (or whatever tuneable amount)
>> free memory left. As far as I can tell this is always the real
>> bottleneck in the end. Once you hit swap overall throughput has
>> to go down quite a bit.
>
> Well, I think it's still good to limit the number of jobs at
> least, since otherwise you could become overloaded with processes
> that don't consume a lot of memory at first but by the time they
> complete they have consumed much more memory than desired (using
> swap).

I think this would need to be dealt with by having the parent emerge
process monitor all children and specifically block individual
processes (i.e., 'make', 'ld', etc.) once resources are unavailable,
until they become available again. Swap may be hit by the big processes,
but at least they wouldn't continue to run while swapped out.
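
One way a parent process could "block" children is with SIGSTOP and SIGCONT. The sketch below is only an illustration: the policy (pause the newest running child, resume the oldest paused one) is an arbitrary assumption, `throttle` is a made-up name, and stopping a process does not release the memory it already holds.

```python
import os
import signal


def throttle(children, paused, memory_is_low, send=os.kill):
    """Pause or resume one child and return the new set of paused PIDs.

    children: PIDs in the order they were started.
    paused: PIDs currently stopped.
    send: callable taking (pid, signal); defaults to os.kill and is
          injectable so the policy can be exercised without real processes.
    """
    paused = set(paused)
    running = [pid for pid in children if pid not in paused]
    if memory_is_low:
        # Never stop the last runner, or nothing would make progress.
        if len(running) > 1:
            victim = running[-1]
            send(victim, signal.SIGSTOP)
            paused.add(victim)
    else:
        stopped = [pid for pid in children if pid in paused]
        if stopped:
            send(stopped[0], signal.SIGCONT)
            paused.discard(stopped[0])
    return paused
```

A monitoring loop would call this periodically with the current list of child PIDs. It does nothing about the prioritisation problem raised earlier in the thread: a paused 'ld' still occupies RAM or swap until it is resumed and finishes.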

I don't have a solution to the potential 'thrashing' issue, though,
which I expect would be a problem even if there's enough memory.

 
09-12-2012, 06:52 PM
"Gregory M. Turner"

On 9/12/2012 5:58 AM, Ian Stakenvicius wrote:

> On 12/09/12 05:55 AM, Gregory M. Turner wrote:
>
>> Note that, effectively, we have this already, and it's called
>> "portage". But one could certainly make a case for modularizing it
>> better, since, in truth, we are talking about a very common, very
>> abstract problem here which portage shares with any number of
>> batch-build systems.
>>
>> Such an engine could very well do exactly the right thing if it
>> were faced with a constraint that a certain part of a certain build
>> needed to proceed without parallelism due to limitations coming
>> from the build.
>>
>> Also, there are very large parts of most builds -- configure comes
>> to mind -- that don't parallelize even if, perhaps, they should.
>> In such cases, a really smart global parallelism arbiter could
>> easily respond by spawning more jobs from other builds.



> So essentially what you're saying here is that it might be worthwhile
> to look into parallelism as a whole and possibly come up with a
> solution that combines 'emerge --jobs' and build-system parallelism
> together to maximum benefit?


Yeah, couldn't have said it better myself ... apparently


> Advanced HPC systems (sys-cluster/torque along with an appropriate
> scheduler, for instance) can do such things with their jobs when the
> jobs are properly built; I could see portage being able to handle this
> as well given most of what is necessary is already known (ebuild
> phases, build system type (via eclass), etc). However, given the
> limitations already put on parallelism in terms of emerge order, etc,
> I could see this solution needing to be -very- complex and integration
> needing to occur on multiple levels. We'd also need to consider
> distcc (and other cluster-shared compilation methods if there are
> any??).. It would be an interesting project, though.


ACK all of the above.

Tempting to think more deeply about this but probably the last thing I
need to do right now is to talk myself into another speculative project.


I've hurt my wrist a bit -- probably an RSI -- so that should help deter me :S

Only a few major sources of parallelism exist in portage: --jobs /
--load-average in emerge opts, multiprocessing eclass & equiv. ebuild
helper, distcc, and make... Infrastructure is already in place for all
of those, so perhaps a good holistic solution exists that isn't /too/
complicated.


...OK another f!#!%$^ brainstorm incoming

For "JOBS" syntax... what really seems missing in portage are:

o a clean way to say "don't parallelize this particular make
invocation" in ebuilds

o a clean way to globally say "try to use this parallelization
strategy when emerging."

So what about something like:

o EMERGE_JOBS and EMERGE_LOAD_AVERAGE make.conf vars equiv. to
--jobs and --load-average emerge options

o EBUILD_JOBS and EBUILD_LOAD_AVERAGE make.conf vars

o If the latter are not specified, they are copied respectively from
the former (debatable for *_JOBS, since now we get 16 processes when
we asked for four).

o MAKEOPTS is auto-extended to reflect EBUILD_JOBS/EBUILD_LOAD_AVERAGE
if & only if -j|--jobs|-l|--load-average options aren't provided in
make.conf/profile/envvar MAKEOPTS

o however, if MAKEOPTS "override" EBUILD_JOBS or EBUILD_LOAD_AVERAGE,
issue a conspicuous yellow-stars warning

o extend "emake" to accept a "--non-parallel" option which will
strip all -j|--jobs|-l|--load-average options from MAKEOPTS;
perhaps support an equivalent EBUILD_NON_PARALLEL envvar as well,
with support for override in profile.bashrc. Don't warn about this
overriding EBUILD_JOBS -- treat as SOP.

o debatable: respect EBUILD_NON_PARALLEL in multiprocessing, etc?
or, perhaps, something like:

EMAKE_NON_PARALLEL=${EMAKE_NON_PARALLEL:-${EBUILD_NON_PARALLEL:-no}}

could be used to distinguish between "don't use any parallelism"
and "don't use GNU's make parallelism in emake". Also maybe a
better name exists that doesn't use double-negatives.

?
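
To make the MAKEOPTS rules in the list above concrete, here is a rough sketch in Python. EBUILD_JOBS, EBUILD_LOAD_AVERAGE and the emake "--non-parallel" option are only proposals from this post; the function names are invented for the example, and it only recognises the plain spellings -j, -l, --jobs and --load-average (not bundled short options such as -kj4).

```python
import re
import shlex

# Matches -j, -j4, -l, -l2.5, --jobs, --jobs=4, --load-average[=N]
_PARALLEL = re.compile(r"^(?:-[jl][0-9.]*|--(?:jobs|load-average)(?:=[0-9.]+)?)$")
_BARE = {"-j", "-l", "--jobs", "--load-average"}
_NUMBER = re.compile(r"^[0-9.]+$")


def split_parallel(makeopts):
    """Split MAKEOPTS into (parallelism options, everything else)."""
    parallel, other = [], []
    tokens = shlex.split(makeopts)
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if _PARALLEL.match(tok):
            parallel.append(tok)
            # "-j 4" style: treat a following bare number as the value.
            if tok in _BARE and i + 1 < len(tokens) and _NUMBER.match(tokens[i + 1]):
                parallel.append(tokens[i + 1])
                i += 1
        else:
            other.append(tok)
        i += 1
    return parallel, other


def effective_makeopts(makeopts, ebuild_jobs=None, ebuild_load=None,
                       non_parallel=False):
    """Return (options, warn) following the rules sketched in the post."""
    parallel, other = split_parallel(makeopts)
    if non_parallel:
        # Proposed --non-parallel: strip every -j/-l option, no warning.
        return shlex.join(other), False
    if parallel:
        # User-supplied MAKEOPTS win; warn if they override EBUILD_* values.
        warn = ebuild_jobs is not None or ebuild_load is not None
        return shlex.join(shlex.split(makeopts)), warn
    extra = []
    if ebuild_jobs is not None:
        extra.append(f"-j{ebuild_jobs}")
    if ebuild_load is not None:
        extra.append(f"-l{ebuild_load}")
    return shlex.join(other + extra), False


print(effective_makeopts("-k", ebuild_jobs=4, ebuild_load=3.5))
print(effective_makeopts("-j8 -k", ebuild_jobs=4))
print(effective_makeopts("-j 8 --load-average=4 -k", non_parallel=True))
```

The second element of the result stands in for the "conspicuous yellow-stars warning". Copying EBUILD_* defaults from EMERGE_* is left out to keep the sketch short.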

Seems to me something vaguely like the above would provide

o backward compatibility for ebuilds and make.conf

o not so vastly different than what we have

o a decent way to specify what "we really want" globally;
insofar as portage doesn't do the best job effecting the requested
parallelization strategy, more ambitious tactics could be
implemented later, hopefully without huge interface revisions.

-gmt

P.S.:

(Kind-of-crazy additional idea: put ceil(sqrt(EMERGE_JOBS)) into
EBUILD_JOBS when only the former is specified, and then let
effective_emerge_jobs equal floor(EMERGE_JOBS/EBUILD_JOBS).... but maybe
too much automagic for this to be a good idea.)
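
The arithmetic in the P.S. can be checked with a few lines of Python. This is only a worked example of the formula as written; `split_jobs` is a made-up name.

```python
import math


def split_jobs(emerge_jobs):
    """Return (ebuild_jobs, effective_emerge_jobs) per the P.S. formula."""
    if emerge_jobs < 1:
        raise ValueError("emerge_jobs must be >= 1")
    # ceil(sqrt(n)) computed with integers only, avoiding float rounding.
    ebuild_jobs = math.isqrt(emerge_jobs - 1) + 1
    effective_emerge_jobs = emerge_jobs // ebuild_jobs
    return ebuild_jobs, effective_emerge_jobs


for n in (4, 10, 16):
    ebuild, emerge = split_jobs(n)
    print(f"EMERGE_JOBS={n}: EBUILD_JOBS={ebuild}, "
          f"effective emerge jobs={emerge}, total={ebuild * emerge}")
```

Because the second value is rounded down, the product never exceeds the requested EMERGE_JOBS. For example, 10 becomes 4 make jobs in each of 2 parallel emerges, 8 processes in total, which avoids the "16 processes when we asked for four" effect mentioned in the list above.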