On 9/12/2012 5:58 AM, Ian Stakenvicius wrote:
On 12/09/12 05:55 AM, Gregory M. Turner wrote:
Note that, effectively, we have this already, and it's called
"portage". But one could certainly make a case for modularizing it
better, since, in truth, we are talking about a very common, very
abstract problem here which portage shares with any number of
Such an engine could very well do exactly the right thing if it
were faced with a constraint that a certain part of a certain build
needed to proceed without parallelism due to limitations coming
from the build.
Also, there are very large parts of most builds -- configure comes
to mind -- that don't parallelize even if, perhaps, they should.
In such cases, a really smart global parallelism arbiter could
easily respond by spawning more jobs from other builds.
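To make the arbiter idea concrete, here is a minimal sketch in Python. It assumes a single global pool of job slots shared by every running build. `SlotArbiter`, `request` and `release` are hypothetical names, not anything portage provides. It also never blocks: a real arbiter would queue a request instead of granting zero slots.

```python
from dataclasses import dataclass, field


@dataclass
class SlotArbiter:
    """Hypothetical global arbiter sharing a fixed budget of job slots
    among all concurrently running builds."""

    total: int
    granted: dict = field(default_factory=dict)  # build name -> slots held

    def free(self) -> int:
        return self.total - sum(self.granted.values())

    def request(self, build: str, wanted: int) -> int:
        """Grant up to `wanted` slots for the build's current phase.
        Slots the build already held go back to the pool first, so a
        build entering a serial phase (e.g. ./configure) frees capacity.
        A real arbiter would queue the request instead of granting 0."""
        self.granted.pop(build, None)
        got = max(0, min(wanted, self.free()))
        if got > 0:
            self.granted[build] = got
        return got

    def release(self, build: str) -> None:
        """The build finished; return all of its slots to the pool."""
        self.granted.pop(build, None)


if __name__ == "__main__":
    arb = SlotArbiter(total=8)
    print(arb.request("foo", 8))  # foo compiles with all 8 slots -> 8
    print(arb.request("foo", 1))  # foo drops into a serial configure -> 1
    print(arb.request("bar", 8))  # bar picks up the 7 freed slots -> 7
```

The point of re-requesting per phase is that parallelism limits become a property of the phase, not the package. When "foo" goes serial, the slots it gives back flow straight to "bar" instead of sitting idle.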
So essentially what you're saying here is that it might be worthwhile
to look into parallelism as a whole and possibly come up with a
solution that combines 'emerge --jobs' and build-system parallelism
together to maximum benefit?
Yeah, couldn't have said it better myself ... apparently
Advanced HPC systems (sys-cluster/torque along with an appropriate
scheduler, for instance) can do such things with their jobs when the
jobs are properly built; I could see portage being able to handle this
as well given most of what is necessary is already known (ebuild
phases, build system type (via eclass), etc.). However, given the
limitations already put on parallelism in terms of emerge order, etc,
I could see this solution needing to be -very- complex and integration
needing to occur on multiple levels. We'd also need to consider
distcc (and other cluster-shared compilation methods, if there are
any?). It would be an interesting project, though.
ACK all of the above.
Tempting to think more deeply about this but probably the last thing I
need to do right now is to talk myself into another speculative project.
I've hurt my wrist a bit -- probably an RSI -- so that should help deter me :S
Only a few major sources of parallelism exist in portage: --jobs /
--load-average in emerge opts, multiprocessing eclass & equiv. ebuild
helper, distcc, and make... Infrastructure is already in place for all
of those, so perhaps a good holistic solution exists that isn't /too/
...OK another f!#!%$^ brainstorm incoming
For "JOBS" syntax... what really seems missing in portage are:
o a clean way to say "don't parallelize this particular make
invocation" in ebuilds
o a clean way to globally say "try to use this parallelization
strategy when emerging."
So what about something like:
o EMERGE_JOBS and EMERGE_LOAD_AVERAGE make.conf vars equiv. to
--jobs and --load-average emerge options
o EBUILD_JOBS and EBUILD_LOAD_AVERAGE make.conf vars
o If the latter are not specified, they are copied respectively from
the former (debatable for *_JOBS, since four emerge jobs each running
make -j4 give us 16 processes when we asked for four).
o MAKEOPTS is auto-extended to reflect EBUILD_JOBS/EBUILD_LOAD_AVERAGE
if & only if -j|--jobs|-l|--load-average options aren't provided in
MAKEOPTS already
o however, if MAKEOPTS "overrides" EBUILD_JOBS or EBUILD_LOAD_AVERAGE,
issue a conspicuous yellow-stars warning
o extend "emake" to accept a "--non-parallel" option which will
strip all -j|--jobs|-l|--load-average options from MAKEOPTS;
perhaps support an equivalent EBUILD_NON_PARALLEL envvar as well,
with support for override in profile.bashrc. Don't warn about this
overriding EBUILD_JOBS -- treat as SOP.
o debatable: respect EBUILD_NON_PARALLEL in multiprocessing, etc?
or, perhaps, something like:
could be used to distinguish between "don't use any parallelism"
and "don't use GNU make's parallelism in emake". Also maybe a
better name exists that doesn't use double-negatives.
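A rough Python sketch of how those make.conf rules might resolve, assuming the variables arrive as a plain dict of strings. `split_parallel_opts`, `resolve` and `emake_opts` are hypothetical helpers, not portage API. The warning condition (MAKEOPTS carries its own -j/-l while an EBUILD_* value, set or inherited, gets ignored) is one reading of the proposal. The debatable EBUILD_NON_PARALLEL/multiprocessing interaction is left out.

```python
import re
import shlex

# Hypothetical helpers. The option set is the one named in the proposal,
# -j|--jobs|-l|--load-average, in joined ("-j4", "--jobs=4") or bare form.
_JOINED = re.compile(r"-j\d+|-l\d+(\.\d+)?|--jobs=\d+|--load-average=\d+(\.\d+)?")
_BARE = {"-j", "--jobs", "-l", "--load-average"}
_NUMBER = re.compile(r"\d+(\.\d+)?")


def split_parallel_opts(makeopts):
    """Return (tokens with parallelism options removed, whether any were found)."""
    toks = shlex.split(makeopts)
    kept, found, i = [], False, 0
    while i < len(toks):
        tok = toks[i]
        if _JOINED.fullmatch(tok):
            found, i = True, i + 1
        elif tok in _BARE:
            # in GNU make, -j and -l take an *optional* separate numeric argument
            has_arg = i + 1 < len(toks) and _NUMBER.fullmatch(toks[i + 1])
            found, i = True, i + (2 if has_arg else 1)
        else:
            kept.append(tok)
            i += 1
    return kept, found


def resolve(conf):
    """Apply the proposed defaulting rules to a dict of make.conf values.
    Returns (effective MAKEOPTS, list of warnings)."""
    # EBUILD_* fall back to the corresponding EMERGE_* values
    jobs = conf.get("EBUILD_JOBS", conf.get("EMERGE_JOBS"))
    load = conf.get("EBUILD_LOAD_AVERAGE", conf.get("EMERGE_LOAD_AVERAGE"))
    makeopts = conf.get("MAKEOPTS", "")
    kept, user_parallelism = split_parallel_opts(makeopts)
    if user_parallelism:
        # MAKEOPTS wins, but complain loudly if it shadows an EBUILD_* value
        warnings = []
        if jobs is not None or load is not None:
            warnings.append("MAKEOPTS overrides EBUILD_JOBS/EBUILD_LOAD_AVERAGE")
        return makeopts, warnings
    extra = []
    if jobs is not None:
        extra.append(f"-j{jobs}")
    if load is not None:
        extra.append(f"-l{load}")
    return shlex.join(kept + extra), []


def emake_opts(makeopts, non_parallel=False):
    """MAKEOPTS as a hypothetical `emake --non-parallel` would pass them to make."""
    if not non_parallel:
        return makeopts
    kept, _ = split_parallel_opts(makeopts)
    return shlex.join(kept)  # no warning: serializing a build is routine
```

With `EMERGE_JOBS=4` and `MAKEOPTS="-s"`, `resolve` yields `-s -j4`. With `EBUILD_JOBS=4` and `MAKEOPTS="-j8"`, it keeps `-j8` and emits one warning. `emake_opts("-j8 -l 4.5 -s", non_parallel=True)` returns `-s`.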
Seems to me something vaguely like the above would provide
o backward compatibility for ebuilds and make.conf
o not vastly different from what we have now
o a decent way to specify what "we really want" globally;
insofar as portage doesn't do the best job effecting the requested
parallelization strategy, more ambitious tactics could be
implemented later, hopefully without huge interface revisions.
(Kind-of-crazy additional idea: put ceil(sqrt(EMERGE_JOBS)) into
EBUILD_JOBS when only the former is specified, and then let
effective_emerge_jobs equal floor(EMERGE_JOBS/EBUILD_JOBS)... but maybe
too much automagic for this to be a good idea.)
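The arithmetic of that idea is sketched here in Python; `split_jobs` is a hypothetical name. The appeal is that emerge jobs times make jobs never exceeds EMERGE_JOBS. By contrast, plainly copying the value into EBUILD_JOBS allows up to EMERGE_JOBS squared processes (the 16-for-4 case). `math.isqrt` keeps the ceiling of the square root exact, with no floating-point rounding.

```python
import math


def split_jobs(emerge_jobs: int) -> tuple[int, int]:
    """Hypothetical split of a single EMERGE_JOBS budget:
    EBUILD_JOBS = ceil(sqrt(N)), effective emerge jobs = floor(N / EBUILD_JOBS)."""
    if emerge_jobs < 1:
        raise ValueError("EMERGE_JOBS must be at least 1")
    root = math.isqrt(emerge_jobs)
    # exact integer ceil(sqrt(N))
    ebuild_jobs = root if root * root == emerge_jobs else root + 1
    effective_emerge_jobs = emerge_jobs // ebuild_jobs  # floor(N / EBUILD_JOBS)
    return effective_emerge_jobs, ebuild_jobs


if __name__ == "__main__":
    for n in (1, 4, 8, 16):
        emerge, ebuild = split_jobs(n)
        print(f"EMERGE_JOBS={n}: {emerge} emerge jobs x {ebuild} make jobs = {emerge * ebuild}")
```

For EMERGE_JOBS=16 this gives 4 emerge jobs of 4 make jobs each. For EMERGE_JOBS=8 it gives 2 emerge jobs of 3 make jobs, i.e. 6 processes, leaving 2 slots unused.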