Questions about SystemD and OpenRC

08-16-2012, 11:59 AM
Rich Freeman

On Wed, Aug 15, 2012 at 3:18 PM, Michael Mol <mikemol@gmail.com> wrote:
> It also sounds like something like that could be a benefit to shrinking @system.
>

I think the solution to the circular dependency issue isn't to make
Portage able to completely bootstrap the whole system, but rather just
to make it capable of coping with the issues and knowing when to raise
an alarm.

Gentoo will always involve extracting a tarball/etc for the initial
installation since you always need SOMETHING to start with. You can't
even chroot into your install directory without a shell being there,
and typing "emerge" won't go so well if portage isn't actually
installed.

So, continue to build stages like we do right now - no doubt with
hard-coding and such to get around the dependencies.

As far as objections to listing gcc and such in every ebuild go, why
not? We list all kinds of routine stuff in hundreds of ebuilds so
that we can install systems without them. Why not just have a
toolchain virtual or something?

And since ssh was brought up - this is what happens with hacks like
this. When you combine the "default install" with the "minimum deps
for everything" list you end up with an ssh you can't get rid of
without the package.provided hack (which really should be used for
stuff that is, well, provided).

It would be nice if people who want to build a server with Gentoo but
then reduce it to only true RDEPENDS could do so. Obviously they'd
have to use binary packages to continue to maintain it (and even then
they'd need to keep portage on it), or they'd have to build another
one. Actually, the trend in general is towards disposable servers
anyway so generating an entire new server every time one thing changes
is probably a desirable thing, since you probably want to be able to
do it every time you add a server anyway.

Rich
 
08-16-2012, 07:07 PM
Zac Medico

On 08/16/2012 04:59 AM, Rich Freeman wrote:
> And since ssh was brought up - this is what happens with hacks like
> this. When you combine the "default install" with the "minimum deps
> for everything" list you end up with an ssh you can't get rid of
> without the package.provided hack (which really should be used for
> stuff that is, well, provided).

FWIW, instead of using package.provided, you can negate "*virtual/ssh"
like this:

mkdir -p /etc/portage/profile
echo "-*virtual/ssh" >> /etc/portage/profile/packages
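
Note that running the echo line above twice appends the entry twice. A minimal sketch of an idempotent variant follows; the function name negate_system_atom and the PROFILE_DIR variable are made up for illustration and are not part of Portage.

```shell
#!/bin/bash
# Hypothetical helper: append "-*<atom>" to the profile's packages file
# only if that exact line is not already present.
# PROFILE_DIR is an assumed override; it defaults to the directory used
# in the commands above.
negate_system_atom() {
    local atom=$1
    local dir=${PROFILE_DIR:-/etc/portage/profile}
    local line="-*${atom}"
    mkdir -p "${dir}" || return 1
    # -x: match the whole line, -F: fixed string, so "*" is not a regex.
    # "--" stops option parsing because the line starts with "-".
    grep -qxF -- "${line}" "${dir}/packages" 2>/dev/null ||
        echo "${line}" >> "${dir}/packages"
}
```

Usage would be "negate_system_atom virtual/ssh", run as root.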

--
Thanks,
Zac
 
08-16-2012, 07:51 PM
Gregory M. Turner

On 8/16/2012 4:59 AM, Rich Freeman wrote:
> On Wed, Aug 15, 2012 at 3:18 PM, Michael Mol <mikemol@gmail.com> wrote:
>> It also sounds like something like that could be a benefit to shrinking @system.
>
> I think the solution to the circular dependency issue isn't to make
> Portage able to completely bootstrap the whole system, but rather just
> to make it capable of coping with the issues and knowing when to raise
> an alarm.
>
> Gentoo will always involve extracting a tarball/etc for the initial
> installation since you always need SOMETHING to start with. You can't
> even chroot into your install directory without a shell being there,
> and typing "emerge" won't go so well if portage isn't actually
> installed.
>
> So, continue to build stages like we do right now - no doubt with
> hard-coding and such to get around the dependencies.
>
> As far as objections to listing gcc and such in every ebuild go, why
> not? We list all kinds of routine stuff in hundreds of ebuilds so
> that we can install systems without them. Why not just have a
> toolchain virtual or something?
>
> And since ssh was brought up - this is what happens with hacks like
> this. When you combine the "default install" with the "minimum deps
> for everything" list you end up with an ssh you can't get rid of
> without the package.provided hack (which really should be used for
> stuff that is, well, provided).
>
> It would be nice if people who want to build a server with Gentoo but
> then reduce it to only true RDEPENDS could do so. Obviously they'd
> have to use binary packages to continue to maintain it (and even then
> they'd need to keep portage on it), or they'd have to build another
> one. Actually, the trend in general is towards disposable servers
> anyway so generating an entire new server every time one thing changes
> is probably a desirable thing, since you probably want to be able to
> do it every time you add a server anyway.


tldr: I like, approve and otherwise +1 the idea of somehow paring down
or eliminating @system, but I think it's going to be fairly challenging,
so more discussion on this topic is warranted in my humble non-developer
opinion.


--

I really like everything you have to say here. Unfortunately,
assumptions of toolchain availability have gotten into the DNA of Gentoo
in ways that make it nontrivial -- although probably not rocket science,
either -- to implement these ideas.


I'd say it's the kind of thing where somebody needs to do the work. I
think there is demand for this, but when it comes down to brass tacks,
people who really need features like this can just write a script to
push some tarballs or files around in a way that's "good enough" for
their purposes. What is the cost/bene for a single sys-admin to do all
the work and politics of making this change?


However, staying with the cost/bene theme, we have here a kind of
externality, as they say in economics, (which is a fancy way, I guess,
of saying a bad decision or a raw deal), because, in the aggregate, I
think it's pretty clear that the cost/bene favors doing that work.


To be clear, I don't have religion about getting rid of @system, per-se,
but I do have religion about the stuff Larry the Cow told me when I
first visited the Gentoo homepage in 2001, or whenever, which was,
basically, that the software I was using had a bunch of frobs that I
couldn't touch, because I was running an rpm- or .deb-based system, and
that Gentoo was going to let me frob them.


It's not a total disaster, even now -- a determined sysadmin can
absolutely do this right now with features like prefix, ROOT, binpkg and
so forth. But /really/ fixing this, so that non-standard/minimal
setups "just work", would allow Gentoo to effectively address a whole
bunch of practical, real-world use-cases. These are use-cases Gentoo is
in many respects uniquely suited to address, thanks to Larry the Cow's
brilliant insights, yet for which it has by-and-large become unsuitable
or impractical, precisely because of this @system thing and the
package-management decisions that have stemmed from it.


Specifically, I'm talking, here, about managed LAMP servers, big-data
clusters, and embedded.


I suppose I'm not doing much to fix it by ranting and raving like this,
however. So see the first paragraph.


-gmt
 
08-16-2012, 08:05 PM
Michael Mol

On Thu, Aug 16, 2012 at 3:51 PM, Gregory M. Turner <gmt@malth.us> wrote:
> On 8/16/2012 4:59 AM, Rich Freeman wrote:
>>
[snip]

>
>
> tldr: I like, approve and otherwise +1 the idea of somehow paring down or
> eliminating @system but I think it's going to be fairly challenging, so more
> discussion on this topic is warranted in my humble non-developer opinion
>
> --
>
> I really like everything you have to say here. Unfortunately, assumptions
> of toolchain availability have gotten into the DNA of Gentoo in ways that
> make it nontrivial -- although probably not rocket science, either -- to
> implement these ideas.

The limited-visibility build feature discussed a week or so ago would
go a long way in detecting unexpressed build dependencies.

--
:wq
 
08-17-2012, 01:26 AM
Rich Freeman

On Thu, Aug 16, 2012 at 4:05 PM, Michael Mol <mikemol@gmail.com> wrote:
> The limited-visibility build feature discussed a week or so ago would
> go a long way in detecting unexpressed build dependencies.

I can't say that is a coincidence, but my intent would be to include
@system as implicit dependencies, at least until we change that policy
(though the morbidly curious could use that as a test in a tinderbox
to find packages in @system that are good candidates for removal).

I haven't gotten to test it, but after studying sandbox it shouldn't
be hard to just hack together a manual test by removing read access to
root from the config files and adding in a bazillion files. That
should at least let me profile performance/etc. I'm not convinced
that there isn't room for improvement, but if it works well as-is then
automating this shouldn't be hard at all. If portage has the
dependency tree in RAM then you just need to dump all the edb listings
for those packages plus @system and feed those into sandbox. That
just requires reading a bunch of text files and no searching, so it
should be pretty quick. As far as I can tell the relevant calls to
check for read access are already being made in sandbox already, and
obviously they aren't taking forever. We just have to see if the
search gets slow if the access list has tens of thousands of entries
(if it does, that is just a simple matter of optimization, but being
in-RAM I can't see how tens of thousands of entries is going to slow
down a modern CPU even if it is just an unsorted list).
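
A rough way to get a feel for the lookup question above is a quick bash sketch of exact-match lookups against an in-memory list of allowed paths. This is only an illustration: the paths are synthetic, build_access_list and is_readable are made-up names, and it says nothing about how sandbox itself stores or matches its access list.

```shell
#!/bin/bash
# Sketch: exact-match lookup in an in-memory list of allowed paths.
# All names and paths here are illustrative assumptions.

declare -A ALLOWED=()

# Read one path per line from stdin into the ALLOWED hash.
build_access_list() {
    local path
    while IFS= read -r path ; do
        [[ -n ${path} ]] && ALLOWED["${path}"]=1
    done
}

# Succeeds if the given (non-empty) path is in the list.
is_readable() {
    [[ -n ${ALLOWED["$1"]+set} ]]
}

# Populate with 20000 synthetic entries, i.e. "tens of thousands".
build_access_list < <( for (( i = 0; i < 20000; i++ )) ; do
    echo "/usr/lib/fake/file-${i}"
done )
```

Wrapping a loop of is_readable calls in "time" gives a first-order number; swapping the hash for a linear scan over a plain array would give the pessimistic "unsorted list" case.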

Rich
 
08-17-2012, 02:02 AM
Michael Mol

On Thu, Aug 16, 2012 at 9:26 PM, Rich Freeman <rich0@gentoo.org> wrote:
> On Thu, Aug 16, 2012 at 4:05 PM, Michael Mol <mikemol@gmail.com> wrote:
>> The limited-visibility build feature discussed a week or so ago would
>> go a long way in detecting unexpressed build dependencies.
>
> I can't say that is a coincidence, but my intent would be to include
> @system as implicit dependencies, at least until we change that policy
> (though the morbidly curious could use that as a test in a tinderbox
> to find packages in @system that are good candidates for removal).
>
> I haven't gotten to test it, but after studying sandbox it shouldn't
> be hard to just hack together a manual test by removing read access to
> root from the config files and adding in a bazillion files. That
> should at least let me profile performance/etc. I'm not convinced
> that there isn't room for improvement, but if it works well as-is then
> automating this shouldn't be hard at all. If portage has the
> dependency tree in RAM then you just need to dump all the edb listings
> for those packages plus @system and feed those into sandbox. That
> just requires reading a bunch of text files and no searching, so it
> should be pretty quick. As far as I can tell the relevant calls to
> check for read access are already being made in sandbox already, and
> obviously they aren't taking forever. We just have to see if the
> search gets slow if the access list has tens of thousands of entries
> (if it does, that is just a simple matter of optimization, but being
> in-RAM I can't see how tens of thousands of entries is going to slow
> down a modern CPU even if it is just an unsorted list).

Yeah, I presumed you'd have @system as a set of implicit dependencies.
The obvious approaches would be to either temporarily remove a package
from @system, tell portage to ignore a package while doing limited
visibility, or copy @system to a different, temporary set and remove
things piecemeal from there.

That last might make the most sense. "--implicit-dependencies ---
defaults to @system. Additional instances append to the set of
implicit dependencies. Use, e.g. -${ATOM} or -@system to override
default include."
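
The set semantics described above can be sketched in a few lines of bash. To be clear, "--implicit-dependencies" exists only as a proposal in this thread, and resolve_implicit_deps is a made-up name; the sketch only shows the default, append and "-" override behaviour.

```shell
#!/bin/bash
# Sketch of the proposed set semantics. Each argument stands for one
# value passed to the (hypothetical) --implicit-dependencies option.
# Prints the resulting set members one per line, sorted in the C locale.
resolve_implicit_deps() {
    local -A deps=( ['@system']=1 )      # default include
    local item key
    for item in "$@" ; do
        if [[ ${item} == -* ]] ; then
            deps["${item#-}"]=0          # "-X" overrides/removes X
        else
            deps["${item}"]=1            # anything else is appended
        fi
    done
    for key in "${!deps[@]}" ; do
        [[ ${deps["${key}"]} == 1 ]] && echo "${key}"
    done | LC_ALL=C sort
    return 0
}
```

For example, "resolve_implicit_deps -@system sys-devel/gcc" leaves only sys-devel/gcc in the set, which is the piecemeal-removal case from the previous paragraph.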

--
:wq
 
08-18-2012, 08:50 PM
Gregory M. Turner

On 8/16/2012 6:26 PM, Rich Freeman wrote:
> On Thu, Aug 16, 2012 at 4:05 PM, Michael Mol <mikemol@gmail.com> wrote:
>> The limited-visibility build feature discussed a week or so ago would
>> go a long way in detecting unexpressed build dependencies.

[snip]

> If portage has the
> dependency tree in RAM then you just need to dump all the edb listings
> for those packages plus @system and feed those into sandbox.
>
> That just requires reading a bunch of text files and no searching, so it
> should be pretty quick.


Portage could hypothetically compile such a list while it crawls the
package dependency tree, but I suspect the cost will not be as small as
you predict.



> As far as I can tell the relevant calls to
> check for read access are already being made in sandbox already, and
> obviously they aren't taking forever. We just have to see if the
> search gets slow if the access list has tens of thousands of entries
> (if it does, that is just a simple matter of optimization, but being
> in-RAM I can't see how tens of thousands of entries is going to slow
> down a modern CPU even if it is just an unsorted list).


I appreciate your optimism but I think you're underestimating the cost.
Can't speak for others, but my portage db's churn too much for comfort
as is. Once we start multiplying per-package-dependency iteration by
the files-per-package iteration, that's going to be O(a-shit-load).


Of course, where there's a will there's a way. I'd be surprised if some
kind of delayed-evaluation + caching scheme wouldn't suffice, or,
barring that, perhaps it's time to create an indexed-database-based
drop-in replacement for the current portage db code.
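
As a very rough sketch of the caching half of that idea: memoize each atom's file list on disk so that only the first query pays the full price. Everything here is an assumption for illustration (CACHE_DIR, cached_files, and the placeholder list_files that stands in for the real, slow query), and cache invalidation after a re-merge is deliberately left out.

```shell
#!/bin/bash
# Sketch: memoize per-atom file lists so repeated runs skip the slow query.
# list_files is a stand-in for whatever produces the list (equery, qlist,
# a database query); CACHE_DIR and cached_files are hypothetical names.

CACHE_DIR=${CACHE_DIR:-${TMPDIR:-/tmp}/filelist-cache}

list_files() {
    # Placeholder output so the sketch runs without Portage.
    printf '/usr/share/doc/%s/README\n/usr/bin/%s\n' "${1##*/}" "${1##*/}"
}

cached_files() {
    local atom=$1
    # Flatten "cat/pkg-1.0" into a single file name.
    local entry="${CACHE_DIR}/${atom//\//_}"
    if [[ ! -f ${entry} ]] ; then
        mkdir -p "${CACHE_DIR}" || return 1
        # Write to a temp name first so a failed query leaves no entry.
        list_files "${atom}" > "${entry}.tmp" && mv "${entry}.tmp" "${entry}"
    fi
    cat "${entry}"
}
```

The second call to cached_files for the same atom only reads one small text file, which is the "delayed-evaluation" part: nothing is computed until somebody asks for it.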


I've enclosed some scripts you may find helpful in looking at the
numbers. They are kind-of kludgey (originally intended for
in-house-only use and modified for present purposes) but may help shed
some light, if they aren't too buggy, that is...


"dumpworld" slices and dices "emerge -ep" output to provide a list of
atoms in the complete dependency tree of a given list of atoms (add
'@system' to get the complete tree, dumpworld won't do so).


"dumpfiles" operates only on packages installed in the local system
(non-installed atoms are silently dropped), and requires/assumes that
'emerge -ep world' would not change anything if it is to give accurate
information. It takes a list of atoms, transforms them into the
complete lists of atoms in their dependency tree via dumpworld, merges
the lists together, and finds the number of files associated with each
atom in portage. Any collisions will be counted twice, since it doesn't
keep track. It also doesn't add '@system' unless you do. By default it
emits:


o A list of package atoms and the files owned by each atom (stderr)
o total atoms and files
o average filename length

What is, perhaps, more discouraging than the numbers it reports is how
long it takes to run (note: although I suspect an optimized python
implementation could be made to do this faster by a moderate constant
factor, I'm not sure if the big-oh performance characteristics can be
significantly improved without database structure changes like the ones
mentioned above).
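
For the constant-factor part, one cheap improvement (a sketch, not a measured result) is to do the counting and length averaging in a single awk pass rather than a bash "while read" loop. path_stats is a made-up name; it reads one path per line on stdin and skips empty lines.

```shell
#!/bin/bash
# Sketch: count paths and average their length in one awk pass.
# Reads one path per line on stdin; empty lines are ignored.
path_stats() {
    awk 'length($0) > 0 { n++; chars += length($0) }
         END {
             avg = (n > 0) ? int(chars / n) : 0
             printf "TOTAL: %d files (average path length: %d)\n", n, avg
         }'
}
```

Something like 'equery -Cq files "${atom}" | path_stats' would feed it; this does nothing about the time equery itself takes, which is probably where most of the minutes go.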


My disturbingly bloated and slow workstation gives these answers (note:
here it's even slower because it's running in an emulator):


greg@fedora64vmw ~ $ time bash -c 'dumpfiles @system 2>/dev/null'
TOTAL: 402967 files (in 816 ebuilds, average path length: 66)


real 15m33.719s
user 13m18.909s
sys 2m8.436s
greg@fedora64vmw ~ $ time bash -c 'dumpfiles chromium 2>/dev/null'
TOTAL: 401300 files (in 807 ebuilds, average path length: 66)


real 15m28.900s
user 13m15.126s
sys 2m8.088s

My workstation is surely an "outlier" as I have a lot of dependencies
and files due to multilib, split-debug, and USE+=$( a lot ). It's also
got slow hardware RAID6 and the emulator only gives it 2G of RAM to work
with. But I'm a real portage user; I'm sure there are others out
there, if not many, with similar constraints.


-gmt
#!/bin/bash
# dumpfiles: count the files owned by every installed package in the
# dependency tree of the given atoms (defaults to @world).
# Options: -q/--quiet prints only the total; -r/--redic prints the file
# lists themselves on stdout instead of per-atom counts.

if [[ x$(qlist -IC app-portage/portage-utils)x == xx ||
      x$(qlist -IC app-portage/gentoolkit)x == xx ]] ; then
    echo "This utility requires both app-portage/portage-utils" >&2
    echo "and app-portage/gentoolkit. Emerge them both and try again." >&2
    exit 1
fi

declare -a arguments atoms

arguments=( )
atoms=( )

verbose=yes
redic=no

for arg in "$@" ; do
    case $arg in
        -q|--quiet) verbose=no ;;
        -r|--redic) redic=yes ;;
        *) arguments=( "${arguments[@]}" "$arg" ) ;;
    esac
done

[[ ${#arguments[*]} == 0 ]] && arguments=( '@world' )

for arg in "${arguments[@]}" ; do
    if [[ ${arg} == @* ]] ; then
        newatoms=( "${arg}" )
    else
        # Unquoted on purpose: one array element per installed version.
        newatoms=( $( qlist -eICv "${arg}" | sed 's/^/=/' ) )
    fi
    newatoms=( $( dumpworld "${newatoms[@]}" ) )
    result=$?
    [[ ${result} != 0 ]] && { echo "dumpworld failed, giving up." >&2 ; exit ${result} ; }
    atoms=( "${atoms[@]}" "${newatoms[@]}" )
done

# OK, we have all the packages -- remove dups, there could be a bunch.
atoms=( $( for atom in "${atoms[@]}" ; do echo "${atom}" ; done | sort -u ) )

[[ ${verbose} == yes ]] &&
    echo "Checking for files depended upon by the specified atom(s):" >&2 &&
    echo >&2

total=0
totalfilechars=0
for atom in "${atoms[@]}" ; do
    # turns out equery files includes certain files (/usr/lib/debug... but why?)
    # that qlist excludes so ... we might as well get all the bad news possible
    files=$( equery -Cq files "${atom}" )
    result=$?
    [[ $result == 0 ]] || { echo "equery -Cq files ${atom} failed." >&2 ; exit $result ; }
    # An empty listing is zero files, not one empty line.
    if [[ -n ${files} ]] ; then
        count=$( echo "${files}" | wc -l )
    else
        count=0
    fi
    (( total += count ))
    while IFS= read -r filename ; do
        (( totalfilechars += ${#filename} ))
    done < <( echo "${files}" )
    if [[ ${verbose} == yes ]] ; then
        if [[ ${redic} == yes ]] ; then
            echo "${files}"
        else
            echo "${atom}: ${count}" >&2
        fi
    fi
done
[[ ${verbose} == yes ]] && echo >&2 && echo >&2

[[ ${verbose} == yes ]] && echo -n "TOTAL: "
echo -n "${total}"

# Guard against dividing by zero when nothing was found.
if (( total > 0 )) ; then
    averagepathlen=$(( totalfilechars / total ))
else
    averagepathlen=0
fi
[[ ${verbose} == yes ]] && echo -n " files (in ${#atoms[*]} ebuilds, average path length: ${averagepathlen})"
echo
echo

exit 0
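#!/bin/bash
# dumpworld: print "=cat/pkg-version" atoms, one per line, for the
# complete dependency tree of the given atoms (defaults to @world).

declare -a atoms
if [[ x$1 == x ]] ; then
    atoms=( '@world' )
else
    atoms=( "$@" )
fi

emerge_result=$( emerge --ignore-default-opts -epqD --backtrack=999 --with-bdeps=y "${atoms[@]}" 2>/dev/null )

trouble=$?

# Drop uninstall/blocker lines, turn the leading "[...] " tag into "=",
# and strip any trailing " [...]" annotation. The second sed expression
# originally read 's/ [[^]]*].*$//', which parses as a bracket expression
# matching "[" or "^" and so never stripped a " [...]" suffix.
echo "${emerge_result}" | grep -v '^.uninstall' | grep -v '^.blocks' | sed 's/^.[^]]*] /=/;s/ \[[^]]*\].*$//'

if [[ ${trouble} != 0 ]] ; then
    echo "WARNING: results not reliable due to portage failure." >&2
    echo "since portage stderr is ignored by this script, this" >&2
    echo "could mean anything, perhaps depsolving trouble?" >&2
    exit ${trouble}
fi

exit 0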
 
