
Denis Dupeyron 01-11-2010 09:35 PM

adding a modification timestamp to the installed pkgs database (vdb)
 
Brian,

On Sun, Oct 25, 2009 at 6:50 PM, Brian Harring <ferringb@gmail.com> wrote:
> The proposal is pretty simple; if code modifies the vdb in any
> fashion, it needs to update the mtime on a file named
> '.modification_time' in the root of the vdb.
>
> For example-
>
> 1) ${PACKAGE_MANAGER} fires up, builds a pkg. It's now ready to
> install it.
> 2) this step isn't strictly required, but is a zero cost safety
> measure- prior to modifying the vdb, it updates the timestamp. The
> reason for doing this is to protect against the manager blowing up in
> some fashion and not updating the timestamp- there still is a window
> if the manager breaks down during merging but it's far reduced.
> 3) manager does its thing to the livefs, and to the vdb.
> 4) once finished, again, updates the timestamp.
>
> This isn't an incredibly complex change. What it enables, however, is
> package managers to get serious about optimizing access to the vdb.
> For example for the 3 managers:
>
> paludis:
> installed-cache currently needs to be manually run by the user;
> specifically, the user is responsible for regenerating this cache if
> they use a non paludis manager to modify the VDB. This can be
> automated via checking the vdb timestamp against a stored copy of
> the vdb timestamp at the time of the cache generation.
>
> portage:
> portage maintains a set of denormalized caches of the vdb- it however
> has to do validation of those caches on each access, meaning quite a
> few stats. Same thing, can compare timestamp from current vdb to when
> it was generated to identify if it is no longer authoritative.
>
> pkgcore:
> pkgcore maintains a denormalized old style virtuals cache- same thing
> w/ portage, it has to do validation (stat'ing) whenever it uses that
> cache to ensure the data is accurate. Same thing, can compare
> timestamp from current vdb to when it was generated to identify if it
> is no longer authoritative.
>
> The existing vdb caching could all be modified to use this timestamp.
> One stat in the best (common) case, instead of having to either scan
> the whole vdb each time or doing a subset of stats.
>
> This change enables further caching/denormalization of the vdb data
> while maintaining the old format- basically, it allows the manager to
> build out a helluva lot faster access to the vdb while keeping on
> disk compatibility in /var/db/pkg.
>
>
> Now unfortunately since the vdb is not format versioned in any
> fashion, to get this timestamp we have to do the following-
>
> 1) nudge everyone who has code poking into the vdb to update their
> code to update the timestamp
> 2) sit on our hands for N months until such time we've deemed
> "everyone we care about has upgraded"
> 3) push out a new release, and start pushing out versions of the
> managers/vdb consumers that use this timestamp instead of just
> updating it.
>
> For anyone who has been around gentoo for a couple of years, this is a
> pretty familiar pattern- eapi, profile changes, etc, all go through
> this unfortunately.
>
>
> That's the core of the proposal; there is a ticket open
> ( http://bugs.gentoo.org/290428 ) regarding this although there is
> some debate from ciaran which I'll try to now summarize, along w/ the
> counterarguments.
>
> 1) do a new vdb.
> Counter: this mechanism provides a way to synchronize the new vdb
> while maintaining the old during its transition period, so this is
> needed anyways. Further, pinning all of our optimization hopes on a
> new vdb is daft- it's been discussed for 5+ years now and still
> hasn't materialized (pkgcore has been able to have a new vdb for
> several years, but without a synchronization mechanism it would
> require locking users into the new format and locking out old
> consumers of the vdb- an unfriendly choice to push on users, hence
> never being implemented).
>
> 2) code that hasn't been updated to adjust the timestamp, but is still
> in use after the transition period will break things.
> Counter: nature of any modification of this sort, frankly the gains
> outweigh the costs of users being ridiculously out of date. Not
> saying it's perfect, but until someone comes up with a proposal that
> versions every PMS component (meaning PMS has to start documenting
> the VDB), it's what we have if we wish to move forward in
> refactoring.
>
> 3) the correct approach is to require users to tell each manager that
> changes have occurred outside its purview (run paludis
> --regenerate-installed-cache after every time you invoke pmerge or
> emerge).
> Counter: that's rather unfriendly to users, and isn't what
> pkgcore/portage do. Further, it's historically the opposite of the
> norm- consider the ebuild cache (we do validation as we go there,
> instead of expecting users to do an emerge --regen every time they
> modify an ebuild).
>
>
> That's roughly the three points raised; there is some minor quibbling
> that mtime cannot be trusted, but that's mostly a variation of #2.
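
For illustration only, the quoted mechanism could look roughly like the Python sketch below. It is not taken from portage, pkgcore or paludis. The file name '.modification_time' comes from the proposal; the function names, the use of nanosecond mtimes, and the choice to also touch the file when a merge fails are assumptions made for the example.

```python
import os
from contextlib import contextmanager

# File name from the proposal; it lives in the root of the vdb.
TIMESTAMP_NAME = ".modification_time"


def touch_vdb_timestamp(vdb_root):
    """Create the timestamp file if missing and set its mtime to now."""
    path = os.path.join(vdb_root, TIMESTAMP_NAME)
    with open(path, "a"):
        pass
    os.utime(path, None)
    return os.stat(path).st_mtime_ns


@contextmanager
def vdb_modification(vdb_root):
    """Wrap any code that writes to the vdb (steps 2 to 4 of the example)."""
    touch_vdb_timestamp(vdb_root)      # step 2: safety touch before writing
    try:
        yield                          # step 3: caller modifies livefs and vdb
    finally:
        # step 4: touch again when done. Assumption: also touch on failure,
        # so that readers treat a half-finished merge as a modification.
        touch_vdb_timestamp(vdb_root)


def current_vdb_stamp(vdb_root):
    """Return the timestamp file's mtime (one stat), or None if absent."""
    try:
        return os.stat(os.path.join(vdb_root, TIMESTAMP_NAME)).st_mtime_ns
    except FileNotFoundError:
        return None


def cache_is_valid(vdb_root, stamp_at_cache_generation):
    """A cache is trusted only if the stamp matches the one stored with it."""
    current = current_vdb_stamp(vdb_root)
    return current is not None and current == stamp_at_cache_generation
```

A consumer would store the value of current_vdb_stamp() next to its cache when generating it, and call cache_is_valid() with that stored value before using the cache. Two modifications within the filesystem's timestamp resolution can produce the same mtime, which is one concrete form of the "mtime cannot be trusted" objection.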

This looks to me like a good idea. I see some of it at least has been
implemented in portage and I would suspect in pkgcore too. However
it's not obvious to me that all the code is ready, and I don't see any
real specs, docs, etc... You're a seasoned slacker^Wdeveloper so you
know the drill. I will add this as a topic for the open floor
discussion for January but don't expect us to vote on it before we
have all of the above. Now, it might be that this whole thing is held
back by a more philosophical question in which case feel free to
propose it for addition to the (preferably February) agenda.

I'm a bit surprised by how little discussion this topic has
generated. I know there is a bug about this and that there was some
action there, but still. I think that getting the above material ready
(specs, doc, PMS?, whatever) has a good chance of triggering
additional discussions.

Feel free to contact me in case you need help.
Denis.

Ciaran McCreesh 01-17-2010 08:46 AM

2010/1/17 Tobias Klausmann <klausman@gentoo.org>:
>> No, we'd not do it that way. If we're ditching VDB, the only sane way
>> to do it is to ditch it with an rm -fr when creating the new layout.
>> Keeping two sets of data around is going to lead to breakage no matter
>> how well we do things.
>
> Please also provide a downgrade path, i.e. a way to go back from
> the new DB version to the current one should it be necessary (if
> there is no such path, Murphy will see to it that the new format
> breaks in interesting[0] ways).

That probably wouldn't be possible. One of the reasons we want to
ditch VDB is to allow multiple slots of the same cat/pkg-ver to be
installed in parallel (which is in turn necessary to allow some of the
more hideous dynamic slot abuses that people are after). VDB doesn't
support that, so you probably won't be able to go back once you've
started using new features.

*shrug* all of this is years off, anyway. It's at least EAPI 5
territory. We can work all this out later if EAPI 4 ever happens.

--
Ciaran McCreesh

Ciaran McCreesh 01-17-2010 10:09 AM

2010/1/17 Christian Faulhammer <fauli@gentoo.org>:
> Ciaran McCreesh <ciaran.mccreesh@googlemail.com>:
> As much as you love to have the new and shiny VDB2, it is far off.
> Prototyping and drafting implementations would be great to have some
> base on which we can discuss (in a civil manner). So having this
> timestamp would be a good way to prepare a sane migration path.

No, it wouldn't. Brian's proposal a) would be of no use whatsoever for
VDB2 migration, and b) would not be used by VDB2. Having a *decent*
cache validation mechanism is a good idea; having a half-baked one is
a waste of time.

--
Ciaran McCreesh

Ciaran McCreesh 01-18-2010 03:37 PM

2010/1/18 Brian Harring <ferringb@gmail.com>:
> Propose something, or shut up frankly.

I propose we don't do anything until someone comes up with a decent
cache proposal.

> If all you're going to contribute is "it's half baked" claims, you're
> wasting folks' time. You've had a couple of months of time to
> counterpropose something- back it up with a proposal or be silent
> please.

Doing nothing is better than doing something useless.

> As is, quite a few folk see how experimental vdb2/vdb1 synchronization
> can be done w/ this timestamp- your claims thus far that it won't work
> seem to boil down to "but not everyone will update the timestamp".

Er, no. It comes down to VDB2 implementing things that VDB1 doesn't
support, such as having multiple installed slots of the same
cat/pkg-ver, thus making it impossible to have both VDB1 and VDB2 at
the same time.

I have never argued against this proposal because "not everyone will
update the timestamp". That's an argument you've made up and
attributed to me.

> Which gets right back to why I'm elevating this to the council to
> *force* PMS to include this, thus force the holdout (paludis) to
> update the timestamp thus invalidating your cyclical claim.

PMS doesn't mention VDB at all. You're barking up the wrong tree. If
you want me to include it in Paludis, all you have to do is come up
with a proposal that does everything we need, rather than a proposal
that can't legally be used for anything at all.

> What I won't do is sit around and listen to you whinge about the sky
> falling or that I/others are being idiots via not going
> the route *you* want and standardizing caches across all the managers-
> as I said, you want that functionality *you* propose it.

I propose that rather than implementing a half-baked cache that isn't
usable for anything, we do nothing until someone does come up with a
full, unified cache proposal, where the validity of caches after
operations is well defined.

> It's not how things should be done, but it's about the only way to get
> something done when you dig in and go cyclical.

Cyclical on what? Explain where there is a cycle anywhere. You keep
claiming I "go cyclical", but never point out any actual cycles. It's
what you fall back on when you don't have an argument.

> Wish it weren't that way, but I've more interest in progress than playing games w/
> folk looking to be poisonous.

And again, the whole "poisonous" thing. It's the last resort of those
who are themselves the poison. How is wanting to do nothing until
something can be done properly, rather than doing something that
doesn't solve anything, poisonous?

> Seriously, if you can't even be bothered to spell out your claims in
> full or lay out a counter proposal, instead spending your time
> screaming "nyah nyah it won't work!" as you did for prefix, I'm not
> having it.

Uh, I already did, several times, and you ignored me, snipped them out
and said I was "going cyclical".

I'll also point out that I raised a long list of things that were
wrong with Prefix way back when it all started, and over the past few
months everyone has finally realised that that list was full of
legitimate concerns that are just now being addressed. Is it going to
take you five years to see how I'm right here too? And how much more
damage are you going to do to Gentoo before you admit that, as with
Prefix, I've thought this through properly and you're just rushing
along with the first thing that popped into your head?

So, for you to ignore yet again:

* The proposal does not define exactly what the validity of a cache
is. You are sort of implicitly assuming that the validity of a cache
is a function exclusively of "the VDB not being modified", for some
undefined value of "not being modified", but nowhere are you stating
concretely what the rules are.

* You are addressing *only* VDB validity, rather than doing validity
of all repositories at the same time.

* There is no granularity to the proposal. There is simply an
ill-defined "modified" rule, with no way for a package manager to know
what was modified or by whom.

* You aren't doing anything to fix the zillions of different caches
that package managers have to use.

> There are better uses of folks' time frankly, and users deserve
> functionality over daft pissing matches.

Then give them a functional, shared cache, not a cache that can't
legally be used for anything.

--
Ciaran McCreesh

