07-10-2008, 04:02 PM
Mike Bonnet

Supporting EPEL Builds in Koji

Hi. I've written up a proposal for a way to support EPEL builds in
Koji. It's not the only way we could do this, but I think it's doable
with a reasonable amount of effort, and has the side-effect of greatly
simplifying the Koji setup process for a lot of people (by removing the
need to bootstrap/import an entire distro of packages into your private
Koji instance). You can view the proposal here:

http://fedoraproject.org/wiki/Koji/EPELSupport

It's fairly detailed regarding the data model changes necessary, so if
you're not familiar with the Koji codebase you can skip those parts.
Questions and comments welcome.

Thanks,
Mike


--
Fedora-buildsys-list mailing list
Fedora-buildsys-list@redhat.com
https://www.redhat.com/mailman/listinfo/fedora-buildsys-list
 
07-10-2008, 05:12 PM
Jeroen van Meeuwen

Supporting EPEL Builds in Koji

Mike Bonnet wrote:
> Hi. I've written up a proposal for a way to support EPEL builds in
> Koji. It's not the only way we could do this, but I think it's doable
> with a reasonable amount of effort, and has the side-effect of greatly
> simplifying the Koji setup process for a lot of people (by removing the
> need to bootstrap/import an entire distro of packages into your private
> Koji instance). You can view the proposal here:
>
> http://fedoraproject.org/wiki/Koji/EPELSupport
>
> It's fairly detailed regarding the data model changes necessary, so if
> you're not familiar with the Koji codebase you can skip those parts.
> Questions and comments welcome.



Hi Mike,

Good to see you've spent some time on this whereas I have been lazy in
Littleton (holiday).


I'd like to share a few thoughts on the Wiki page -which is a great start;

From the Wiki page: "There is a strong feeling that if a package exists
in the Koji-managed local repo (whose contents the Koji admin has full
control over) it should always be preferred over the external repo
(whose contents the Koji admin may have little or no control over)."


The preference Koji will have (in choosing which package to use in the
buildroot) might introduce a problem: a custom-built package foo-1.0 is
used in the buildroot, upstream updates to foo-1.1, and the running nodes
update to foo-1.1 while the buildroot still uses the custom
foo-1.0...


The point is that these updates have to be managed as they are
released. The updates need to be managed on the side where said packages
are being mashed into a repository (infra side) or applied (client side).


You can see the duplicate effort when the updates are managed on either
side (infra or client), _and_ in koji, separately.


I would like to suggest that the Koji development team make the priority
setting Koji is going to use a configurable item. Compared to the bigger
picture this isn't all that much of a priority, just something to
think about.


Additionally, I'd like to comment on / ask about the proposed database
changes for the tag_config table; In an attempt to show you what I was
thinking, here's a number of questions;


From the Wiki page: "At repo creation time, the repodata will be
retrieved from the processed url and merged with the local repodata as
described above. This single repo will then be used for subsequent
builds against the tag"


Do I understand correctly one can only give one single repository URL to
a certain tag? Does this mean that a tag is created for (example)
"dist-el5" with a remote repository URL, and then "dist-el5-updates"
with another remote repository URL? This means for the build target used
to have dist-el5-updates inherit dist-el5, right? Which then implies
either metadata needs to be imported for dist-el5-updates or inheritance
can only be applied during build-time... right?


The question I guess is basically; how does koji handle tags with a
combination of remote urls & inheritance?


From the Wiki page: "Right now that (rpminfo) table enforces uniqueness
of (name, version, release, arch)."


I see that koji does not store the complete package NEVRA, which may
become a problem if duplicate NVRAs occur. That is quite likely:
rebuilding a package with the release number bumped might collide with
upstream doing a release bump. This is where the epoch is often used,
since upstream has clear guidelines for epoch bumps which (hopefully)
make them occur in special circumstances only, and thus greatly reduce
the chance of a colliding NEVRA. I like the proposed uniqueness of
NVRA namespaces as well, don't get me wrong ;-)


The other thing (and probably the last thing for now) I'd like to ask
is: for reproducibility purposes, how viable would it be to have koji
automatically import the remote RPM (the file and all the data) as it is
used from the remote repository? This may or may not be a configurable
option. It saves work for admins compared to the situation now, and
preserves reproducibility under all circumstances: the automatically
imported RPM is added to the appropriate tags and stored, whereas
upstream only keeps two versions in the repository... Though I
understand it 1) consumes space and 2) isn't helpful for the EPEL case,
I think this is particularly useful for long-term supported appliance
software. Just wondering here ;-)


Let me know what you think,

Kind regards,

Jeroen van Meeuwen
-kanarip

 
07-10-2008, 05:18 PM
Jeroen van Meeuwen

Supporting EPEL Builds in Koji

Jeroen van Meeuwen wrote:
> I'd like to share a few thoughts on the Wiki page -which is a great start;



(...)

Did I mention that my primary concerns behind the aforementioned questions
relate more to "make-your-own" private koji instances than to the one
that is going to build EPEL?


Sorry for any confusion.

Kind regards,

Jeroen van Meeuwen
-kanarip

 
07-10-2008, 07:49 PM
Mike Bonnet

Supporting EPEL Builds in Koji

On Thu, 2008-07-10 at 19:12 +0200, Jeroen van Meeuwen wrote:
> Mike Bonnet wrote:
> > Hi. I've written up a proposal for a way to support EPEL builds in
> > Koji. It's not the only way we could do this, but I think it's doable
> > with a reasonable amount of effort, and has the side-effect of greatly
> > simplifying the Koji setup process for a lot of people (by removing the
> > need to bootstrap/import an entire distro of packages into your private
> > Koji instance). You can view the proposal here:
> >
> > http://fedoraproject.org/wiki/Koji/EPELSupport
> >
> > It's fairly detailed regarding the data model changes necessary, so if
> > you're not familiar with the Koji codebase you can skip those parts.
> > Questions and comments welcome.
> >
>
> Hi Mike,
>
> good to see you've spent some time on this whereas I have been lazy in
> Littleton (holiday).
>
> I'd like to share a few thoughts on the Wiki page -which is a great start;
>
> From the Wiki page: "There is a strong feeling that if a package exists
> in the Koji-managed local repo (whose contents the Koji admin has full
> control over) it should always be preferred over the external repo
> (whose contents the Koji admin may have little or no control over)."
>
> The preference koji will have (in using which package in the buildroot),
> might introduce the problem where custom-built package foo-1.0 is used
> in the buildroot, and upstream updates to foo-1.1 - the running nodes
> would update to foo-1.1 whereas the buildroot still uses the custom
> foo-1.0...

Yes, it's up to the Koji admin to monitor the remote repo, and take
appropriate action when their custom local packages are superseded by
packages in the remote repo. That may be untagging or blocking the
package locally so the newer version can be pulled down from the remote
repo. Or it may be rebuilding the custom package based on the updated
sources. The point is that the build environment doesn't change unless
the Koji admin takes some action to change it.

> The point being, that these updates have to be managed as they are
> released. The updates need to be managed on the side where said packages
> are being mashed into a repository (infra side) or applied (client side).
>
> You can see the duplicate effort when the updates are managed on either
> side (infra or client), _and_ in koji, separately.

There is duplicate effort either way. The difference is that, if
highest-nvr-wins is used, and a remote repo updates to a later version
of a package that you have a custom build of, there is *no way* for you
to revert your build environment to that lower-nvr version without
bumping your version higher than their version (without actually
changing the source at all) and rebuilding. It encourages this Cold War
arms-race of version numbers between your custom packages and the remote
repo's packages, and results in the admin having to fake higher version
numbers and rebuild constantly *without any source changes* just to keep
their custom packages in their build environment.

Alternately, if first-match-wins is used (where the first repo is the
locally-managed Koji repo), and a remote repo updates to a later version
of a package you have a custom version of, nothing happens to your build
environment. If you decide you want the newer version from the remote
repo, you untag your local package and let it get pulled in from the
remote repo. If that newer version has problems, retag your custom
version and it will then be available in the build environment again.
There is no unnecessary building of packages, no faking version numbers,
and no unexpected changes to your build environment. It's the
"principle of least surprise", which is why I think it's the right
policy to use in a managed build environment like Koji.
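
To make the difference between the two policies concrete, here is a
minimal sketch in Python. It is not Koji or mergerepo code: the function
names and the repo data are made up for illustration, and versions are
compared as simple dotted integers rather than with real RPM version
comparison.

```python
# Illustrative only: not Koji or mergerepo code. Versions are compared as
# simple integer tuples rather than with real RPM version comparison.

def parse_version(version):
    """Turn '1.10' into (1, 10) so that 1.10 sorts after 1.9."""
    return tuple(int(part) for part in version.split("."))


def first_match_wins(repos):
    """Keep each package from the first repo (in order) that provides it."""
    selected = {}
    for repo_name, packages in repos:
        for name, version in packages.items():
            selected.setdefault(name, (version, repo_name))
    return selected


def highest_version_wins(repos):
    """Keep the highest version of each package, whichever repo has it."""
    selected = {}
    for repo_name, packages in repos:
        for name, version in packages.items():
            current = selected.get(name)
            if current is None or parse_version(version) > parse_version(current[0]):
                selected[name] = (version, repo_name)
    return selected


# The local Koji repo is listed first, so it takes precedence under
# first-match-wins.
repos = [
    ("local", {"foo": "1.0"}),
    ("remote", {"foo": "1.1", "bar": "2.0"}),
]

print(first_match_wins(repos))
print(highest_version_wins(repos))
```

With this data, first-match-wins keeps the local foo 1.0 even though the
remote repo carries 1.1, while highest-version-wins switches the build
environment to the remote foo 1.1 without any action by the admin.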

> I would like to suggest the koji development team makes the priority
> setting koji is going to use a configurable item, which compared to
> the bigger picture isn't all that much of a priority, just something to
> think about.

I strongly feel that this isn't something that needs to be configurable,
and that first-match-wins is the correct behavior. But if other people
agree that there is a valid use-case for making it configurable, and
Seth and/or James can make the logic in repomerge configurable, then we
can add a switch for it to Koji.

> Additionally, I'd like to comment on / ask about the proposed database
> changes for the tag_config table; In an attempt to show you what I was
> thinking, here's a number of questions;
>
> From the Wiki page: "At repo creation time, the repodata will be
> retrieved from the processed url and merged with the local repodata as
> described above. This single repo will then be used for subsequent
> builds against the tag"
>
> Do I understand correctly one can only give one single repository URL to
> a certain tag? Does this mean that a tag is created for (example)
> "dist-el5" with a remote repository URL, and then "dist-el5-updates"
> with another remote repository URL? This means for the build target used
> to have dist-el5-updates inherit dist-el5, right? Which then implies
> either metadata needs to be imported for dist-el5-updates or inheritance
> can only be applied during build-time... right?
>
> The question I guess is basically; how does koji handle tags with a
> combination of remote urls & inheritance?

Originally you were correct, the proposal only allowed for a single
remote repo to be configured. This was mandated by the desire to track
packages back to their repository of origin, and the lack of repository
data in the rpmdb. jkeating convinced me that this wasn't a very useful
implementation, and suggested that we could get information about the
origin of a given rpm from the baseurl in the repodata.

I've updated the wiki page with a new implementation proposal that will
allow for multiple remote repos while still tracking package origin, and
specifies how remote repos will interact with the tag inheritance tree.
Please take a look and let me know what you think.

> From the Wiki page: "Right now that (rpminfo) table enforces uniqueness
> of (name, version, release, arch)."
>
> I see that koji does not store complete package nevra which may become a
> problem in case duplicate nvra occur (which is very much likely the case
> where rebuilding packages with the release number bumped might collide
> with upstream doing a release bump -which is where the epoch is often
> used as upstream has clear guidelines for epoch bumps which -hopefully-
> make them occur in special circumstances only and thus very much reduces
> the chance of a colliding nevra). I like the proposed uniqueness of
> NVRA-namespaces as well, don't get me wrong ;-)

Koji intentionally ignores epoch when enforcing uniqueness. For better
or worse, the epoch is mostly hidden from users, and does not show up in
the filename. Having packages with the same NVRA but different epochs
was considered harmful when Koji was being designed, and it will prevent
this from happening. Note that Koji does *store* the epoch, it just
doesn't use it when enforcing uniqueness.

In the proposal, local packages exist in one NVRA namespace, and each
remote repo (differentiated by URL) exists in a different NVRA
namespace. So NVRA must be unique within each repo (local or remote)
but not across repos. So NVRA collisions between your local Koji
instance and a remote repo will not cause problems at the data model
level. Which package gets selected and made available in the buildroots
will be handled by the (possibly configurable) package selection policy
of createrepo/mergerepo.
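
As a rough illustration of the namespace idea, the sketch below uses an
in-memory SQLite table from Python. The table name rpminfo_sketch, the
origin column, and the add_rpm helper are assumptions made up for this
example and are not Koji's actual schema. The epoch is stored but left
out of the uniqueness key, as described above.

```python
import sqlite3

# Illustrative only: a stand-in table, not Koji's real rpminfo schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE rpminfo_sketch (
        origin  TEXT NOT NULL,   -- hypothetical: 'local' or a remote repo URL
        name    TEXT NOT NULL,
        version TEXT NOT NULL,
        release TEXT NOT NULL,
        arch    TEXT NOT NULL,
        epoch   INTEGER,         -- stored, but not part of the uniqueness key
        UNIQUE (origin, name, version, release, arch)
    )
    """
)


def add_rpm(origin, name, version, release, arch, epoch=None):
    """Insert a row; return False if the NVRA already exists in that namespace."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO rpminfo_sketch VALUES (?, ?, ?, ?, ?, ?)",
                (origin, name, version, release, arch, epoch),
            )
        return True
    except sqlite3.IntegrityError:
        return False


remote = "http://example.com/el5/os"  # placeholder URL
results = [
    add_rpm("local", "foo", "1.0", "2", "i386"),
    add_rpm(remote, "foo", "1.0", "2", "i386"),            # same NVRA, other namespace
    add_rpm("local", "foo", "1.0", "2", "i386", epoch=1),  # same NVRA, different epoch
]
print(results)
```

The second insert succeeds because the remote repo is its own namespace;
the third is rejected because a different epoch alone does not make the
NVRA unique within the local namespace.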

> The other thing (and probably the last thing for now) I'd like to share
> is that, for reproducibility purposes, how viable would it be to have
> koji automatically import the remote RPM (the file and all the data) as
> it is used from the remote repository? This may or may not be a
> configurable option, saves work for admins compared to the situation
> now, and preserves reproducibility under all circumstances, adding the
> automatically imported RPM to the appropriate tags, storing them for
> reproducibility whereas upstream only keeps two versions in the
> repository... Though I understand it 1) consumes space and 2) isn't
> helpful for the EPEL case, I think this is particularly useful for
> long-term supported appliance software. Just wondering here ;-)

This sounds much more like the secondary-arch approach, and is separate
from what we're trying to accomplish here. I had requested that the
secondary-arch daemon support a "same-arch-downstream" mode where it
would download and import (rather than rebuild) builds from an upstream
Koji as they were completed. However, this is a lot more complicated
and requires more detailed policy. If this is a requirement for you, I
suggest you take a look at the secondary-arch work.


 
07-17-2008, 05:54 PM
Mike McLean

Supporting EPEL Builds in Koji

Mike Bonnet wrote:
> http://fedoraproject.org/wiki/Koji/EPELSupport


This is mostly in line with what I've been thinking. I do have a few
comments/concerns though...

If the remote_repo_url data is going to be inherited (and I tend to
think it should be), then I think it should be in a separate table. I'd
like to reserve tag_config for data that is local to individual tags.
This will also make it easier to represent multiple remote repos.

I'm a little concerned about using the rpminfo table. Yes, I know it
seems wasteful to introduce another table to track very similar data,
but these remote rpms really are tracked and handled differently from
the local ones.


Also, I'm not sure how I feel about having rpminfo entries with null
build_id. Sure, technically the field lacks the 'not null' constraint,
but that is more of an oversight.


Note, I'm not outright rejecting the idea of using rpminfo this way, but
I am concerned.



As for the origin field: I think we should track where these external
rpms come from, but I'm not sure about including it in the uniqueness
constraint. I'm not sure that the value of that field is sufficiently
well defined (or canonicalizable) for such use. I'd rather see the
sigmd5 value (or some abstracting sighash field) used as a unique index.



Following are additional ideas relating to this feature. They are
perhaps a bit ambitious for the short term, but I'd at least like to
keep them in mind with the initial design so we don't paint ourselves
into a corner.


First, I'd like to be able to support external koji servers (or rather a
target or tag from an external koji server) in addition to external
repos. Some of the ideas are the same, however an external koji server
provides more information and more structure.


Second, I'm fond of having a tag /represent/ some external repo/whatever
and having the normal inheritance mechanism take care of priority. The
trick here is that Koji tag content is by build, but it will be tricky
to correctly determine build structure for external rpms -- indeed,
external repos might include subpackages from different versions of the
same build (though an external koji server would not, at least for its
local content). So this will probably be difficult, but if we could
manage something like this, I'd feel a lot better about using the
rpminfo table.


Doing something like this would most likely require Koji to comprehend
the external repos instead of just passing them off to a repomerge tool.


Third, we may not want to use a repomerge tool. The yum-priorities
plugin might serve just as well, and allow us to specify some different
yum repo options per external repo. This may conflict with idea#2 though.


 
07-17-2008, 09:08 PM
Dennis Gilmore

Supporting EPEL Builds in Koji

On Thursday 17 July 2008, Mike McLean wrote:
> Mike Bonnet wrote:
> > http://fedoraproject.org/wiki/Koji/EPELSupport
>
> This is mostly in line with what I've been thinking. I do have a few
> comments/concerns though...
>
> If the remote_repo_url data is going to be inherited (and I tend to
> think it should be), then I think it should be in a separate table. I'd
> like to reserve tag_config for data that is local to individual tags.
> This will also make it easier to represent multiple remote repos.
>
> I'm a little concerned about using the rpminfo table. Yes, I know it
> seems wasteful to introduce another table to track very similar data,
> but these remote rpms really are differently tracked and handled than
> the local ones.
>
> Also, I'm not sure how I feel about having rpminfo entries with null
> build_id. Sure, technically the field lacks the 'not null' constraint,
> but that is more of an oversight.
>
> Note, I'm not outright rejecting the idea of using rpminfo this way, but
> I am concerned.
>
>
> As for the origin field. I think we should track where these external
> rpms come from, but I'm not sure about including it in the uniqueness
> constraint. I'm not sure that the value of that field is sufficiently
> well defined (or canonicalizable) for such use. I'd rather see the
> sigmd5 value (or some abstracting sighash field) used as a unique index.
>
>
> Following are additional ideas relating to this feature. They are
> perhaps a bit ambitious for the short term, but I'd at least like to
> keep them in mind with the initial design so we don't paint ourselves
> into a corner.
>
> First, I'd like to be able to support external koji servers (or rather a
> target or tag from an external koji server) in addition to external
> repos. Some of the ideas are the same, however an external koji server
> provides more information and more structure.
In addition to external koji servers, I'd like to support Spacewalk servers,
and have the ability to push builds back into channels on Spacewalk servers.
Ideally the Spacewalk server knows how to pull from the koji server rather than
duplicating data by importing directly. This way an organisation could build
upon Fedora/RHEL/CentOS for their own needs, and also have an easier time
doing rel-eng on them.

> Second, I'm fond of having a tag /represent/ some external repo/whatever
> and having the normal inheritance mechanism take care of priority. The
> trick here is that Koji tag content is by build, but it will be tricky
> to correctly determine build structure for external rpms -- indeed,
> external repos might include subpackages from different versions of the
> same build (though an external koji server would not, at least for its
> local content). So this will probably be difficult, but if we could
> manage something like this, I'd feel a lot better about using the
> rpminfo table.
I would think there should be a 1-1 mapping of tag to external repo, using
normal inheritance.
> Doing something like this would most likely require Koji to comprehend
> the external repos instead of just passing them off to a repomerge tool.
>
> Third, we may not want to use a repomerge tool. The yum-priorities
> plugin might serve just as well, and allow us to specify some different
> yum repo options per external repo. This may conflict with idea#2 though.
I can see a case where this won't work. I have a local tag built on top of F-8.
I want it lower than the remote F-9 because some of what I need is now in
Fedora, but I need other bits from my tag to be inherited so that I can
bootstrap things to the F-9 level. Maybe we would produce two local repos and
use yum priorities to fit them together. Maybe this case is rare enough not to
bother with, but it could be an idea to keep in mind.

--
Dennis Gilmore

 
07-17-2008, 10:48 PM
Mike Bonnet

Supporting EPEL Builds in Koji

On Thu, 2008-07-17 at 13:54 -0400, Mike McLean wrote:
> Mike Bonnet wrote:
> > http://fedoraproject.org/wiki/Koji/EPELSupport
>
> This is mostly in line with what I've been thinking. I do have a few
> comments/concerns though...
>
> If the remote_repo_url data is going to be inherited (and I tend to
> think it should be), then I think it should be in a separate table. I'd
> like to reserve tag_config for data that is local to individual tags.
> This will also make it easier to represent multiple remote repos.

I don't have any problem with this, though it does mean we'll need to
duplicate quite a bit of the inheritance-walking code, or make it
configurable as to which inheritance it's walking. This new table would
also have to be versioned, the same way the tag_config table is.

> I'm a little concerned about using the rpminfo table. Yes, I know it
> seems wasteful to introduce another table to track very similar data,
> but these remote rpms really are differently tracked and handled than
> the local ones.

The big win here is that the methods and tools that query rpminfo for
information about what was present in the buildroot at build time
wouldn't have to change, or only change slightly. With minor
modification the web UI can continue to show a list of all packages in a
buildroot, along with a flag indicating if they were local or remote.
The buildroot_listing table would not have to change at all. The
majority of XML-RPC calls that interact with the rpminfo or
buildroot_listing tables would only need minor modifications. Adding a
new table to track remote rpm metadata and which remote rpms end up in
a buildroot would add significant effort to this proposal. Also, I
think it's more semantically correct to have a single place where we
track rpm metadata and buildroot contents, regardless of where they came
from.

> Also, I'm not sure how I feel about having rpminfo entries with null
> build_id. Sure, technically the field lacks the 'not null' constraint,
> but that is more of an oversight.

Yes, I realize that the "not null" constraint should exist now, and in
fact all rpms in the Fedora database do reference builds. However, I
think logically having a remote rpm not reference a local build makes
sense. The alternative is to create the build object from the srpm info
in the repodata (along with some namespacing similar to rpminfo).
However, this would significantly clutter the build table with
information that is pretty non-essential.

> Note, I'm not outright rejecting the idea of using rpminfo this way, but
> I am concerned.
>
>
> As for the origin field. I think we should track where these external
> rpms come from, but I'm not sure about including it in the uniqueness
> constraint. I'm not sure that the value of that field is sufficiently
> well defined (or canonicalizable) for such use. I'd rather see the
> sigmd5 value (or some abstracting sighash field) used as a unique index.

I'm open to suggestions on how to modify the uniqueness constraint to
handle this case. We care about ensuring that a locally-built rpm
doesn't have the same n-v-r as another locally-built rpm. I don't think
we care at all about n-v-r uniqueness amongst remote rpms. However, we
probably want to avoid creating 2 rpminfo entries when the same remote
rpm is used in 2 different buildroots. Using the sigmd5 is a good way
to avoid that. However, what happens if a remote rpm with the same
n-v-r and sigmd5 gets pulled in from 2 different remote repos? Perhaps
the "origin" field should be pushed down to the buildroot_listing table,
so the buildroots can reference the same rpminfo object, but indicate
that it came from a different repo in each buildroot?

Also, what happens when we find 2 remote rpms with the same n-v-r but
different sigmd5s? Should that be an error?
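
One way to picture the idea of keying remote rpms on sigmd5 and moving
"origin" down to the buildroot listing is the sketch below. Plain Python
dictionaries stand in for the rpminfo and buildroot_listing tables; the
helper names, checksums, and URLs are invented for the example and none
of this is Koji's actual schema or API. The sketch only detects the
same-NVR, different-sigmd5 case; whether that should be an error is left
open, as in the message above.

```python
# Illustrative only: plain dictionaries stand in for the rpminfo and
# buildroot_listing tables; none of this is Koji's actual schema or API.

rpminfo = {}            # sigmd5 -> {"nvra": ...}; one entry per distinct rpm
buildroot_listing = []  # (buildroot_id, sigmd5, origin) rows


def record_remote_rpm(buildroot_id, nvra, sigmd5, origin):
    """Record that a remote rpm was used in a buildroot.

    The rpm is stored once per sigmd5; the repo it came from is stored
    on the buildroot listing row instead of on the rpm itself.
    """
    rpminfo.setdefault(sigmd5, {"nvra": nvra})
    buildroot_listing.append((buildroot_id, sigmd5, origin))


def nvra_conflicts():
    """Return NVRAs that appear with more than one sigmd5."""
    seen = {}
    for sigmd5, info in rpminfo.items():
        seen.setdefault(info["nvra"], set()).add(sigmd5)
    return {nvra: sums for nvra, sums in seen.items() if len(sums) > 1}


# The same rpm pulled from two different repos into two buildroots.
record_remote_rpm(1, "foo-1.0-1.i386", "aaaa", "http://example.com/repo-x")
record_remote_rpm(2, "foo-1.0-1.i386", "aaaa", "http://example.com/repo-y")
# Same NVRA with different content, from a third repo.
record_remote_rpm(3, "foo-1.0-1.i386", "bbbb", "http://example.com/repo-z")

print(len(rpminfo))
print(nvra_conflicts())
```

Here the first two buildroots share a single rpm entry while still
recording different origins, and the third record shows up as a conflict
that a policy would have to resolve.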

> Following are additional ideas relating to this feature. They are
> perhaps a bit ambitious for the short term, but I'd at least like to
> keep them in mind with the initial design so we don't paint ourselves
> into a corner.
>
> First, I'd like to be able to support external koji servers (or rather a
> target or tag from an external koji server) in addition to external
> repos. Some of the ideas are the same, however an external koji server
> provides more information and more structure.

I agree that this is a desirable goal. I believe this is more the
domain of the Koji secondary-arch daemon. It would be talking directly
to an "upstream" Koji server, analyzing what it's doing, and applying
some logic to decide what builds to import or replicate, and where/how
to do it. This proposal has the much more modest goal of simply
consuming static external repos, and is more appropriate for the EPEL
and private-standalone-Koji case.

> Second, I'm fond of having a tag /represent/ some external repo/whatever
> and having the normal inheritance mechanism take care of priority. The
> trick here is that Koji tag content is by build, but it will be tricky
> to correctly determine build structure for external rpms -- indeed,
> external repos might include subpackages from different versions of the
> same build (though an external koji server would not, at least for its
> local content). So this will probably be difficult, but if we could
> manage something like this, I'd feel a lot better about using the
> rpminfo table.
>
> Doing something like this would most likely require Koji to comprehend
> the external repos instead of just passing them off to a repomerge tool.

The tag content may be managed by build, but when it's time for it to
actually get used (in the form of a yum repo) it gets unfolded into a
big list of rpms. And what gets associated with a buildroot is simply a
big list of rpms. Conceptually I don't really have a problem with the
idea of a tag as a big list of rpms, that we happen to group by srpm
within Koji because it's more convenient for us. So adding the external
repo information to tag_config is just an extension of the big list of
rpms model.

However, we will already be parsing the remote repodata, which contains
information like the srpm name for each rpm, so we could do something
more sophisticated here.

> Third, we may not want to use a repomerge tool. The yum-priorities
> plugin might serve just as well, and allow us to specify some different
> yum repo options per external repo. This may conflict with idea#2 though.

This was my first thought as well. However, after discussions with
Jesse, Seth, and James I was convinced otherwise. The yum-priorities
plugin seems very unpopular with yum developers (not quite sure why). I
don't think yum-priorities would give us any way to completely block a
package from local and remote repos, and configuring multiple repos in
the mock config would require Koji to retrieve and parse each remote
repodata to determine the origin of a given remote rpm.

The repomerge tool seems like it solves the problem better, and would be
more useful in general.


 
07-18-2008, 02:29 AM
Jesse Keating

Supporting EPEL Builds in Koji

On Thu, 2008-07-17 at 18:48 -0400, Mike Bonnet wrote:
> This was my first thought as well. However, after discussions with
> Jesse, Seth, and James I was convinced otherwise. The yum-priorities
> plugin seems very unpopular with yum developers (not quite sure why). I
> don't think yum-priorities would give us any way to completely block a
> package from local and remote repos, and configuring multiple repos in
> the mock config would require Koji to retrieve and parse each remote
> repodata to determine the origin of a given remote rpm.

Also you wouldn't be able to prioritize at the srpm level which is what
we want (no unwanted subpackages sneaking in).
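
A small sketch of why the granularity matters, in Python. The package
data and function names are invented for illustration, and this is not
how yum-priorities or any repomerge tool is actually implemented; it only
contrasts "first match per binary package" with "first match per source
package".

```python
# Illustrative only: invented data, not yum-priorities or mergerepo code.

def merge_by_package(repos):
    """First match wins per binary package name."""
    selected = {}
    for repo_name, rpms in repos:
        for rpm in rpms:
            selected.setdefault(rpm["name"], repo_name)
    return selected


def merge_by_srpm(repos):
    """First match wins per source package.

    Once a repo provides any rpm built from a given source package,
    lower-priority repos contribute nothing built from that source.
    """
    selected = {}
    claimed_sources = set()
    for repo_name, rpms in repos:
        for rpm in rpms:
            if rpm["source"] not in claimed_sources:
                selected[rpm["name"]] = repo_name
        claimed_sources.update(rpm["source"] for rpm in rpms)
    return selected


repos = [
    ("local", [
        {"name": "foo", "source": "foo"},
        {"name": "foo-devel", "source": "foo"},
    ]),
    ("remote", [
        {"name": "foo", "source": "foo"},
        {"name": "foo-devel", "source": "foo"},
        {"name": "foo-extras", "source": "foo"},  # subpackage the local build lacks
        {"name": "bar", "source": "bar"},
    ]),
]

print(merge_by_package(repos))
print(merge_by_srpm(repos))
```

With per-package matching, the remote foo-extras ends up alongside the
local foo and foo-devel; with per-source matching it is left out, because
the local repo already provides packages built from foo.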

--
Jesse Keating
Fedora -- Freedom² is a feature!
 
07-18-2008, 03:38 PM
Mike McLean

Supporting EPEL Builds in Koji

Mike Bonnet wrote:
> On Thu, 2008-07-17 at 13:54 -0400, Mike McLean wrote:
> > If the remote_repo_url data is going to be inherited (and I tend to
> > think it should be), then I think it should be in a separate table. I'd
> > like to reserve tag_config for data that is local to individual tags.
> > This will also make it easier to represent multiple remote repos.
>
> I don't have any problem with this, though it does mean we'll need to
> duplicate quite a bit of the inheritance-walking code, or make it
> configurable as to which inheritance it's walking. This new table would
> also have to be versioned, the same way the tag_config table is.


Walking inheritance is just a matter of determining the inheritance
order and scanning data on the parent tags in sequence. Currently,
nothing scans tag_config in this way because no data in tag_config is
inherited. (Well, in a sense tag_changed_since_event() does walk
tag_config, but that's a little different.)


We need to figure out how we'll deal with multiplicity for the external
repos. If tag A uses repo X and inherits from tag B which uses repo Y,
then does tag A use both X and Y, or does the X entry override it?

A (+repo X)
+- B (+repo Y)

My inclination is that it should override, because I think we'll want
some way to override, and that mechanism seems easiest.


Also, I think we'll probably want to allow multiple external repos per
tag, something which will be much easier to represent in an external
table. We can include an explicit priority field to make a sane
uniqueness condition (and to provide a clear ordering for the repo merge).
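
The two possible answers to the A/B question above, together with the
per-tag priority field, can be sketched as follows. This is a Python toy
model, not Koji code: the dictionaries, function names, and repo names
are assumptions for illustration, and the inheritance walk is a simple
depth-first traversal rather than Koji's real inheritance logic.

```python
# Illustrative only: a toy model of tags, not Koji's inheritance code.

# tag -> list of parent tags, in inheritance priority order
inheritance = {"A": ["B"], "B": []}

# tag -> list of (priority, repo) pairs; lower priority value sorts first
external_repos = {
    "A": [(10, "repo-X")],
    "B": [(10, "repo-Y")],
}


def inheritance_order(tag, parents):
    """Depth-first order: the tag itself, then each parent in turn."""
    order = []

    def visit(current):
        if current in order:
            return
        order.append(current)
        for parent in parents.get(current, []):
            visit(parent)

    visit(tag)
    return order


def repos_override(tag, parents, repos):
    """Use only the repos of the nearest tag in the order that defines any."""
    for current in inheritance_order(tag, parents):
        if repos.get(current):
            return [repo for _priority, repo in sorted(repos[current])]
    return []


def repos_accumulate(tag, parents, repos):
    """Collect repos from the tag and all of its ancestors, nearest first."""
    merged = []
    for current in inheritance_order(tag, parents):
        for _priority, repo in sorted(repos.get(current, [])):
            if repo not in merged:
                merged.append(repo)
    return merged


print(repos_override("A", inheritance, external_repos))
print(repos_accumulate("A", inheritance, external_repos))
```

Under the override reading, tag A builds only against repo X; under the
accumulate reading it builds against X and then Y.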



The big win here is that the methods and tools that query rpminfo for
information about what was present in the buildroot at build time

-snip-

I see all that, and I'm almost convinced. The flipside is that by
default all the code will treat these external rpms the same as the
local ones, which will not be correct for a number of cases. Obviously,
part of this will involve changing code to behave differently for the
external ones, I'm just worried about how much we might have to change,
or what we might miss.



Yes, I realize that the "not null" constraint should exist now, and in
fact all rpms in the Fedora database do reference builds. However, I
think logically having a remote rpm not reference a local build makes
sense. The alternative is to create the build object from the srpm info
in the repodata (along with some namespacing similar to rpminfo).
However, this would significantly clutter the build table with
information that is pretty non-essential.


The idea of grouping them into builds appeals to me, but I don't think
it's possible in general (though maybe we could fake it well enough
somehow). The only data we're (mostly) guaranteed to have to work with
is the sourcerpm header field. The catch is that in case of an
nvr-collision we can't determine which build it belongs to (or indeed if
we should create a new build of same nvr).



I'm open to suggestions on how to modify the uniqueness constraint to
handle this case. We care about ensuring that a locally-built rpm
doesn't have the same n-v-r as another locally-built rpm. I don't think
we care at all about n-v-r uniqueness amongst remote rpms. However, we
probably want to avoid creating 2 rpminfo entries when the same remote
rpm is used in 2 different buildroots. Using the sigmd5 is a good way
to avoid that.


Agreed. same sigmd5 ==> same rpm.


However, what happens if a remote rpm with the same
n-v-r and sigmd5 gets pulled in from 2 different remote repos?


This gets into part of what bugs me about this and why I'm somewhat
inclined to keep the ext repo data a step removed. It's so potentially
dirty. Koji has all these consistency constraints that an external repo
(much less many of them in aggregate) lacks.


It's quite possible that an external repo might respin a package keeping
the same nvr, so we don't even need 2 external repos to hit this
possibility.



Perhaps
the "origin" field should be pushed down to the buildroot_listing table,
so the buildroots can reference the same rpminfo object, but indicate
that it came from a different repo in each buildroot?


Interesting. Yeah, I think that is probably the right answer.

Also, I'm thinking we need to have some sort of rpm_origin table so that
all these references can be managed cleanly.



Also, what happens when we find 2 remote rpms with the same n-v-r but
different sigmd5s? Should that be an error?


Certainly we have to allow the possibility that two origins might have
overlapping nvras. Within a single origin, I'm not so sure. I suppose we
can get away with some small consistency demands. As long as we're only
enforcing unique nvra for local builds and indexing by sigmd5/similar, I
don't think we /have/ to make this an error condition.


In the same vein, what happens when an external repo has an nvra+sigmd5
matching a /local/ rpm? Maybe it doesn't matter, though I guess
technically we want to record the origin properly when it gets into a
buildroot via external repo vs internal tag.


First, I'd like to be able to support external koji servers (or rather a

...

I agree that this is a desirable goal. I believe this is more the
domain of the Koji secondary-arch daemon. It would be talking directly


Well, it has some similarities to 2nd arch, but still quite different.

The more I think about it, the more I think that supporting an external
koji server will probably be much different from the ext repo
business. Most of the issues with rpminfo will carry over, but with a
koji server we will be able to determine build data and can probably
actually pull off something like "inherit from tag X on koji server Y."



The tag content may be managed by build, but when it's time for it to
actually get used (in the form of a yum repo) it gets unfolded into a
big list of rpms. And what gets associated with a buildroot is simply a
big list of rpms. Conceptually I don't really have a problem with the
idea of a tag as a big list of rpms, that we happen to group by srpm
within Koji because it's more convenient for us. So adding the external
repo information to tag_config is just an extension of the big list of
rpms model.


Yeah, I almost wish I hadn't made the build structure quite the way I did.


However, we will already be parsing the remote repodata, which contains
information like the srpm name for each rpm, so we could do something
more sophisticated here.

-snipsnip-
...

The repomerge tool seems like it solves the problem better, and would be
more useful in general.


If we're going to have our fingers in the repodata, we'll probably want
to have them in the merge too. Perhaps we can get createrepo and/or this
repomerge tool usefully libified?


 
Old 08-13-2008, 09:35 PM
Mike Bonnet
 
Default Supporting EPEL Builds in Koji

On Fri, 2008-07-18 at 11:38 -0400, Mike McLean wrote:
> Mike Bonnet wrote:
> > On Thu, 2008-07-17 at 13:54 -0400, Mike McLean wrote:
> >> If the remote_repo_url data is going to be inherited (and I tend to
> >> think it should be), then I think it should be in a separate table. I'd
> >> like to reserve tag_config for data that is local to individual tags.
> >> This will also make it easier to represent multiple remote repos.
> >
> > I don't have any problem with this, though it does mean we'll need to
> > duplicate quite a bit of the inheritance-walking code, or make it
> > configurable as to which inheritance it's walking. This new table would
> > also have to be versioned, the same way the tag_config table is.
>
> Walking inheritance is just a matter of determining the inheritance
> order and scanning data on the parent tags in sequence. Currently,
> nothing scans tag_config in this way because no data in tag_config is
> inherited. (Well, in a sense tag_changed_since_event() does walk
> tag_config, but that's a little different.)

Sorry, I was referring to walking tag_inheritance. I'd rather have one
place that walks the inheritance hierarchy and aggregates data from it,
than two places that are doing almost the same thing.

Each tag has a set of builds associated with it. We walk the
inheritance hierarchy, aggregating the builds from each tag in the
hierarchy into a flat list, and then pass that list to createrepo. We
would do essentially the same thing for external repos. When walking
the hierarchy, if a tag has an external repo associated with it, we
would append that repo url to a flat list, and pass that list to
mergerepo. In both cases we're working with collections of packages
that are associated with a tag, just in different formats.
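A minimal sketch of the walk described above, assuming the inheritance data is available as a simple mapping from each tag to its ordered parents. The function name, the dict layout, and the example URLs are all hypothetical; this is not Koji's actual inheritance code.

```python
def external_repos_for(tag, parents, external_repo):
    """Collect external repo URLs for `tag` in depth-first inheritance order.

    parents:       dict mapping a tag name to its ordered list of parent tags
    external_repo: dict mapping a tag name to its external repo URL, if any
    """
    ordered = []
    seen = set()

    def walk(name):
        if name in seen:          # guard against visiting a tag twice
            return
        seen.add(name)
        url = external_repo.get(name)
        if url is not None and url not in ordered:
            ordered.append(url)   # earlier in the list means higher precedence
        for parent in parents.get(name, []):
            walk(parent)

    walk(tag)
    return ordered


# Hypothetical hierarchy: A inherits from B then C, and B inherits from D.
parents = {"A": ["B", "C"], "B": ["D"]}
external_repo = {
    "B": "http://example.invalid/X/",
    "D": "http://example.invalid/Y/",
    "C": "http://example.invalid/Z/",
}
repo_list = external_repos_for("A", parents, external_repo)
```

The resulting flat list is what would be handed to mergerepo, in the same way the flat list of builds is handed to createrepo.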

> We need to figure out how we'll deal with multiplicity for the external
> repos. If tag A uses repo X and inherits from tag B which uses repo Y,
> then does tag A use both X and Y, or does the X entry override it?
> A (+repo X)
> +- B (+repo Y)
>
> My inclination is that it should override, because I think we'll want
> some way to do overrides, and that mechanism seems easiest.

In discussing this with Jesse, I think we want external repos to be
inherited. This is probably the easiest way to deal with having
multiple external repos getting pulled in to a single buildroot, which
is essential for Fedora (think F9 GA and F9 Updates).

The idea was that, by convention, we would have external-repo-only tags,
with only a single external repo associated with it and no
packages/builds associated. These external-repo-only tags could then be
inserted into the build hierarchy where appropriate. An ordered list of
external repos could then be constructed by performing the current
depth-first search of the inheritance hierarchy. The ordered list would
then be passed to mergerepo, which would ensure that packages in repos
earlier in the list supersede packages (by srpm name) in repos later in
the list. This would preserve the "first-match-wins" inheritance policy
that Koji currently implements, and that admins expect. For example:

dist-custom-build
├─dist-custom
└─dist-f9-updates-external
  └─dist-f9-ga-external

would result in mergerepo creating a single repo that would only contain
packages from dist-f9-ga-external if they did not exist in the
Koji-generated repo (dist-custom-build + dist-custom),
dist-f9-updates-external, or the blacklist of blocked packages. This is
consistent with how Koji package inheritance currently works, and I
think is the most intuitive approach.
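The first-match-wins policy by srpm name can be illustrated with a small sketch. It assumes each repo is just a list of (rpm name, srpm name) pairs and that origins have unique labels; the function and the package names are hypothetical and say nothing about how mergerepo itself is implemented.

```python
def merge_repos(local_pkgs, external_repos, blocked=()):
    """Merge package lists so that the first repo providing an srpm wins.

    local_pkgs:     list of (rpm_name, srpm_name) from the Koji-generated repo
    external_repos: ordered list of (origin, [(rpm_name, srpm_name), ...])
    blocked:        srpm names that must never be included
    Returns a list of (rpm_name, origin) pairs.
    """
    blocked = set(blocked)
    claimed = {}   # srpm_name -> origin of the first repo that provided it
    merged = []
    for origin, pkgs in [("local", local_pkgs)] + list(external_repos):
        for rpm_name, srpm_name in pkgs:
            if srpm_name in blocked:
                continue
            # The first repo to offer this srpm owns all of its subpackages.
            owner = claimed.setdefault(srpm_name, origin)
            if owner == origin:
                merged.append((rpm_name, origin))
    return merged


local = [("foo", "foo")]
updates = [("foo-libs", "foo"), ("bar", "bar")]
ga = [("bar", "bar"), ("bar-devel", "bar"), ("baz", "baz"), ("qux", "qux")]
result = merge_repos(
    local,
    [("dist-f9-updates-external", updates), ("dist-f9-ga-external", ga)],
    blocked=["qux"],
)
```

Here foo-libs is dropped because the local repo already provides the foo srpm, and bar-devel from GA is dropped because updates already provides bar, so no subpackage from a superseded srpm sneaks in.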

> Also, I think we'll probably want to allow multiple external repos per
> tag, something which will be much easier to represent in an external
> table. We can include an explicit priority field to make a sane
> uniqueness condition (and to provide a clear ordering for the repo merge).

As outlined above, I'd prefer to keep it to one external repo per tag,
along with repo inheritance. I think this is easier from a management
perspective, and more consistent with the way Koji currently works.
Ordering for mergerepo will be represented by the location of the tag in
the inheritance hierarchy. With a 1-to-1 tag->external repo mapping, it
then makes sense to store the external repo url in the tag_config table.

> > The big win here is that the methods and tools that query rpminfo for
> > information about what was present in the buildroot at build time
> -snip-
>
> I see all that, and I'm almost convinced. The flipside is that by
> default all the code will treat these external rpms the same as the
> local ones, which will not be correct for a number of cases. Obviously,
> part of this will involve changing code to behave differently for the
> external ones, I'm just worried about how much we might have to change,
> or what we might miss.

Personally I'd prefer adding a few special cases to the existing code,
rather than maintain a whole heap of almost-but-not-quite-the-same code
to manage external rpms. I think that conceptually they're alike enough
that the number of special cases will be minimal.

> > Yes, I realize that the "not null" constraint should exist now, and in
> > fact all rpms in the Fedora database do reference builds. However, I
> > think logically having a remote rpm not reference a local build makes
> > sense. The alternative is to create the build object from the srpm info
> > in the repodata (along with some namespacing similar to rpminfo).
> > However, this would significantly clutter the build table with
> > information that is pretty non-essential.
>
> The idea of grouping them into builds appeals to me, but I don't think
> it's possible in general (though maybe we could fake it well enough
> somehow). The only data we're (mostly) guaranteed to have to work with
> is the sourcerpm header field. The catch is that in case of an
> nvr-collision we can't determine which build it belongs to (or indeed if
> we should create a new build of same nvr).

I think that synthesizing builds for the sake of maintaining the
not-null constraint is more pain than it's worth, and would make
enforcing our nvr-uniqueness constraints (which we definitely want to do
for local builds) more difficult. Having locally-built rpms always
associated with a build, and external rpms not, makes sense to me.

> > I'm open to suggestions on how to modify the uniqueness constraint to
> > handle this case. We care about ensuring that a locally-built rpm
> > doesn't have the same n-v-r as another locally-built rpm. I don't think
> > we care at all about n-v-r uniqueness amongst remote rpms. However, we
> > probably want to avoid creating 2 rpminfo entries when the same remote
> > rpm is used in 2 different buildroots. Using the sigmd5 is a good way
> > to avoid that.
>
> Agreed. same sigmd5 ==> same rpm.
>
> > However, what happens if a remote rpm with the same
> > n-v-r and sigmd5 gets pulled in from 2 different remote repos?
>
> This gets into part of what bugs me about this and why I'm somewhat
> inclined to keep the ext repo data a step removed. It's so potentially
> dirty. Koji has all these consistency constraints that an external repo
> (much less many of them in aggregate) lacks.
>
> It's quite possible that an external repo might respin a package keeping
> the same nvr, so we don't even need 2 external repos to hit this
> possibility.
>
> > Perhaps
> > the "origin" field should be pushed down to the buildroot_listing table,
> > so the buildroots can reference the same rpminfo object, but indicate
> > that it came from a different repo in each buildroot?
>
> Interesting. Yeah, I think that is probably the right answer.
>
> Also, I'm thinking we need to have some sort of rpm_origin table so that
> all these references can be managed cleanly.

That sounds reasonable to me. Note that we may end up with a lot of
rows in this table, since we're allowing variable substitution in the
external_repo_url (tag name and arch). But I don't see that as a
problem.
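To show why substitution multiplies the number of origin rows, here is a small sketch. The "$tag" and "$arch" placeholder syntax is an assumption made for illustration; the thread only says that tag name and arch may be substituted into the external_repo_url.

```python
from string import Template


def expand_repo_url(url_template, tag, arches):
    """Expand one external_repo_url template into one concrete URL per arch.

    Assumption: placeholders are written as $tag and $arch.
    """
    tmpl = Template(url_template)
    return [tmpl.substitute(tag=tag, arch=arch) for arch in arches]


urls = expand_repo_url(
    "http://example.invalid/repos/$tag/$arch/",
    "dist-f9",
    ["i386", "x86_64", "ppc"],
)
```

A single template on a single tag already yields three distinct origins here, one per arch.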

> > Also, what happens when we find 2 remote rpms with the same n-v-r but
> > different sigmd5s? Should that be an error?
>
> Certainly we have to allow the possibility that two origins might have
> overlapping nvras. Within a single origin, I'm not so sure. I suppose we
> can get away with some small consistency demands. As long as we're only
> enforcing unique nvra for local builds and indexing by sigmd5/similar, I
> don't think we /have/ to make this an error condition.

Yeah, it's probably safest to not make this an error condition, since we
have very little control over the remote repos.

> In the same vein, what happens when an external repo has an nvra+sigmd5
> matching a /local/ rpm? Maybe it doesn't matter, though I guess
> technically we want to record the origin properly when it gets into a
> buildroot via external repo vs internal tag.

Right, we would record the origin as the remote repo it came from (by
parsing the merged repodata and looking at the baseurl).
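Putting the last few points together, a rough sketch: external rpms are indexed by sigmd5, the origin is stored per buildroot listing entry rather than on the rpm, and an n-v-r-a collision with a different sigmd5 is not treated as an error. The class and field names are hypothetical stand-ins for the rpminfo and buildroot_listing tables, not the real schema.

```python
class RpmRegistry:
    """Toy model of rpminfo plus buildroot_listing for external rpms."""

    def __init__(self):
        self.rpminfo = {}            # sigmd5 -> rpm id
        self.nvra = {}               # rpm id -> nvra string
        self.buildroot_listing = []  # (buildroot_id, rpm_id, origin)

    def record(self, buildroot_id, nvra, sigmd5, origin):
        """Record that an rpm from `origin` was present in a buildroot."""
        rpm_id = self.rpminfo.get(sigmd5)
        if rpm_id is None:
            # Same sigmd5 means same rpm, so only unseen sigmd5s get a new row.
            rpm_id = len(self.rpminfo) + 1
            self.rpminfo[sigmd5] = rpm_id
            self.nvra[rpm_id] = nvra
        # The origin lives on the listing entry, not on the rpm itself.
        self.buildroot_listing.append((buildroot_id, rpm_id, origin))
        return rpm_id


reg = RpmRegistry()
a = reg.record(1, "foo-1.0-1.i386", "aaaa", "http://example.invalid/X/")
b = reg.record(2, "foo-1.0-1.i386", "aaaa", "http://example.invalid/Y/")
c = reg.record(3, "foo-1.0-1.i386", "bbbb", "http://example.invalid/X/")
```

The first two calls share one rpminfo entry while keeping distinct origins, and the respun package with a different sigmd5 simply gets its own entry.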

> >> First, I'd like to be able to support external koji servers (or rather a
> ...
> > I agree that this is a desirable goal. I believe this is more the
> > domain of the Koji secondary-arch daemon. It would be talking directly
>
> Well, it has some similarities to 2nd arch, but still quite different.
>
> The more I think about it, the more I think that supporting an external
> koji server will probably be much different from the ext repo
> business. Most of the issues with rpminfo will carry over, but with a
> koji server we will be able to determine build data and can probably
> actually pull off something like "inherit from tag X on koji server Y."

And in the external Koji server case, it might actually make sense to
create build objects for the external rpms, since we'll be able to query
the external Koji about which build an rpm came from.

> > The tag content may be managed by build, but when it's time for it to
> > actually get used (in the form of a yum repo) it gets unfolded into a
> > big list of rpms. And what gets associated with a buildroot is simply a
> > big list of rpms. Conceptually I don't really have a problem with the
> > idea of a tag as a big list of rpms, that we happen to group by srpm
> > within Koji because it's more convenient for us. So adding the external
> > repo information to tag_config is just an extension of the big list of
> > rpms model.
>
> Yeah, I almost wish I hadn't made the build structure quite the way I did.
>
> > However, we will already be parsing the remote repodata, which contains
> > information like the srpm name for each rpm, so we could do something
> > more sophisticated here.
> -snipsnip-
> ...
> > The repomerge tool seems like it solves the problem better, and would be
> > more useful in general.
>
> If we're going to have our fingers in the repodata, we'll probably want
> to have them in the merge too. Perhaps we can get createrepo and/or this
> repomerge tool usefully libified?

I was thinking we would probably just call out to the tool the way we do
for createrepo, but I'm certainly not against using an API. I'm a
little concerned about memory usage when doing the create/mergerepo
in-process, since we know python and mod_python have garbage-collection
issues, but that may be a "cross the bridge when we come to it" problem.
Seth, is it feasible to provide an API to mergerepo that we could use
directly?

