08-13-2008, 10:46 PM
Supporting EPEL Builds in Koji
On Wed, 2008-08-13 at 17:35 -0400, Mike Bonnet wrote:
> I was thinking we would probably just call out to the tool the way we do
> for createrepo, but I'm certainly not against using an API. I'm a
> little concerned about memory usage when doing the create/mergerepo
> in-process, since we know python and mod_python have garbage-collection
> issues, but that may be a "cross the bridge when we come to it" problem.
> Seth, is it feasible to provide an API to mergerepo that we could use
> directly?
>
createrepo has an API. repomerge should be relatively easy to use the
same way, since it is really just a script that combines createrepo's
and yum's interfaces.
When I have the script cleaned up more, I'll make sure you can import it
usefully.
-sv
--
Fedora-buildsys-list mailing list
Fedora-buildsys-list@redhat.com
https://www.redhat.com/mailman/listinfo/fedora-buildsys-list

10-06-2008, 08:14 PM
Supporting EPEL Builds in Koji
Mike Bonnet wrote:
> On Fri, 2008-07-18 at 11:38 -0400, Mike McLean wrote:
>> Mike Bonnet wrote:
>>> On Thu, 2008-07-17 at 13:54 -0400, Mike McLean wrote:
>>>> If the remote_repo_url data is going to be inherited (and I tend to
>>>> think it should be), then I think it should be in a separate table.
...
>>> I don't have any problem with this, though it does mean we'll need to
>>> duplicate quite a bit of the inheritance-walking code,
...
>> Walking inheritance is just a matter of determining the inheritance
>> order and scanning data on the parent tags in sequence.
...
> Sorry, I was referring to walking tag_inheritance. I'd rather have one
> place that walks the inheritance hierarchy and aggregates data from it,
> than two places that are doing almost the same thing.

We're talking about inherently different data. External repos to be
merged in are quite different from builds in the system.

> Each tag has a set of builds associated with it. We walk the
> inheritance hierarchy, aggregating the builds from each tag in the
> hierarchy into a flat list, and then pass that list to createrepo. We
> would do essentially the same thing for external repos. When walking
> the hierarchy, if a tag has an external repo associated with it, we
> would append that repo url to a flat list, and pass that list to
> mergerepo. In both cases we're working with collections of packages
> that are associated with a tag, just in different formats.

Sure, we can do this with one call to readFullInheritance, and traverse
both the build table and external repo table from the given order.

> In discussing this with Jesse, I think we want external repos to be
> inherited. This is probably the easiest way to deal with having
> multiple external repos getting pulled in to a single buildroot, which
> is essential for Fedora (think F9 GA and F9 Updates).
>
> The idea was that, by convention, we would have external-repo-only tags,
> with only a single external repo associated with it and no
> packages/builds associated. These external-repo-only tags could then be
> inserted into the build hierarchy where appropriate. An ordered list of
> external repos could then be constructed by performing the current
> depth-first search of the inheritance hierarchy. The ordered list would
> then be passed to mergerepo, which would ensure that packages in repos
> earlier in the list supersede packages (by srpm name) in repos later in
> the list. This would preserve the "first-match-wins" inheritance policy
> that Koji currently implements, and that admins expect. For example:
>
> dist-custom-build
> ├─dist-custom
> └─dist-f9-updates-external
>   └─dist-f9-ga-external
>
> would result in mergerepo creating a single repo that would only contain
> packages from dist-f9-ga-external if they did not exist in the
> Koji-generated repo (dist-custom-build + dist-custom),
> dist-f9-updates-external, or the blacklist of blocked packages. This is
> consistent with how Koji package inheritance currently works, and I
> think is the most intuitive approach.

It is similar, but different in potentially confusing ways. External
repos do not have build structure, so we can't really have the same sort
of inheritance behavior with a combination of external repo tags and
normal tags.

We order the external repos in inheritance order, but ultimately those
repos are merged with the internal one in a way that does not honor
inheritance in the way that the admin might expect.

Using tags to represent external repos fails intuition because external
repos are very much not like tags. When we get to supporting external
koji systems, we can do something like this, but for external repos the
"bolted-on" nature needs to be clear. This is why I'd prefer to have the
data a little more removed.

>> I see all that, and I'm almost convinced. The flipside is that by
>> default all the code will treat these external rpms the same as the
>> local ones, which will not be correct for a number of cases.
>
> Personally I'd prefer adding a few special cases to the existing code,
> rather than maintain a whole heap of almost-but-not-quite-the-same code
> to manage external rpms. I think that conceptually they're alike enough
> that the number of special cases will be minimal.

I think I'm ok with using the rpminfo table.

> I think that synthesizing builds for the sake of maintaining the
> not-null constraint is more pain than it's worth, and would make
> enforcing our nvr-uniqueness constraints (which we definitely want to do
> for local builds) more difficult. Having locally-built rpms always
> associated with a build, and external rpms not, makes sense to me.

Ok, agreed.

>> Also, I'm thinking we need to have some sort of rpm_origin table so that
>> all these references can be managed cleanly.
>
> That sounds reasonable to me. Note that we may end up with a lot of
> rows in this table, since we're allowing variable substitution in the
> external_repo_url (tag name and arch). But I don't see that as a
> problem.

I'm thinking the only substitution we should support is arch. Anything
else sort of constitutes a different repo.

If we use an origin table like this we can abstract out the arch.
Something like:

create table external_repo (
    id SERIAL PRIMARY KEY,
    name TEXT );
create table external_repo_config (
    external_repo_id INTEGER NOT NULL REFERENCES external_repo (id),
    url TEXT NOT NULL,
    -- plus versioning fields
    -- ...
);

This way if upstream repo changes url scheme or moves to a different
host, you can keep some notion of connectedness. External rpms would
simply reference external_repo_id.

>> In the same vein, what happens when an external repo has an nvra+sigmd5
>> matching a /local/ rpm? Maybe it doesn't matter, though I guess
>> technically we want to record the origin properly when it gets into a
>> buildroot via external repo vs internal tag.
>
> Right, we would record the origin as the remote repo it came from (by
> parsing the merged repodata and looking at the baseurl).

So where do we draw the line between code that we add to koji and code
that we add to createrepo (or some external merge-repo tool)?

>>> However, we will already be parsing the remote repodata, which contains
>>> information like the srpm name for each rpm, so we could do something
>>> more sophisticated here.
>> -snipsnip-
>> ...
>>> The repomerge tool seems like it solves the problem better, and would be
>>> more useful in general.
>> If we're going to have our fingers in the repodata, we'll probably want
>> to have them in the merge too. Perhaps we can get createrepo and/or this
>> repomerge tool usefully libified?
>
> I was thinking we would probably just call out to the tool the way we do
> for createrepo, but I'm certainly not against using an API. I'm a
> little concerned about memory usage when doing the create/mergerepo
> in-process, since we know python and mod_python have garbage-collection
> issues, but that may be a "cross the bridge when we come to it" problem.
> Seth, is it feasible to provide an API to mergerepo that we could use
> directly?

I don't think I even saw a reply from Seth on this. Where does the
mergerepo code stand now?

10-17-2008, 09:20 PM
Supporting EPEL Builds in Koji
On Mon, 2008-10-06 at 15:14 -0400, Mike McLean wrote:
> > would result in mergerepo creating a single repo that would only contain
> > packages from dist-f9-ga-external if they did not exist in the
> > Koji-generated repo (dist-custom-build + dist-custom),
> > dist-f9-updates-external, or the blacklist of blocked packages. This is
> > consistent with how Koji package inheritance currently works, and I
> > think is the most intuitive approach.
>
> It is similar, but different in potentially confusing ways. External
> repos do not have build structure, so we can't really have the same sort
> of inheritance behavior with a combination of external repo tags and
> normal tags.
>
> I don't think I even saw a reply from Seth on this. Where does the
> mergerepo code stand now?

mergerepo has been checked into createrepo and should do what you want
now.

It requires HEAD of createrepo and, as soon as I make a new release, yum
3.2.19-6 or yum 3.2.20.
-sv

01-05-2009, 06:20 PM
Supporting EPEL Builds in Koji
Picking up this thread again, sorry about the long delay. I'd like to
come to consensus on the approach here, hammer out any remaining details
at FUDCon this weekend, and hopefully get this implemented by the end of
January. Time to really get rid of plague!
On Mon, 2008-10-06 at 15:14 -0400, Mike McLean wrote:
> Mike Bonnet wrote:
> > On Fri, 2008-07-18 at 11:38 -0400, Mike McLean wrote:
> >> Mike Bonnet wrote:
> >>> On Thu, 2008-07-17 at 13:54 -0400, Mike McLean wrote:
> >>>> If the remote_repo_url data is going to be inherited (and I tend to
> >>>> think it should be), then I think it should be in a separate table.
> ...
> >>> I don't have any problem with this, though it does mean we'll need to
> >>> duplicate quite a bit of the inheritance-walking code,
> ...
> >> Walking inheritance is just a matter of determining the inheritance
> >> order and scanning data on the parent tags in sequence.
> ...
> > Sorry, I was referring to walking tag_inheritance. I'd rather have one
> > place that walks the inheritance hierarchy and aggregates data from it,
> > than two places that are doing almost the same thing.
>
> We're talking about inherently different data. External repos to be
> merged in are quite different from builds in the system.
Yes, I see the issue here. Since remote repos won't have their packages
filtered out (by mergerepo) until after all packages in the local
inheritance hierarchy are placed in the repo, they don't really follow
the existing inheritance rules.

Ok, you've convinced me. A separate table that stores a
priority-ordered list of remote repos associated with each tag will
probably be easier to manage. The lists will be aggregated when walking
the tag hierarchy and passed to mergerepo in (priority, inheritance)
order for proper filtering (based on srpm name, first match wins).
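
As a rough illustration of that aggregation (a sketch only, not Koji
code): assume the inheritance walk yields tag names nearest-first, and
each tag carries a list of (priority, repo name) pairs. The function
name, the data shapes, and the reading of "(priority, inheritance)
order" as "by priority within a tag, tags taken in inheritance order"
are all assumptions made for the example.

```python
def ordered_external_repos(inheritance_order, repos_by_tag):
    """Flatten per-tag external repos into one ordered list.

    inheritance_order: tag names, nearest tag first (hypothetical input
        standing in for the result of an inheritance walk).
    repos_by_tag: dict mapping tag name -> list of (priority, repo_name).
    """
    ordered = []
    seen = set()
    for tag in inheritance_order:
        # A lower priority number sorts first within a single tag.
        for _priority, repo in sorted(repos_by_tag.get(tag, [])):
            if repo not in seen:  # keep only the first occurrence of a repo
                seen.add(repo)
                ordered.append(repo)
    return ordered


# Hypothetical data: only dist-custom has external repos attached.
example = ordered_external_repos(
    ["dist-custom-build", "dist-custom"],
    {"dist-custom": [(10, "f9-ga"), (5, "f9-updates")]},
)
```

The resulting list is what would be handed to mergerepo, earliest repo
first.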
> > Each tag has a set of builds associated with it. We walk the
> > inheritance hierarchy, aggregating the builds from each tag in the
> > hierarchy into a flat list, and then pass that list to createrepo. We
> > would do essentially the same thing for external repos. When walking
> > the hierarchy, if a tag has an external repo associated with it, we
> > would append that repo url to a flat list, and pass that list to
> > mergerepo. In both cases we're working with collections of packages
> > that are associated with a tag, just in different formats.
>
> Sure, we can do this with one call to readFullInheritance, and traverse
> both the build table and external repo table from the given order.
Yes, that makes sense.
> > In discussing this with Jesse, I think we want external repos to be
> > inherited. This is probably the easiest way to deal with having
> > multiple external repos getting pulled in to a single buildroot, which
> > is essential for Fedora (think F9 GA and F9 Updates).
> >
> > The idea was that, by convention, we would have external-repo-only tags,
> > with only a single external repo associated with it and no
> > packages/builds associated. These external-repo-only tags could then be
> > inserted into the build hierarchy where appropriate. An ordered list of
> > external repos could then be constructed by performing the current
> > depth-first search of the inheritance hierarchy. The ordered list would
> > then be passed to mergerepo, which would ensure that packages in repos
> > earlier in the list supersede packages (by srpm name) in repos later in
> > the list. This would preserve the "first-match-wins" inheritance policy
> > that Koji currently implements, and that admins expect. For example:
> >
> > dist-custom-build
> > ├─dist-custom
> > └─dist-f9-updates-external
> >   └─dist-f9-ga-external
> >
> > would result in mergerepo creating a single repo that would only contain
> > packages from dist-f9-ga-external if they did not exist in the
> > Koji-generated repo (dist-custom-build + dist-custom),
> > dist-f9-updates-external, or the blacklist of blocked packages. This is
> > consistent with how Koji package inheritance currently works, and I
> > think is the most intuitive approach.
>
> It is similar, but different in potentially confusing ways. External
> repos do not have build structure, so we can't really have the same sort
> of inheritance behavior with a combination of external repo tags and
> normal tags.
>
> We order the external repos in inheritance order, but ultimately those
> repos are merged with the internal one in a way that does not honor
> inheritance in the way that the admin might expect.
>
> Using tags to represent external repos fails intuition because external
> repos are very much not like tags. When we get to supporting external
> koji systems, we can do something like this, but for external repos the
> "bolted-on" nature needs to be clear. This is why I'd prefer to have the
> data a little more removed.
Ok, we're agreed on this.
> >> I see all that, and I'm almost convinced. The flipside is that by
> >> default all the code will treat these external rpms the same as the
> >> local ones, which will not be correct for a number of cases.
> >
> > Personally I'd prefer adding a few special cases to the existing code,
> > rather than maintain a whole heap of almost-but-not-quite-the-same code
> > to manage external rpms. I think that conceptually they're alike enough
> > that the number of special cases will be minimal.
>
> I think I'm ok with using the rpminfo table.
>
> > I think that synthesizing builds for the sake of maintaining the
> > not-null constraint is more pain than it's worth, and would make
> > enforcing our nvr-uniqueness constraints (which we definitely want to do
> > for local builds) more difficult. Having locally-built rpms always
> > associated with a build, and external rpms not, makes sense to me.
>
> Ok, agreed.
>
> >> Also, I'm thinking we need to have some sort of rpm_origin table so that
> >> all these references can be managed cleanly.
> >
> > That sounds reasonable to me. Note that we may end up with a lot of
> > rows in this table, since we're allowing variable substitution in the
> > external_repo_url (tag name and arch). But I don't see that as a
> > problem.
>
> I'm thinking the only substitution we should support is arch. Anything
> else sort of constitutes a different repo.
>
> If we use an origin table like this we can abstract out the arch.
> Something like:
>
> create table external_repo (
>     id SERIAL PRIMARY KEY,
>     name TEXT );
> create table external_repo_config (
>     external_repo_id INTEGER NOT NULL REFERENCES external_repo (id),
>     url TEXT NOT NULL,
>     -- plus versioning fields
>     -- ...
> );
>
> This way if upstream repo changes url scheme or moves to a different
> host, you can keep some notion of connectedness. External rpms would
> simply reference external_repo_id.
Makes sense. So a tag would simply reference the external_repo_id as
well, and the repo url would be set elsewhere (globally). The table
storing the external repo info for tags would look like:
create table tag_external_repos (
    tag_id INTEGER NOT NULL REFERENCES tag(id),
    external_repo_id INTEGER NOT NULL REFERENCES external_repo(id),
    priority INTEGER NOT NULL,
    -- plus versioning fields
    UNIQUE (tag_id,priority,active)
);
I like this, it keeps everything much more normalized.
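
To see how such a table might behave, here is a small self-contained
sketch using Python's sqlite3 module in place of the real database. The
"active" column stands in for the unspecified versioning fields and is
an assumption (1 while a row is current, NULL once revoked), as are the
helper name and the sample rows. Because NULLs are treated as distinct
in a UNIQUE constraint, revoked rows do not collide with the current
row at the same priority.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE external_repo (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tag_external_repos (
    tag_id INTEGER NOT NULL REFERENCES tag(id),
    external_repo_id INTEGER NOT NULL REFERENCES external_repo(id),
    priority INTEGER NOT NULL,
    active BOOLEAN,  -- assumed versioning field: 1 if current, NULL if revoked
    UNIQUE (tag_id, priority, active)
);
"""


def repos_for_tag(conn, tag_id):
    """Return the names of the active external repos for a tag, by priority."""
    rows = conn.execute(
        "SELECT r.name FROM tag_external_repos t "
        "JOIN external_repo r ON r.id = t.external_repo_id "
        "WHERE t.tag_id = ? AND t.active = 1 ORDER BY t.priority",
        (tag_id,),
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO tag VALUES (1, 'dist-custom-build')")
conn.executemany(
    "INSERT INTO external_repo VALUES (?, ?)",
    [(1, "f9-ga"), (2, "f9-updates")],
)
# Columns: tag_id fixed to 1, then external_repo_id, priority, active.
conn.executemany(
    "INSERT INTO tag_external_repos VALUES (1, ?, ?, ?)",
    [(1, 20, 1), (2, 10, 1), (2, 20, None)],  # last row is a revoked entry
)
active_repos = repos_for_tag(conn, 1)
```

Trying to insert a second active row for the same tag and priority
raises sqlite3.IntegrityError, which is the behaviour the UNIQUE clause
is meant to give.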
> >> In the same vein, what happens when an external repo has an nvra+sigmd5
> >> matching a /local/ rpm? Maybe it doesn't matter, though I guess
> >> technically we want to record the origin properly when it gets into a
> >> buildroot via external repo vs internal tag.
> >
> > Right, we would record the origin as the remote repo it came from (by
> > parsing the merged repodata and looking at the baseurl).
Right, and the origin can just be stored as a reference to the
external_repo(id).
> So where do we draw the line between code that we add to koji and code
> that we add to createrepo (or some external merge-repo tool)?
Koji would only be responsible for parsing the repodata and populating
the database with the correct origin for any given rpm. mergerepo would
be responsible for creating the repo and enforcing the filtering rules.
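
To make the filtering rule concrete, here is a minimal Python sketch of
"first match wins" by srpm name, with the origin of each surviving rpm
recorded alongside it. It is an illustration only, not mergerepo's or
Koji's code: the function names, the input shape (an ordered list of
origin/package-list pairs) and the sample package names are all
assumptions.

```python
def srpm_name(sourcerpm):
    """Return the package name from a source rpm filename such as
    'bash-3.2-22.fc9.src.rpm' (name-version-release.src.rpm)."""
    suffix = ".src.rpm"
    nvr = sourcerpm[: -len(suffix)] if sourcerpm.endswith(suffix) else sourcerpm
    # Version and release never contain '-', so split from the right.
    return nvr.rsplit("-", 2)[0]


def first_match_wins(repos, blocked=()):
    """repos: ordered list of (origin, [(rpm_filename, sourcerpm), ...]).

    Returns {rpm_filename: origin}. For each srpm name, only rpms from
    the earliest repo providing that name are kept; names listed in
    'blocked' are dropped entirely.
    """
    claimed = {}  # srpm name -> origin that supplied it first
    result = {}
    for origin, packages in repos:
        for rpm, sourcerpm in packages:
            name = srpm_name(sourcerpm)
            if name in blocked:
                continue
            winner = claimed.setdefault(name, origin)
            if winner == origin:
                result[rpm] = origin
    return result


# Hypothetical input: the local repo comes first, so its bash wins.
merged = first_match_wins(
    [
        ("local", [("bash-3.2-30.custom.i386.rpm", "bash-3.2-30.custom.src.rpm")]),
        ("f9-updates", [
            ("bash-3.2-25.fc9.i386.rpm", "bash-3.2-25.fc9.src.rpm"),
            ("vim-enhanced-7.1-300.fc9.i386.rpm", "vim-7.1-300.fc9.src.rpm"),
        ]),
    ],
    blocked={"kernel"},
)
```

In the split described above, the dictionary values correspond to the
origin Koji would store for each rpm, while the filtering itself would
be mergerepo's job.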
> >>> However, we will already be parsing the remote repodata, which contains
> >>> information like the srpm name for each rpm, so we could do something
> >>> more sophisticated here.
> >> -snipsnip-
> >> ...
> >>> The repomerge tool seems like it solves the problem better, and would be
> >>> more useful in general.
> >> If we're going to have our fingers in the repodata, we'll probably want
> >> to have them in the merge too. Perhaps we can get createrepo and/or this
> >> repomerge tool usefully libified?
> >
> > I was thinking we would probably just call out to the tool the way we do
> > for createrepo, but I'm certainly not against using an API. I'm a
> > little concerned about memory usage when doing the create/mergerepo
> > in-process, since we know python and mod_python have garbage-collection
> > issues, but that may be a "cross the bridge when we come to it" problem.
> > Seth, is it feasible to provide an API to mergerepo that we could use
> > directly?
>
> I don't think I even saw a reply from Seth on this. Where does the
> mergerepo code stand now?
Seth has written it; I just need to test it. The tool currently has
command-line flags to do everything we need (I believe), but we could
also use it as an example of how to call the API directly.
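
For the "call out to the tool" option, a sketch of what the invocation
might look like from Python follows. Running the merge in a child
process also sidesteps the in-process memory concern raised earlier.
The helper names are hypothetical, and the --repo and --outputdir
options are written from memory of the mergerepo script shipped with
createrepo, so treat them as assumptions and check them against
"mergerepo --help" on the installed version.

```python
import subprocess


def mergerepo_command(repo_urls, outputdir):
    """Build the argument list for one mergerepo run.

    repo_urls is passed in the order given, earliest (highest
    precedence) first, as the thread intends. Whether the tool honours
    that order should be verified.
    """
    cmd = ["mergerepo", "--outputdir", outputdir]
    for url in repo_urls:
        cmd += ["--repo", url]
    return cmd


def run_mergerepo(repo_urls, outputdir):
    """Run mergerepo as a child process; raises CalledProcessError on failure."""
    subprocess.check_call(mergerepo_command(repo_urls, outputdir))


# Hypothetical paths and URLs, for illustration only.
example_cmd = mergerepo_command(
    ["file:///mnt/koji/repos/dist-custom-build/1/i386",
     "http://example.com/f9-updates/i386"],
    "/tmp/merged-repo",
)
```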
All times are GMT.