On Wed, May 11, 2011 at 6:12 AM, Jack Morgan <firstname.lastname@example.org> wrote:
> On 05/10/2011 01:13 PM, Jorge Manuel B. S. Vicetto wrote:
>> Another issue that was raised in the discussion with the arch teams,
>> even though it predates the arch teams resources thread as we've talked
>> about it on FOSDEM 2011 and even before, is getting more automatic
>> testing done on Gentoo.
>> I'm bcc'ing a few teams on this thread as it involves them and hopefully
>> might interest them as well.
>> Both Release Engineering and QA teams would like to have more automatic
>> testing to find breakages and to help track "when" things break and more
>> importantly *why* they break.
>> To avoid misunderstandings, we already have testing and even automated
>> testing being done on Gentoo. The "first line" of testing is done by
>> developers using repoman and/or the PM's QA tools. We also have
>> individual developers and the QA team hopefully checking commits and
>> everyone testing their packages.
>> Furthermore, the current weekly automatic stage building has helped
>> identify some issues with the tree. The tinderbox work done by Patrick
>> and Diego, as well as others, has also helped find broken packages
>> and/or identify packages affected by major changes before they hit
>> the tree. The use of repoman, pcheck and/or paludis quality assurance
>> tools in the past and present to generate reports about tree issues,
>> like Michael's (mr_bones) emails, has also helped identify and
>> address issues.
>> Recently, we've got a new site to check the results of some tests
>> http://qa-reports.gentoo.org/ with the possibility to add more scripts
>> to provide / run even more tests.
>> So, why "more testing"? For starters, more *automatic* testing. Then
>> more testing as reports from testing can help greatly in identifying
>> when things break and why they break. As someone that looks over the
>> automatic stage building for amd64 and x86, and that has to talk to
>> teams / developers when things break, having more, more in depth and
>> regular automatic testing would help my (releng) job. The work for the
>> live-dvd would also be easier if the builds were "automated" and the job
>> wasn't "restarted" every N months. Furthermore, creating a framework for
>> developers to be able to schedule testing for proposed changes, in
>> particular for substantial changes, might (should?) help improve the
>> user's experience.
>> I hope you agree with "more testing" by now, but what testing? It's good
>> to test something, but "what" do we want to test and "how" do we want to
>> test it?
>> I think we should try to have at least the following categories of tests:
>> ** Portage (overlays?) QA tests
>> * * * tests with the existing QA tools to check the consistency of
>> dependencies and the quality of ebuilds / eclasses.
These are almost separate. I assume your intent was 'let's automate
pcheck & co. runs of gentoo-x86 and, if we get that working, we can add
overlays from layman', which sounds fine to me.
>> ** (on demand?) package (stable / unstable) revdep rebuild (tinderbox)
>> * * * framework to schedule testing of proposed changes and check their impact
I'd be curious what the load is here. We are adopting an on-demand
testing infrastructure at work. Right now we have a continuous build,
but it is time-delta based rather than event-based, so it groups changes
together, which makes it hard to find what broke things. At work we
only submit a few changes a day, though, so we only need a very small
infrastructure to test each change. Gentoo has far more commits (at
least one every few minutes on average, and then there are huge
commits like KDE stabilization...).
What I'd recommend here is essentially some kind of control field in
the commit itself (commitmsg?) that controls exactly what tests are
done for that commit.
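
As a rough sketch of what I mean by a control field, assuming a
hypothetical "Tests:" trailer line in the commit message (nothing in
the tree defines such a field today, and the test names below are made
up for illustration), the test scheduler could pull the requested tests
out of each commit like this:

```python
import re
from typing import List

# Hypothetical trailer format: "Tests: name1, name2" on its own line.
TRAILER_RE = re.compile(r"^Tests:\s*(.*)$", re.IGNORECASE)


def requested_tests(commit_message: str) -> List[str]:
    """Return the test names listed in 'Tests:' lines, in order, without duplicates."""
    tests: List[str] = []
    for line in commit_message.splitlines():
        match = TRAILER_RE.match(line.strip())
        if not match:
            continue
        # Names may be separated by commas and/or whitespace.
        for name in re.split(r"[,\s]+", match.group(1).strip()):
            if name and name not in tests:
                tests.append(name)
    return tests


if __name__ == "__main__":
    message = (
        "kde-base/kdelibs: stabilize on amd64\n"
        "\n"
        "Tests: revdep-rebuild, stage4-kde\n"
    )
    print(requested_tests(message))
```

A commit without the field returns an empty list, so the scheduler
could fall back to whatever default (cheap) checks it runs for every
commit.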
>> ** Weekly (?) stable / unstable stage / ISO arch builds
>> * * * the automatic stage building, including new specs for the testing tree
>> as we currently only test the stable tree.
I'm curious: if you constantly build unstable, do you plan on fixing
it? My understanding of Gentoo is that in ~arch something is always
slightly broken and that's OK. I worry that ~arch builds may just end
up being noise because they don't build properly due to the high
velocity of changes.
>> ** (schedule?) specific tailored stage4 builds
>> * * * testing of specific tailored "real world" images (web server, intranet
>> server, generic desktop, GNOME desktop, KDE desktop, etc).
Again it would be interesting to have some kind of control field in my
commits so when KDE is stable I could trigger a build of the 'KDE
stage4' or whatnot.
If we ever finish this gentoo-stats project it would be interesting to
see what users are actually using as well. Do users use the defaults?
Are the stage4s we are testing actually relevant?
>> ** Bi-Weekly (?) stable / unstable AMD64/X86 LiveDVD builds
>> * * * automatic creation of the live-DVD to test a very broad set of packages
>> ** automated testing of built stage / CD / LiveDVD (KVM guest?) (CLI /
>> GUI / log parsing ?)
>> * * * framework to test the built stages / install media and ensure it works
>> as expected
I think testing that the liveDVD we just built boots is a decent test
(and probably not too difficult to write). Testing that 'everything on
the DVD works' is likely more of a challenge and I'm not sure it buys
us anything. Do we often find that we release LiveDVDs with broken
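
For the "does it boot" check, here is a minimal sketch under some
assumptions: the image is booted with qemu-system-x86_64, the live
media is configured to put its console on the serial port (e.g.
console=ttyS0 on the kernel command line), and a "login:" prompt is
taken as the sign of a successful boot. The marker strings, function
names and timeout are my own choices, not anything releng uses today.

```python
import subprocess
from typing import List

# Assumed markers; adjust to whatever the live media actually prints.
BOOT_MARKER = "login:"
PANIC_MARKER = "Kernel panic"


def qemu_command(iso_path: str, memory_mb: int = 1024) -> List[str]:
    """Build a QEMU command line that boots the ISO with the serial console on stdio."""
    return [
        "qemu-system-x86_64",
        "-m", str(memory_mb),
        "-cdrom", iso_path,
        "-boot", "d",      # boot from the CD-ROM drive
        "-nographic",      # no display; serial console goes to stdio
    ]


def console_output(iso_path: str, timeout_s: float = 300.0) -> str:
    """Run the guest for up to timeout_s seconds and return what it printed."""
    try:
        done = subprocess.run(
            qemu_command(iso_path),
            stdin=subprocess.DEVNULL,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,
        )
        raw = done.stdout
    except subprocess.TimeoutExpired as exc:
        # A booted live system never exits, so hitting the timeout is normal.
        raw = exc.output or b""
    return raw.decode("utf-8", errors="replace")


def log_shows_boot(console_log: str) -> bool:
    """True if the log reached a login prompt without a kernel panic."""
    return BOOT_MARKER in console_log and PANIC_MARKER not in console_log
```

Usage would be something like log_shows_boot(console_output("livedvd.iso")).
This always waits out the full timeout, which is wasteful but keeps the
sketch short; a real harness would stop as soon as the marker shows up.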
>> I don't have a framework for conducting some of these tests, including
>> the stage/iso validation, but some of them can use the existing tools
>> like the stage building and the tree QA tests.
>> Do you have any suggestions about the automatic testing? Do you know of
>> other tests or tools that we can and should use to improve QA on Gentoo?
> You might take a look at autotest from kernel.org. It's a Python based
> framework for automating testing. It's specific towards kernel testing,
> but could be modified for your needs.
Autotest would likely require a branch and a fair bit of work to be
used for OS qualification. We use it for OS qualification at work.
While I hesitate to say 'roll your own', if you can get something
working in 1-2 months I can certainly see it being easier to maintain
than autotest... there really is not a killer feature that autotest
has. The reporting / graphing is pretty bad, it uses ssh for
everything and basically keeps long-running connections open (might be
fine if you are using kvm... but not over the WAN), the API is terrible
and requires all kinds of horribleness to use... I could go on ;)
> Jack Morgan
> Pub 4096R/761D8E0A 2010-09-13 Jack Morgan <email@example.com>
> Fingerprint = DD42 EA48 D701 D520 C2CD 55BE BF53 C69B 761D 8E0A