09-12-2008, 08:23 PM
Dusty Phillips

Problem with web dashboard: massive orphaning of packages

2008/9/12 Eric Belanger <belanger@astro.umontreal.ca>:
> Hi,
>
> I don't know if you remember, but a while ago a huge part of extra i686 (IIRC
> it was all packages from L to Z) was orphaned and erroneously showed up as
> recently updated on the web site. This just happened again with packages in
> extra x86_64. I don't know what could have caused it, but it's very annoying
> as we have to readopt all our packages.

Fuck.

I remember Judd telling me not to swear at users, but it's OK to swear
at scripts, right?

This has to be happening in reporead.py. Fucking reporead.py. To the
best of my knowledge, no other script updates the web database in
any way. Am I wrong?


The actual db_update script splits the packages into those that are in
the database and those that are not, and processes them separately.
Packages that are not currently in the database get added as orphans
because apparently it's hard to extract the maintainer from the
db.tar.gz. At first, I assumed that it was doing an add when it should
be doing an update, which would add new packages with an orphan
maintainer. But this doesn't appear to be the case, because there are
not currently any duplicate x86_64 packages (that aren't in testing).

My second, more likely hypothesis is a race condition. I don't know
exactly how the db scripts update, but I suspect reporead is reading a
db.tar.gz file that is either broken or not yet fully uploaded. It
sees this broken db file and drops all the packages in the web
interface that are not in that file. Then x minutes later (crontab),
it runs again on a proper db and sees the missing packages again. It
adds them to the database and sets the maintainer to orphan.
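
The suspected sequence can be reproduced with a toy model. This is only a sketch of the hypothesis, not reporead's actual code: the function name `sync_web_db`, the dictionary layout, and the maintainer names are all made up for illustration.

```python
def sync_web_db(web_db, sync_packages):
    """Mimic the suspected behaviour: drop what is missing, add what is new as orphans.

    web_db maps package name -> maintainer (None stands for "orphan").
    sync_packages is the list of package names found in the db.tar.gz.
    """
    removed = [name for name in web_db if name not in sync_packages]
    for name in removed:
        del web_db[name]

    added = [name for name in sync_packages if name not in web_db]
    for name in added:
        web_db[name] = None  # the sync db carries no maintainer, so the package is an orphan
    return removed, added


# Hypothetical state of the web database before the bad run.
web_db = {"fcgi": "maintainer_a", "mirage": "maintainer_a", "netcdf": "maintainer_b"}

full_repo = ["fcgi", "mirage", "netcdf"]
partial_repo = ["fcgi"]  # what a truncated or half-read db might yield

sync_web_db(web_db, partial_repo)  # first cron run sees the broken db
sync_web_db(web_db, full_repo)     # next cron run sees the proper db
print(web_db)
```

After the two runs, only the package that was present in the partial db keeps its maintainer; the other two come back as orphans, which matches the symptom reported above.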

Are such broken dbs possible/likely/happening? If it's a race
condition, we need to put a lock on the database (maybe dbtools does
this already) so that reporead isn't accessing it at the same time as
dbtools. If it's just that the database sometimes gets broken when it
is updated, well... that just needs to be fixed.
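
One way such a lock could look is an advisory file lock that both the writer and the reader take before touching the db. This is a minimal sketch under assumptions: the helper name `repo_lock` and the lock file path are hypothetical, nothing here says dbtools or reporead actually does this, and `fcntl.flock` only helps if every program involved cooperates.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def repo_lock(lock_path, exclusive):
    """Hold an advisory flock on lock_path for the duration of the with-block."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Writers take an exclusive lock, readers a shared one.
        # flock() blocks until the lock can be granted.
        fcntl.flock(fd, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock
```

The db-writing side would wrap its update in `with repo_lock(path, exclusive=True):` and the reading side in `with repo_lock(path, exclusive=False):`, so a reader never starts while a write is in progress.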

Dusty
 
09-12-2008, 08:55 PM
Aaron Griffin

On Fri, Sep 12, 2008 at 3:23 PM, Dusty Phillips <buchuki@gmail.com> wrote:
> Are such broken dbs possible/likely/happening? If it's a race
> condition, we need to put a lock on the database (maybe dbtools does
> this already) so that reporead isn't accessing it at the same time as
> dbtools. If it's just that the database sometimes gets broken when it
> is updated, well... that just needs to be fixed.

Hmmm, the DBs are constructed in /tmp and then moved live to
/home/ftp/whatever. It's possible that reporead may be opening it
mid-move, but that doesn't seem right. It's gzipped. Wouldn't that
balk if you took half of a DB file and tried to gunzip it?
 
09-12-2008, 08:57 PM
Dan McGee

On Fri, Sep 12, 2008 at 3:23 PM, Dusty Phillips <buchuki@gmail.com> wrote:
> My second, more likely hypothesis is a race condition. I don't know
> exactly how the db scripts update, but I suspect reporead is reading a
> db.tar.gz file that is either broken or not yet fully uploaded. It
> sees this broken db file and drops all the packages in the web
> interface that are not in that file. Then x minutes later (crontab),
> it runs again on a proper db and sees the missing packages again. It
> adds them to the database and sets the maintainer to orphan.
>
> Are such broken dbs possible/likely/happening? If it's a race
> condition, we need to put a lock on the database (maybe dbtools does
> this already) so that reporead isn't accessing it at the same time as
> dbtools. If it's just that the database sometimes gets broken when it
> is updated, well... that just needs to be fixed.

This would be a hell of a race condition. To make a database, we first
unzip it to a temp location, make our changes and updates, and then
rezip it. Thus reporead.py would have to open the db while it is being
zipped, which is a very short period of time, but I guess it is
theoretically possible.

Without looking at the repo-add code, I don't know if we do this now,
but we probably should:
1. unzip the db to a temp location
2. make changes
3. rezip it to db.tar.gz.new
4. move old db to db.tar.gz.old
5. move new db to db.tar.gz

This would make the "db replacement" portion atomic in the sense that
we would never have a partial DB; we would only have a short period of
time where no db existed in that location. If really necessary we
could avoid even this by copying the old db to one with the old
extension instead of moving it.
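
A variant of the steps above can be sketched in Python. It is not what repo-add does; it is an illustration that stages the new db next to the live one and then renames it over the live name, which skips the "move old db away" step and so leaves no window without a db. The function name `replace_db_atomically` and the 0644 permission are assumptions.

```python
import os
import shutil
import tempfile


def replace_db_atomically(new_db_path, live_db_path):
    """Copy new_db_path next to live_db_path, then rename it into place."""
    live_dir = os.path.dirname(os.path.abspath(live_db_path))
    # Stage in the same directory so the final rename stays on one filesystem.
    fd, staged = tempfile.mkstemp(dir=live_dir, suffix=".new")
    try:
        with os.fdopen(fd, "wb") as out, open(new_db_path, "rb") as src:
            shutil.copyfileobj(src, out)
            out.flush()
            os.fsync(out.fileno())  # make sure the bytes are on disk first
        os.chmod(staged, 0o644)  # mkstemp creates 0600; assumed mode for a served file
        # os.replace is a rename: readers see either the old or the new file.
        os.replace(staged, live_db_path)
    except BaseException:
        if os.path.exists(staged):
            os.unlink(staged)
        raise
```

A reader that already has the old file open keeps reading the old contents, because the rename only changes which file the name points to.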

-Dan
 
09-12-2008, 09:19 PM
Aaron Griffin

On Fri, Sep 12, 2008 at 3:57 PM, Dan McGee <dpmcgee@gmail.com> wrote:
> This would be a hell of a race condition. To make a database, we first
> unzip it to a temp location, make our changes and updates, and then
> rezip it. Thus reporead.py would have to open the db while it is being
> zipped, which is a very short period of time, but I guess it is
> theoretically possible.
>
> Without looking at the repo-add code, I don't know if we do this now,
> but we probably should:
> 1. unzip the db to a temp location
> 2. make changes
> 3. rezip it to db.tar.gz.new
> 4. move old db to db.tar.gz.old
> 5. move new db to db.tar.gz
>
> This would make the "db replacement" portion atomic in the sense that
> we would never have a partial DB; we would only have a short period of
> time where no db existed in that location. If really necessary we
> could avoid even this by copying the old db to one with the old
> extension instead of moving it.

Well, all the repo-add stuff is done in a subdir of /tmp too, then
it's simply 'mv'ed to /home/ftp, so it *should* be fairly atomic...
well, it would be if /tmp were on the same filesystem, where it's just
a matter of moving inodes.
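
Whether two paths share a filesystem can be checked by comparing device numbers. A rename within one filesystem is atomic, while `mv` across filesystems falls back to copy-and-delete, during which a reader can see a partial file. The helper name below is made up for illustration.

```python
import os


def on_same_filesystem(path_a, path_b):
    """Return True if both existing paths live on the same device."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

Calling it on the staging directory and the live directory (for example `/tmp` and the FTP root) would show whether the move can be a plain rename.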
 
09-12-2008, 09:40 PM
Dusty Phillips

2008/9/12 Aaron Griffin <aaronmgriffin@gmail.com>:
> Hmmm, the DBs are constructed in /tmp and then moved live to
> /home/ftp/whatever. It's possible that reporead may be opening it
> mid-move, but that doesn't seem right. It's gzipped. Wouldn't that
> balk if you took half of a DB file and tried to gunzip it?


I think so... not sure if this is a proper test of it but it fails:

dusty:x86_64 $ head -c 10000 extra.db.tar.gz | tar -xz

gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
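
The same check can be done from Python before a file is trusted. This is a sketch, not part of reporead: the function name is hypothetical, and it only verifies that the gzip stream is complete, not that the tar contents are sensible.

```python
import gzip
import zlib


def gzip_stream_is_complete(path, chunk_size=64 * 1024):
    """Return True if the gzip file at path can be read to its end-of-stream marker."""
    try:
        with gzip.open(path, "rb") as stream:
            # Reading to the end forces the trailer (CRC and length) to be checked.
            while stream.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: truncated stream; OSError: not gzip, missing, or unreadable;
        # zlib.error: corrupted compressed data.
        return False
    return True
```

A sync script could call this first and skip the run, leaving the web database untouched, when it returns False.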

reporead does some great stuff with logger (debug and info). Do you
know if any of those logged messages are saved?

I haven't checked this time, but IIRC last time it was all packages
after the letter L that got orphaned, or something like that. This
indicates that for some reason reporead is not processing all the
packages in the file. Either the db does not contain all the files
because a half-full db got uploaded, or reporead is reading part of the
db and then exiting for some reason. Why either of these would occur is
beyond me.

Dusty
 
09-12-2008, 10:13 PM
Aaron Griffin

On Fri, Sep 12, 2008 at 4:40 PM, Dusty Phillips <buchuki@gmail.com> wrote:
> reporead does some great stuff with logger (debug and info). Do you
> know if any of those logged messages are saved?

They're stored in /tmp/archweb_update.log and emailed to me once a
day. This is all done in the cron script located at
/etc/cron.hourly/update_web_db.sh

Looking at it, I noticed lots of this:

2008-09-12 18:02:38 -> INFO: Finished repo parsing
2008-09-12 18:02:38 -> INFO: Starting database updates.
2008-09-12 18:02:38 -> INFO: Updating Arch: x86_64
2008-09-12 18:02:47 -> INFO: Finished updating Arch: x86_64
2008-09-12 18:02:47 -> INFO: Updating Arch: i686
2008-09-12 18:02:47 -> INFO: Removing package kde-l10n-ca from database
2008-09-12 18:02:47 -> INFO: Removing package xalan-java from database
2008-09-12 18:02:47 -> INFO: Removing package fcgi from database
2008-09-12 18:02:47 -> INFO: Removing package enblend-enfuse from database
2008-09-12 18:02:47 -> INFO: Removing package netcdf from database
2008-09-12 18:02:47 -> INFO: Removing package mirage from database
2008-09-12 18:02:47 -> INFO: Removing package glhack from database
..... lots and lots of "Removing package" lines ....

I wonder a) why those were removed and b) if that is related to the
x86_64 orphaning.
 
09-12-2008, 10:52 PM
Aaron Griffin

On Fri, Sep 12, 2008 at 5:11 PM, Thomas Bächler <thomas@archlinux.org> wrote:
> Dusty Phillips wrote:
>>
>> reporead does some great stuff with logger (debug and info). Do you
>> know if any of those logged messages are saved?
>
> debug and info level messages are saved in /var/log/everything.log.
>
> @Aaron, may I suggest that you add many (all?) devs to the "log" group so we
> can read the stuff in /var/log. It is often useful, as it would be now for
> Dusty.

I added a few - either people with sysadmin experience, or people who
check up on things like this regularly (i.e. you, Thomas)
 
09-13-2008, 12:44 AM
Dusty Phillips

2008/9/12 Aaron Griffin <aaronmgriffin@gmail.com>:
> They're stored in /tmp/archweb_update.log and emailed to me once a
> day. This is all done in the cron script located at
> /etc/cron.hourly/update_web_db.sh

What about debug level messages?

> 2008-09-12 18:02:38 -> INFO: Finished repo parsing
> 2008-09-12 18:02:38 -> INFO: Starting database updates.
> 2008-09-12 18:02:38 -> INFO: Updating Arch: x86_64
> 2008-09-12 18:02:47 -> INFO: Finished updating Arch: x86_64
> 2008-09-12 18:02:47 -> INFO: Updating Arch: i686
> 2008-09-12 18:02:47 -> INFO: Removing package kde-l10n-ca from database
> 2008-09-12 18:02:47 -> INFO: Removing package xalan-java from database
> 2008-09-12 18:02:47 -> INFO: Removing package fcgi from database
> 2008-09-12 18:02:47 -> INFO: Removing package enblend-enfuse from database
> 2008-09-12 18:02:47 -> INFO: Removing package netcdf from database
> 2008-09-12 18:02:47 -> INFO: Removing package mirage from database
> 2008-09-12 18:02:47 -> INFO: Removing package glhack from database
> ..... lots and lots of "Removing package" lines ....
>
> I wonder a) why those were removed and b) if that is related to the
> x86_64 orphaning

b) is almost certainly yes. The packages get removed and then
presumably get added again later with orphan status. This must be
thoroughly fucking up the web interface new package notification.

a) is WTF. I just checked the current state of the db.tar.gz files and
they seem to contain the packages that reporead claims were removed. So
it doesn't look like anything is breaking the db.tar.gz. It seems more
like reporead is not reading the whole file. But it's still possible
that the db.tar.gz has been fixed since the error occurred.

I have added some logging to report how many packages are currently
in the web db and how many are in the new sync db. If these numbers
differ widely, the problem is in the code that loads the repo.db.tar.gz.
Otherwise it's in the code that adds/removes packages.

I also implemented a check to warn or raise an exception if the sync db
count is only 75% or 50% of the web db count, as Paul suggested.
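
A check of this kind might look roughly like the following. It is a sketch, not the code that went into reporead: the function and exception names are invented, and reading the two percentages as "warn at 75% or less, abort at 50% or less of the web db count" is an assumption.

```python
import logging

logger = logging.getLogger("reporead_sketch")


class SyncSizeError(Exception):
    """Raised when the sync db looks far too small to be trusted."""


def check_sync_size(web_count, sync_count, warn_ratio=0.75, abort_ratio=0.50):
    """Compare package counts before any add/remove is applied."""
    if web_count == 0:
        return "ok"  # nothing to compare against, e.g. a fresh web database
    ratio = sync_count / web_count
    if ratio <= abort_ratio:
        raise SyncSizeError(
            "sync db has %d packages, web db has %d; refusing to update"
            % (sync_count, web_count)
        )
    if ratio <= warn_ratio:
        logger.warning(
            "sync db has %d packages, web db has %d", sync_count, web_count
        )
        return "warn"
    return "ok"
```

Raising before any packages are removed means a half-read db costs one skipped update instead of a round of readoption.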

I don't have time to look for anything else right now; hopefully it
will keep happening so I can track it down.

Does somebody want to give me a quick rundown, or point me to a wiki
article, on how the database tools move packages from svn to release in
repo.db.tar.gz? I'm thinking if reporead wants to be this anal, maybe
we should add some hooks to whatever script says 'I just released a
package, please update the database' and sync up the web database at
the time things get updated.

Sorry I don't know what's causing this, folks. I'm just praying it's a
long-standing bug that I can blame on cactus instead of having to come
back to y'all and say "well here's the thing, I introduced this really
really stupid bug into reporead.py....." ;-)

Dusty
 
09-13-2008, 04:16 AM
Aaron Griffin

On Fri, Sep 12, 2008 at 7:44 PM, Dusty Phillips <buchuki@gmail.com> wrote:
> 2008/9/12 Aaron Griffin <aaronmgriffin@gmail.com>:
>> They're stored in /tmp/archweb_update.log and emailed to me once a
>> day. This is all done in the cron script located at
>> /etc/cron.hourly/update_web_db.sh
>
> What about debug level messages?

I'm fairly certain those *don't* go to the syslog, and they're all
output to the same script. Maybe the level can be adjusted (at the top
it looks like it sets something to WARNING).
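
For reference, lowering a Python logger to DEBUG and writing to a file looks roughly like this. It is a generic `logging` sketch, not reporead's actual setup: the logger name and function name are made up, and the format string is only chosen to resemble the log lines quoted in this thread.

```python
import logging


def make_logger(log_path, level=logging.DEBUG):
    """Return a logger that writes messages at `level` and above to log_path."""
    logger = logging.getLogger("reporead_sketch_log")
    logger.setLevel(level)  # WARNING here would drop INFO and DEBUG messages
    handler = logging.FileHandler(log_path)
    handler.setFormatter(
        logging.Formatter(
            fmt="%(asctime)s -> %(levelname)s: %(message)s",
            datefmt="%Y-%m-%d %H:%M:%S",
        )
    )
    logger.addHandler(handler)  # calling this twice would attach two handlers
    return logger
```

With the level at WARNING, calls to `logger.debug(...)` and `logger.info(...)` are silently discarded, which would explain debug output never reaching any log.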

> Does somebody want to give me a quick rundown, or point me to a wiki
> article, on how the database tools move packages from svn to release in
> repo.db.tar.gz? I'm thinking if reporead wants to be this anal, maybe
> we should add some hooks to whatever script says 'I just released a
> package, please update the database' and sync up the web database at
> the time things get updated.

That's actually what we tried to get away from by doing this. The old
DB scripts were so tightly coupled to gerolde, it was near impossible
to test them. We actually had binaries that did mysql work. I don't
want to go back to that way of doing things. This should all be as
decoupled as possible....

> Sorry I don't know what's causing this, folks. I'm just praying it's a
> long-standing bug that I can blame on cactus instead of having to come
> back to y'all and say "well here's the thing, I introduced this really
> really stupid bug into reporead.py....." ;-)

I plan on looking into this during my Saturday sprint too. I can do some
testing and maybe some improvements of reporead.py too. Should be
straightforward: set up a DB, grab the django code, wget the extra DB
file, and bam....

Anyone else willing to work with me on testing this one?
 
09-13-2008, 02:24 PM
Dusty Phillips

2008/9/13 Daniel Isenmann <daniel.isenmann@gmx.de>:
> On Sat, 13 Sep 2008 01:12:46 +0200
> Alexander Fehr <pizzapunk@gmail.com> wrote:
>
>> The x86_64 orphaning has happened again some minutes ago. Moreover,
>> the i686 packages from extra are now completely gone from the web
>> interface.
>>
>> Alex
>
> I would suggest that no developer commit anything or update the db
> until this thing is fixed. The complete i686 extra repo is gone from
> the web, as Alex said.

This shouldn't be necessary; because of the decoupling Aaron
mentioned, the web view database will sync itself properly once it's
fixed. People will be confused, but they're going to be confused
anyway.

Dusty
 
