05-12-2012, 01:51 AM
Matt Domsch
HOTFIX: MM crawler (ticket #3268)

Ticket #3268 notes that a mirror might not be removed from the list
even though it's stale.

In particular, there is a code path called add_parents() whose job is
to mark each parent directory of a target directory as up-to-date or
not, if that parent has not already had its own up-to-date state
determined. This can happen if a directory contains no files, only
child directories. This code path had an incorrect key lookup,
specifically:

-        parent = '/'.join(splitpath[:-1])
-        try:
-            hcd = host_category_dirs[(hc, parent)]

which was looking up the parent directory in the host_category_dirs
cache (which is later operated on). However, the actual key here is
not the string form of the parent directory name; it is a Directory
object. So it's looking up the wrong thing, failing the lookup, and
then proceeding to mark all its parent directories up-to-date
incorrectly. In particular, it is marking all parent directories
up-to-date (e.g. pub/epel/5/i386) when a child subdirectory
(pub/epel/5/i386/repoview/layout) is marked up-to-date, even if the
parent directory is not in fact up-to-date.
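
To make the key mismatch concrete, here is a minimal Python 3 sketch
(the crawler itself is Python 2). The Directory class and the
buggy_mark_parent() helper are hypothetical stand-ins, not the real
MirrorManager code; they only mimic the shape of the old lookup.

```python
class Directory:
    """Hypothetical stand-in for the crawler's SQLObject Directory class."""

    def __init__(self, name):
        self.name = name


def buggy_mark_parent(host_category_dirs, hc, parent_dir):
    """Mimics the old lookup: the key is built from the path string."""
    try:
        host_category_dirs[(hc, parent_dir.name)]  # never matches an object key
    except KeyError:
        host_category_dirs[(hc, parent_dir)] = True  # overwrites the real state


hc = "example-host-category"  # placeholder for a HostCategory object
parent_dir = Directory("pub/epel/5/i386")

# The cache is keyed by (host category, Directory object).
# False here means the crawler already found this directory to be stale.
host_category_dirs = {(hc, parent_dir): False}

string_key_found = (hc, parent_dir.name) in host_category_dirs
object_key_found = (hc, parent_dir) in host_category_dirs

buggy_mark_parent(host_category_dirs, hc, parent_dir)
state_after_buggy_lookup = host_category_dirs[(hc, parent_dir)]

print(string_key_found, object_key_found, state_after_buggy_lookup)
```

The string-keyed lookup misses, the object-keyed lookup hits, and the
stale entry ends up flipped to True, which is the symptom described
above.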

The patch below fixes this by splitting the parent directory lookup
out into its own function for readability, and by fixing the key
lookup.

I've tested this on bapp02 against a stale mirror that was previously
marked up-to-date incorrectly, and it fixes it.

I'd like to hotfix bapp02 to address this.


Matt Domsch
Technology Strategist
Dell | Office of the CTO

--- crawler_perhost 2010-09-06 14:46:21.000000000 +0000
+++ crawler_perhost 2012-05-12 01:20:54.604906708 +0000
@@ -348,21 +348,24 @@
     return pref
 
-def add_parents(host_category_dirs, hc, d):
-    splitpath = d.name.split('/')
+def parent(directory):
+    parentDir = None
+    splitpath = directory.name.split(u'/')
     if len(splitpath[:-1]) > 0:
-        parent = '/'.join(splitpath[:-1])
+        parentPath = u'/'.join(splitpath[:-1])
         try:
-            hcd = host_category_dirs[(hc, parent)]
-        except KeyError:
-            try:
-                parentDir = Directory.byName(parent)
-                host_category_dirs[(hc, parentDir)] = True
-            except SQLObjectNotFound: # recursed out of the directory structure
-                parentDir = None
-        if parentDir and parentDir != hc.category.topdir: # stop at top of the category
+            parentDir = Directory.byName(parentPath)
+        except SQLObjectNotFound:
+            pass
+    return parentDir
+def add_parents(host_category_dirs, hc, d):
+    parentDir = parent(d)
+    if parentDir is not None:
+        if (hc, parentDir) not in host_category_dirs:
+            print "directory %s adding parent %s, unknown up2date state" % (d.name, (hc, parentDir))
+            host_category_dirs[(hc, parentDir)] = None
+        if parentDir != hc.category.topdir: # stop at top of the category
             return add_parents(host_category_dirs, hc, parentDir)
 
     return host_category_dirs
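
For anyone who wants to exercise the patched logic without a
MirrorManager database, the following is a rough Python 3 sketch of the
same idea. It rests on several assumptions: Directory is an in-memory
stand-in, by_name() returns None where the real Directory.byName()
would raise SQLObjectNotFound, and topdir is passed in explicitly
instead of being read from hc.category.topdir.

```python
class Directory:
    """Hypothetical in-memory stand-in for the SQLObject Directory class."""

    _by_name = {}

    def __init__(self, name):
        self.name = name
        Directory._by_name[name] = self

    @classmethod
    def by_name(cls, name):
        # Returns None where the real byName() would raise SQLObjectNotFound.
        return cls._by_name.get(name)


def parent(directory):
    """Return the parent Directory object, or None if there is none."""
    parts = directory.name.split("/")
    if len(parts) > 1:
        return Directory.by_name("/".join(parts[:-1]))
    return None


def add_parents(host_category_dirs, hc, d, topdir):
    """Record unknown parents as None; never overwrite a known state."""
    parent_dir = parent(d)
    if parent_dir is not None:
        if (hc, parent_dir) not in host_category_dirs:
            host_category_dirs[(hc, parent_dir)] = None  # unknown state
        if parent_dir != topdir:  # stop at the top of the category
            return add_parents(host_category_dirs, hc, parent_dir, topdir)
    return host_category_dirs


# Build the example tree from the description, rooted at an assumed topdir.
topdir = Directory("pub/epel")
Directory("pub/epel/5")
i386 = Directory("pub/epel/5/i386")
repoview = Directory("pub/epel/5/i386/repoview")
layout = Directory("pub/epel/5/i386/repoview/layout")

hc = "example-host-category"  # placeholder for a HostCategory object
cache = {(hc, i386): False, (hc, layout): True}  # i386 is known stale
add_parents(cache, hc, layout, topdir)

print(cache[(hc, i386)], cache[(hc, repoview)])
```

In this sketch the stale i386 entry stays False, while repoview, which
had no state of its own, is recorded as None rather than being assumed
up-to-date.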