FAQ Search Today's Posts Mark Forums Read
» Video Reviews

» Linux Archive

Linux-archive is a website aiming to archive linux email lists and to make them easily accessible for linux users/developers.


» Sponsor

» Partners

» Sponsor

Go Back   Linux Archive > Gentoo > Gentoo Portage Developer

 
 
LinkBack Thread Tools
 
Old 02-10-2009, 09:53 AM
Douglas Anderson
 
Default equery: RFC and code review

Hi all,

It's been a few months since we first discussed refactoring equery on
this list and I think made pretty good time and I'm reasonably happy
with it. So I'm throwing out to you all. I've tried to keep a running
list of visible changes as I went along, but I'm sure I've missed some
things. You can read that here:

http://code.google.com/p/genscripts/source/browse/equery/trunk/equery/CHANGELOG

My primary goals for the refactorization were:
* Modularize
* Clean up the UI
* Rewrite a more solid option handler
* Eliminate as much duplicated code as possible
* Clean up slop and bitrot, follow clean coding standards and style
guidelines wherever possible
* Break up large functions, write docstrings for all functions
* Get code set for Python 2.6 (tested in 2.5 and 2.6) and prepare
smooth upgrade for Py3k.
* Create a more consistent experience for the user.

This last point is surely where the most contention will stem from, so
let me argue my position. All modules except for list, check and size
display the help message when called without input (ex: equery uses).
This makes sense. It's what most other programs do. It's what emerge
does. It's what eselect does. It's the neighborly thing to do.

However these three exceptions go and start a time-consuming process
which is of very limited usefulness. With a cold cache, each process
takes at least 30 seconds. I imagine that 99 percent of people who
type 'equery list' do it by accident. And who needs to see the size of
every installed package? And checksum every installed package? Almost
useless except to a select few people who know what they're doing.

So I took the liberty of changing the default behavior of these three
modules to be more in line with the rest of equery and the world. Each
of the three displays deprecation warning explaining how to recreate
the old default behavior. I can't imagine anyone objecting outright to
this, considering there will always be more than one version of
gentoolkit in the tree (there are 6 now) so it'd be a long time before
anyone was forced into the new behavior.

That's it. For anyone interested, please read the link above and let
me know if you like it/hate it/want something else done.

For python programmers, I'd appreciate a quick code review and any
comments/criticism whatsoever (I really like criticism).

Thanks in advance,
-Doug
 
Old 02-12-2009, 06:10 AM
Alec Warner
 
Default equery: RFC and code review

On Tue, Feb 10, 2009 at 2:53 AM, Douglas Anderson <dja@gendja.com> wrote:
> Hi all,
>
> It's been a few months since we first discussed refactoring equery on
> this list and I think made pretty good time and I'm reasonably happy
> with it. So I'm throwing out to you all. I've tried to keep a running
> list of visible changes as I went along, but I'm sure I've missed some
> things. You can read that here:
>
> http://code.google.com/p/genscripts/source/browse/equery/trunk/equery/CHANGELOG
>
> My primary goals for the refactorization were:
> * Modularize
> * Clean up the UI
> * Rewrite a more solid option handler
> * Eliminate as much duplicated code as possible
> * Clean up slop and bitrot, follow clean coding standards and style
> guidelines wherever possible
> * Break up large functions, write docstrings for all functions
> * Get code set for Python 2.6 (tested in 2.5 and 2.6) and prepare
> smooth upgrade for Py3k.
> * Create a more consistent experience for the user.
>
> This last point is surely where the most contention will stem from, so
> let me argue my position. All modules except for list, check and size
> display the help message when called without input (ex: equery uses).
> This makes sense. It's what most other programs do. It's what emerge
> does. It's what eselect does. It's the neighborly thing to do.
>
> However these three exceptions go and start a time-consuming process
> which is of very limited usefulness. With a cold cache, each process
> takes at least 30 seconds. I imagine that 99 percent of people who
> type 'equery list' do it by accident. And who needs to see the size of
> every installed package? And checksum every installed package? Almost
> useless except to a select few people who know what they're doing.
>
> So I took the liberty of changing the default behavior of these three
> modules to be more in line with the rest of equery and the world. Each
> of the three displays deprecation warning explaining how to recreate
> the old default behavior. I can't imagine anyone objecting outright to
> this, considering there will always be more than one version of
> gentoolkit in the tree (there are 6 now) so it'd be a long time before
> anyone was forced into the new behavior.
>
> That's it. For anyone interested, please read the link above and let
> me know if you like it/hate it/want something else done.
>
> For python programmers, I'd appreciate a quick code review and any
> comments/criticism whatsoever (I really like criticism).

You Asked...

I think one of the major things that annoyed me when reading the code
is that you did add a new option parser...and you added it about a
dozen times...(regexp used to basically find the start of the same
code block over and over).

antarus@kyoto ~/code/checkouts/genscripts-read-only/equery $ grep
"opts = .x" *.py | wc -l
13

Now the option parsing is quite complicated so I'm not sure if a
complicated data structure to pass to a generic function is really a
good use of ones time. But it stands out.

None of your files has a copyright notice.

I pray pp.print_error prints to stderr and not stdout; otherwise tends
to be a big nitpick of mine when people write errors to stdout.

belongs.py has this gem:
q.append(('^' if query[0] == '/' else '/') + re.escape(query) + '$')

I realize it works; but honestly, not readable.

append_str = ""
if query[0] == '/' append_str += '^' else append_str += '/'
append_str += re.escape(query) + '$'
q.append(append_str)

Slightly better (if append_str has a shorter name, maybe). I don't
like 1 line if EXPR STATEMENT else STATEMENT but sometimes they are
shorter
Putting a conditional in the function call is horrible looking though.

You use # FOO comments often; sometimes too often. Explaining what
you are doing is typically better noted in a docstring for a function
and not necessarily right next to the code it pertains to. An example
would be meta.py in get_useflags. You have a 4 line comment block
describing what you are doing but it really interferes with the
reading of the code. Functions should typically be short enough that
referring to the docstring is no big deal (and the example I chose is
a very short function). In addition you have a lot of "# do something
obvious here." If it is obvious; you don't need a comment. An
example of this would be meta.py around line 418.

in __init__.py your find_matches function is too long (180 lines or
4.5 of my screen heights). You've already split it up into convenient
sections by comments (which are fine by the way) but you can probably
make subroutines out of some of the subsections. Also most of the
code has nothing to do with finding matches and lots to do with query
preparation.

in __init__.py you have InvalidRegex with a TODO to move it to another
file. It seems like exceptions.py might be a worthy place.

I think there are a number of cases where I'd prefer to see string
interpolation instead of explicit addition:

'%s/%s' % (cat, package)

But thats just me being picky (and you use string interpolation often,
which is good).

Thats what I found in about 20 minutes of looking.

-Alec

>
> Thanks in advance,
> -Doug
>
>
 
Old 02-12-2009, 07:19 AM
Brian Harring
 
Default equery: RFC and code review

On Wed, Feb 11, 2009 at 11:10:43PM -0800, Alec Warner wrote:
> belongs.py has this gem:
> q.append(('^' if query[0] == '/' else '/') + re.escape(query) + '$')
Also a python2.6 only feature...
~brian
 
Old 02-12-2009, 09:01 AM
René 'Necoro' Neumann
 
Default equery: RFC and code review

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Brian Harring schrieb:
> On Wed, Feb 11, 2009 at 11:10:43PM -0800, Alec Warner wrote:
>> belongs.py has this gem:
>> q.append(('^' if query[0] == '/' else '/') + re.escape(query) + '$')
> Also a python2.6 only feature...
> ~brian

python-2.5 it is ...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkmT83cACgkQ4UOg/zhYFuAnUQCeJs+JtVjJEs83xbffyEZTV7DE
j/MAnRo0hIynhEZUdrtF2FzJiW1Z3nB7
=zTDf
-----END PGP SIGNATURE-----
 
Old 02-12-2009, 09:25 AM
Douglas Anderson
 
Default equery: RFC and code review

On Thu, Feb 12, 2009 at 4:10 PM, Alec Warner <antarus@gentoo.org> wrote:
> I think one of the major things that annoyed me when reading the code
> is that you did add a new option parser...and you added it about a
> dozen times...(regexp used to basically find the start of the same
> code block over and over).
>
> antarus@kyoto ~/code/checkouts/genscripts-read-only/equery $ grep
> "opts = .x" *.py | wc -l
> 13
>
> Now the option parsing is quite complicated so I'm not sure if a
> complicated data structure to pass to a generic function is really a
> good use of ones time. But it stands out.

Actually, I think you'll find that the amount of duplicated opt
parsing code in each module is really minimal, essentially only that
one line that your regex matched. Each module has its own set of
options and handles those options differently, so putting it all in
__init__ would be a royal mess.

I can see how at a glance it would appear like duplicated code, but
check out the different between normal getopt and gnu_getopt and how I
have those working between __init__ and the modules. There's nothing
being parsed twice. If you can come up with a simpler, cleaner way,
write back with a little more detail. But I just reexamined it and
even tried moving some stuff around, and I think it's about as good as
it gets right now.

>
> None of your files has a copyright notice.
>

There's one in equery.py. The rest of equery will reside in
python/site-packages. Does each one need the same copyright notice? (I
honestly don't know the policy)

> I pray pp.print_error prints to stderr and not stdout; otherwise tends
> to be a big nitpick of mine when people write errors to stdout.

$ grep -A 2 'def print_error' /usr/lib/gentoolkit/pym/gentoolkit/pprinter.py
def print_error(s):
"""Prints an error string to stderr."""
sys.stderr.write(output.red("!!! ") + s + "
")

Safe

> belongs.py has this gem:
> q.append(('^' if query[0] == '/' else '/') + re.escape(query) + '$')
>
> I realize it works; but honestly, not readable.

You're right, that's pretty nasty. But look what it came from!

q = map(lambda x: ((len(x) and x[0] == "/") and "^" or "/") +
re.escape(x) + "$", query)

But I agree that it can be further improved. How's this:

if query.startswith('/'):
query = "^%s$" % re.escape(query)
else:
query = "/%s$" % re.escape(query)
q.append(query)

> You use # FOO comments often; sometimes too often.

Yep, good point. I comment a little excessively, and believe it or not
emeta (now the meta module) was my first ever python script, so it had
lots of "obvious" comments. I'll try to get rid of the silly stuff as
I run into them again.

> in __init__.py your find_matches function is too long (180 lines or
> 4.5 of my screen heights). You've already split it up into convenient
> sections by comments (which are fine by the way) but you can probably
> make subroutines out of some of the subsections.

Good idea. I'll do that.

> in __init__.py you have InvalidRegex with a TODO to move it to another
> file. It seems like exceptions.py might be a worthy place.

Yep, not 100% done yet and modifying more blocks to use exceptions,
and then moving the exceptions out to exceptions.py is on my list of
TODOs.

> I think there are a number of cases where I'd prefer to see string
> interpolation instead of explicit addition:
>
> '%s/%s' % (cat, package)
>
> But thats just me being picky (and you use string interpolation often,
> which is good).

Explicit string catting is also one of my pet peeves, but py-2.6 is
getting a brand new string formatting syntax and "%" style
interpolation will be deprecated in 3k, so I only rewrote strings
where readability was suffering. I'm biding my time... but I
definitely agree with you here.

>
> Thats what I found in about 20 minutes of looking.
>
> -Alec

Muchly appreciated!
-Doug
 
Old 02-12-2009, 09:39 AM
Douglas Anderson
 
Default equery: RFC and code review

On Thu, Feb 12, 2009 at 7:01 PM, René 'Necoro' Neumann <lists@necoro.eu> wrote:
> Hash: SHA1
>
> Brian Harring schrieb:
>> On Wed, Feb 11, 2009 at 11:10:43PM -0800, Alec Warner wrote:
>>> belongs.py has this gem:
>>> q.append(('^' if query[0] == '/' else '/') + re.escape(query) + '$')
>> Also a python2.6 only feature...
>> ~brian
>
> python-2.5 it is ...

Yep, this python ternary syntax was introduced in 2.5. I'm using quite
a few >=2.5 features...
* the new ternary syntax
* passing a tuple to str.startswith
* with statement
* max( key=...)

Again, there are a lot of version of gentoolkit in the tree, so I
didn't foresee this being a problem.

-Doug
 
Old 02-12-2009, 10:27 AM
"Marijn Schouten (hkBst)"
 
Default equery: RFC and code review

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Alec Warner wrote:
> On Tue, Feb 10, 2009 at 2:53 AM, Douglas Anderson <dja@gendja.com> wrote:
<snip>
> belongs.py has this gem:
> q.append(('^' if query[0] == '/' else '/') + re.escape(query) + '$')
>
> I realize it works; but honestly, not readable.
>
> append_str = ""
> if query[0] == '/' append_str += '^' else append_str += '/'
> append_str += re.escape(query) + '$'
> q.append(append_str)
>
> Slightly better (if append_str has a shorter name, maybe). I don't
> like 1 line if EXPR STATEMENT else STATEMENT but sometimes they are
> shorter
> Putting a conditional in the function call is horrible looking though.

It's called functional programming. Maybe it takes a little getting used to, but
as your own example shows, it can cut your code base by a significant factor.

Marijn


- --
Sarcasm puts the iron in irony, cynicism the steel.

Marijn Schouten (hkBst), Gentoo Lisp project, Gentoo ML
<http://www.gentoo.org/proj/en/lisp/>, #gentoo-{lisp,ml} on FreeNode
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkmUB54ACgkQp/VmCx0OL2wIOQCfdd0U1TJ/GXxTwvV8vmMpW2u9
hfsAoIXkWvJP8sOGK4NstuXiCAgy5OIf
=qfq4
-----END PGP SIGNATURE-----
 

Thread Tools




All times are GMT. The time now is 09:47 AM.

VBulletin, Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO ©2007, Crawlability, Inc.
Copyright ©2007 - 2008, www.linux-archive.org