Linux Archive

Mike Frysinger 05-24-2012 04:33 PM

repoman: unroll escaped lines so we can check the entirety of it
 
On Thursday 24 May 2012 00:19:45 Zac Medico wrote:
> On 05/23/2012 09:06 PM, Mike Frysinger wrote:
> > Sometimes people wrap long lines in their ebuilds to make it easier to
> > read, but this causes us issues when doing line-by-line checking. So
> > automatically unroll those lines before passing the full content down
> > to our checkers.
> >
> > This seems to work, but maybe someone can suggest something simpler.
>
> This code should come right after the line that says "We're not in a
> here-document", because we only need it to trigger when we're not in a
> here-document.

i was thinking this would handle wrapped lines and heredocs together better,
but i'll ignore that until i can come up with a concrete case.

> I think it's going to be cleaner to detect an escaped newline with a
> regular expression, like r'(^|[^\\])\\$'.

the reason i didn't go the regex route is because this fails with:
echo foo \\\
cow
whereas letting python take care of all the escaping works much better
-mike
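The two detection strategies can be compared with a small Python 3 sketch. It assumes Zac's pattern is `(^|[^\\])\\$` and uses Python 3's `unicode_escape` codec in place of Python 2's `string_escape`. The helper names `continues_regex` and `continues_codec` are hypothetical. The codec version follows the approach described in the patches in this thread: strip the newline, append a `'0'`, decode, and check whether the result ends in a NUL.

```python
import codecs
import re

# Assumed form of the suggested pattern: a final backslash that is not
# itself preceded by a backslash.
_escaped_newline_re = re.compile(r'(^|[^\\])\\$')

# Let Python's escape decoder resolve every backslash pair instead.
_unicode_escape = codecs.lookup('unicode_escape').decode


def continues_regex(line):
    """Regex-based guess at whether `line` ends in a line continuation."""
    return _escaped_newline_re.search(line.rstrip('\n')) is not None


def continues_codec(line):
    """Decoder-based check: a trailing backslash plus '0' decodes to NUL."""
    try:
        decoded = _unicode_escape(line.rstrip('\n') + '0')[0]
    except UnicodeDecodeError:
        # Malformed escapes such as a bare '\x' cannot be decoded.
        return False
    return decoded.endswith('\0')


if __name__ == '__main__':
    # One, two and three trailing backslashes.
    for text in ['echo foo \\\n', 'echo foo \\\\\n', 'echo foo \\\\\\\n']:
        print(repr(text), continues_regex(text), continues_codec(text))
```

Both functions agree on one and two trailing backslashes. With three (an escaped backslash followed by a real continuation), the regex sees a backslash before the final one and misses the continuation, while the decoder handles it correctly.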

Mike Frysinger 05-24-2012 07:20 PM

repoman: unroll escaped lines so we can check the entirety of it
 
Sometimes people wrap long lines in their ebuilds to make it easier to
read, but this causes us issues when doing line-by-line checking. So
automatically unroll those lines before passing the full content down
to our checkers.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
v2
- re-order heredoc/multiline checking

pym/repoman/checks.py | 60 +++++++++++++++++++++++++++++++++++++++---------
1 files changed, 48 insertions(+), 12 deletions(-)

diff --git a/pym/repoman/checks.py b/pym/repoman/checks.py
index c17a0bd..cd8d3d2 100644
--- a/pym/repoman/checks.py
+++ b/pym/repoman/checks.py
@@ -759,6 +759,7 @@ _ignore_comment_re = re.compile(r'^\s*#')
 def run_checks(contents, pkg):
 	checks = _constant_checks
 	here_doc_delim = None
+	multiline = None

 	for lc in checks:
 		lc.new(pkg)
@@ -772,19 +773,54 @@ def run_checks(contents, pkg):
 			here_doc = _here_doc_re.match(line)
 			if here_doc is not None:
 				here_doc_delim = re.compile(r'^\s*%s$' % here_doc.group(1))
+		if here_doc_delim is not None:
+			continue
+
+		# Unroll multiline escaped strings so that we can check things:
+		#	inherit foo bar \
+		#		moo \
+		#		cow
+		# This will merge these lines like so:
+		#	inherit foo bar moo cow
+		try:
+			# A normal line will end in the two bytes: <\> <\n>. So decoding
+			# that will result in python thinking the <\n> is being escaped
+			# and eat the single <\> which makes it hard for us to detect.
+			# Instead, strip the newline (which we know all lines have), and
+			# append a <0>. Then when python escapes it, if the line ended
+			# in a <\>, we'll end up with a <\0> marker to key off of. This
+			# shouldn't be a problem with any valid ebuild ...
+			line_escaped = (line.rstrip('\n') + '0').decode('string_escape')
+		except:
+			# Who knows what kind of crazy crap an ebuild will have
+			# in it -- don't allow it to kill us.
+			line_escaped = line
+		if multiline:
+			# Chop off the \ and \n bytes from the previous line.
+			multiline = multiline[:-2] + line
+			if not line_escaped.endswith('\0'):
+				line = multiline
+				num = multinum
+				multiline = None
+			else:
+				continue
+		else:
+			if line_escaped.endswith('\0'):
+				multinum = num
+				multiline = line
+				continue

-		if here_doc_delim is None:
-			# We're not in a here-document.
-			is_comment = _ignore_comment_re.match(line) is not None
-			for lc in checks:
-				if is_comment and lc.ignore_comment:
-					continue
-				if lc.check_eapi(pkg.metadata['EAPI']):
-					ignore = lc.ignore_line
-					if not ignore or not ignore.match(line):
-						e = lc.check(num, line)
-						if e:
-							yield lc.repoman_check_name, e % (num + 1)
+		# Finally we have a full line to parse.
+		is_comment = _ignore_comment_re.match(line) is not None
+		for lc in checks:
+			if is_comment and lc.ignore_comment:
+				continue
+			if lc.check_eapi(pkg.metadata['EAPI']):
+				ignore = lc.ignore_line
+				if not ignore or not ignore.match(line):
+					e = lc.check(num, line)
+					if e:
+						yield lc.repoman_check_name, e % (num + 1)

 	for lc in checks:
 		i = lc.end()
--
1.7.8.6
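The control flow of this v2 patch (skip here-documents first, then merge continued lines) can be pulled out of `run_checks` into a standalone generator and run without repoman. This is a minimal sketch under several assumptions. The generator name `logical_lines` is hypothetical. The here-document bookkeeping that sits above the hunk (not shown in the diff) is assumed to clear the delimiter when a line matches it. Python 3's `unicode_escape` codec stands in for `string_escape`, and the bare `except:` is narrowed to `UnicodeDecodeError`.

```python
import codecs
import re

# Regex from the patch context in pym/repoman/checks.py.
_here_doc_re = re.compile(r'.*\s<<[-]?(\w+)$')
# Python 3 stand-in for the Python 2-only 'string_escape' decoder.
_unicode_escape = codecs.lookup('unicode_escape').decode


def logical_lines(contents):
    """Yield (num, line) for every line the checkers would see.

    Here-document bodies are skipped; backslash-continued lines are merged
    and reported under the 0-based index of their first physical line.
    """
    here_doc_delim = None
    multiline = None
    multinum = None
    for num, line in enumerate(contents):
        # Assumed here-document bookkeeping (it sits above the hunk).
        if here_doc_delim is not None and here_doc_delim.match(line):
            here_doc_delim = None
        if here_doc_delim is None:
            here_doc = _here_doc_re.match(line)
            if here_doc is not None:
                here_doc_delim = re.compile(r'^\s*%s$' % here_doc.group(1))
        if here_doc_delim is not None:
            continue

        # Swap the newline for '0' so a trailing backslash decodes to NUL.
        try:
            line_escaped = _unicode_escape(line.rstrip('\n') + '0')[0]
        except UnicodeDecodeError:
            line_escaped = line

        if multiline:
            # Drop the previous line's trailing backslash and newline.
            multiline = multiline[:-2] + line
            if not line_escaped.endswith('\0'):
                yield multinum, multiline
                multiline = None
            continue
        if line_escaped.endswith('\0'):
            multinum, multiline = num, line
            continue
        yield num, line


if __name__ == '__main__':
    ebuild = [
        'inherit foo bar \\\n',
        '\tmoo \\\n',
        '\tcow\n',
        'cat <<-EOF\n',
        'ignored \\\n',
        'EOF\n',
        'DEPEND="x"\n',
    ]
    for num, line in logical_lines(ebuild):
        print(num + 1, repr(line))
```

In the usage example, the three continued lines come back as a single line numbered 1 (mirroring `num = multinum` in the patch), and the backslash inside the here-document body is never treated as a continuation.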

Zac Medico 05-24-2012 07:52 PM

repoman: unroll escaped lines so we can check the entirety of it
 
On 05/24/2012 12:20 PM, Mike Frysinger wrote:
> +			# A normal line will end in the two bytes: <\> <\n>. So decoding
> +			# that will result in python thinking the <\n> is being escaped
> +			# and eat the single <\> which makes it hard for us to detect.
> +			# Instead, strip the newline (which we know all lines have), and
> +			# append a <0>. Then when python escapes it, if the line ended
> +			# in a <\>, we'll end up with a <\0> marker to key off of. This
> +			# shouldn't be a problem with any valid ebuild ...
> +			line_escaped = (line.rstrip('\n') + '0').decode('string_escape')

That decode('string_escape') method won't work in python3, because the
str object doesn't have a decode method. I think something like this
will work with both python3 and python2:

import codecs

unicode_escape_codec = codecs.lookup('unicode_escape')

def unicode_escape(s):
	return unicode_escape_codec(s)[0]

line_escaped = unicode_escape(line.rstrip('\n') + '0')
--
Thanks,
Zac

Zac Medico 05-24-2012 08:08 PM

repoman: unroll escaped lines so we can check the entirety of it
 
On 05/24/2012 12:52 PM, Zac Medico wrote:
> On 05/24/2012 12:20 PM, Mike Frysinger wrote:
>> +			# A normal line will end in the two bytes: <\> <\n>. So decoding
>> +			# that will result in python thinking the <\n> is being escaped
>> +			# and eat the single <\> which makes it hard for us to detect.
>> +			# Instead, strip the newline (which we know all lines have), and
>> +			# append a <0>. Then when python escapes it, if the line ended
>> +			# in a <\>, we'll end up with a <\0> marker to key off of. This
>> +			# shouldn't be a problem with any valid ebuild ...
>> +			line_escaped = (line.rstrip('\n') + '0').decode('string_escape')
>
> That decode('string_escape') method won't work in python3, because the
> str object doesn't have a decode method. I think something like this
> will work with both python3 and python2:
>
> import codecs
>
> unicode_escape_codec = codecs.lookup('unicode_escape')
>
> def unicode_escape(s):
> 	return unicode_escape_codec(s)[0]

-	return unicode_escape_codec(s)[0]
+	return unicode_escape_codec.decode(s)[0]

> line_escaped = unicode_escape(line.rstrip('\n') + '0')


--
Thanks,
Zac
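Both points in this exchange can be checked directly under Python 3 with a short sketch. `str` has no `.decode()` method. A `CodecInfo` returned by `codecs.lookup()` is a tuple subclass and cannot be called itself, which is why the first draft needed `.decode(s)`. The name `first_draft_unicode_escape` is a hypothetical label for that first suggestion.

```python
import codecs

unicode_escape_codec = codecs.lookup('unicode_escape')


def unicode_escape(s):
    # CodecInfo.decode returns a (text, length_consumed) tuple; keep the text.
    return unicode_escape_codec.decode(s)[0]


def first_draft_unicode_escape(s):
    # The first suggestion: calls the CodecInfo object itself.
    return unicode_escape_codec(s)[0]


if __name__ == '__main__':
    # Python 3 str has no .decode(), so .decode('string_escape') breaks.
    print(hasattr('', 'decode'))  # False on Python 3
    try:
        first_draft_unicode_escape('foo')
    except TypeError as err:
        print('first draft fails:', err)
    print(repr(unicode_escape('inherit foo \\' + '0')))  # 'inherit foo \x00'
```

The corrected helper turns a trailing backslash plus the appended `'0'` into a NUL, which is the marker the patch keys off.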

Kent Fredric 05-24-2012 09:43 PM

repoman: unroll escaped lines so we can check the entirety of it
 
On 25 May 2012 04:18, Mike Frysinger <vapier@gentoo.org> wrote:
>> If there is no such policy, and a forced text-wrap at 80 characters is
>> not needed, I would love for that setting to be removed.

Hmm, http://devmanual.gentoo.org/ebuild-writing/file-format/index.html#indenting-and-whitespace

does sort of imply a "keep it under 80 chars where possible", but vim's
wrap settings don't really see it like that; vim is more "80
characters, if you can't get it under that, WRAP".

http://devmanual.gentoo.org/ebuild-writing/variables/index.html#required-variables

Indicates "less than 80", but 79 characters + 14 for 'DESCRIPTION=""'
= 93, so if you're pushing the description limit, vim will try
wrapping each and every time you type a new letter.

And repoman doesn't even warn till you hit a 100 character description
( col 114 ).

So is there something that specifies this somewhere, or do the
required-variables and indenting-and-whitespace sections need to be
updated and made clearer, stating that

Description can be up to 100 ( not 80 ) characters, and that the text-width is
merely a suggestion, not any sort of
de-facto-but-hard-to-find-documented standard?

--
Kent

perl -e* "print substr( "edrgmaM* SPA NOcomil.ic@tfrken", $_ * 3,
3 ) for ( 9,8,0,7,1,6,5,4,3,2 );"

http://kent-fredric.fox.geek.nz
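The column arithmetic in Kent's message can be sketched as a small hypothetical checker. `description_report` is an illustrative name and is not a repoman function. It handles only a single-line `DESCRIPTION="..."` with no embedded quotes. The `limit=100` default with a strict `>` comparison is an assumption, because the thread says only that repoman doesn't warn until a description reaches 100 characters.

```python
import re

# Hypothetical helper: matches a single-line DESCRIPTION="..." assignment
# (assumes no embedded double quotes and no line continuation).
_description_re = re.compile(r'^DESCRIPTION="([^"]*)"\s*$')
PREFIX_WIDTH = len('DESCRIPTION=""')  # 14 characters of overhead


def description_report(line, limit=100):
    """Return (length, closing_quote_column, over_limit), or None.

    closing_quote_column is 1-based: 13 characters of DESCRIPTION=" come
    first, then the value, then the closing quote.
    """
    m = _description_re.match(line)
    if m is None:
        return None
    length = len(m.group(1))
    return length, PREFIX_WIDTH + length, length > limit


if __name__ == '__main__':
    print(description_report('DESCRIPTION="' + 'x' * 79 + '"\n'))   # (79, 93, False)
    print(description_report('DESCRIPTION="' + 'x' * 100 + '"\n'))  # (100, 114, False)
```

This reproduces the figures in the message: a 79-character description closes at column 93, and a 100-character one at column 114.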

Mike Frysinger 05-25-2012 05:22 AM

repoman: unroll escaped lines so we can check the entirety of it
 
Sometimes people wrap long lines in their ebuilds to make it easier to
read, but this causes us issues when doing line-by-line checking. So
automatically unroll those lines before passing the full content down
to our checkers.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
v3
- use import codecs for escaping strings

pym/repoman/checks.py | 63 +++++++++++++++++++++++++++++++++++++++---------
1 files changed, 51 insertions(+), 12 deletions(-)

diff --git a/pym/repoman/checks.py b/pym/repoman/checks.py
index c17a0bd..402169e 100644
--- a/pym/repoman/checks.py
+++ b/pym/repoman/checks.py
@@ -5,6 +5,7 @@
 """This module contains functions used in Repoman to ascertain the quality
 and correctness of an ebuild."""

+import codecs
 import re
 import time
 import repoman.errors as errors
@@ -757,8 +758,11 @@ _here_doc_re = re.compile(r'.*\s<<[-]?(\w+)$')
 _ignore_comment_re = re.compile(r'^\s*#')

 def run_checks(contents, pkg):
+	unicode_escape_codec = codecs.lookup('unicode_escape')
+	unicode_escape = lambda x: unicode_escape_codec.decode(x)[0]
 	checks = _constant_checks
 	here_doc_delim = None
+	multiline = None

 	for lc in checks:
 		lc.new(pkg)
@@ -772,19 +776,54 @@ def run_checks(contents, pkg):
 			here_doc = _here_doc_re.match(line)
 			if here_doc is not None:
 				here_doc_delim = re.compile(r'^\s*%s$' % here_doc.group(1))
+		if here_doc_delim is not None:
+			continue
+
+		# Unroll multiline escaped strings so that we can check things:
+		#	inherit foo bar \
+		#		moo \
+		#		cow
+		# This will merge these lines like so:
+		#	inherit foo bar moo cow
+		try:
+			# A normal line will end in the two bytes: <\> <\n>. So decoding
+			# that will result in python thinking the <\n> is being escaped
+			# and eat the single <\> which makes it hard for us to detect.
+			# Instead, strip the newline (which we know all lines have), and
+			# append a <0>. Then when python escapes it, if the line ended
+			# in a <\>, we'll end up with a <\0> marker to key off of. This
+			# shouldn't be a problem with any valid ebuild ...
+			line_escaped = unicode_escape(line.rstrip('\n') + '0')
+		except:
+			# Who knows what kind of crazy crap an ebuild will have
+			# in it -- don't allow it to kill us.
+			line_escaped = line
+		if multiline:
+			# Chop off the \ and \n bytes from the previous line.
+			multiline = multiline[:-2] + line
+			if not line_escaped.endswith('\0'):
+				line = multiline
+				num = multinum
+				multiline = None
+			else:
+				continue
+		else:
+			if line_escaped.endswith('\0'):
+				multinum = num
+				multiline = line
+				continue

-		if here_doc_delim is None:
-			# We're not in a here-document.
-			is_comment = _ignore_comment_re.match(line) is not None
-			for lc in checks:
-				if is_comment and lc.ignore_comment:
-					continue
-				if lc.check_eapi(pkg.metadata['EAPI']):
-					ignore = lc.ignore_line
-					if not ignore or not ignore.match(line):
-						e = lc.check(num, line)
-						if e:
-							yield lc.repoman_check_name, e % (num + 1)
+		# Finally we have a full line to parse.
+		is_comment = _ignore_comment_re.match(line) is not None
+		for lc in checks:
+			if is_comment and lc.ignore_comment:
+				continue
+			if lc.check_eapi(pkg.metadata['EAPI']):
+				ignore = lc.ignore_line
+				if not ignore or not ignore.match(line):
+					e = lc.check(num, line)
+					if e:
+						yield lc.repoman_check_name, e % (num + 1)

 	for lc in checks:
 		i = lc.end()
--
1.7.8.6

Zac Medico 05-25-2012 08:47 AM

repoman: unroll escaped lines so we can check the entirety of it
 
On 05/24/2012 10:22 PM, Mike Frysinger wrote:
> Sometimes people wrap long lines in their ebuilds to make it easier to
> read, but this causes us issues when doing line-by-line checking. So
> automatically unroll those lines before passing the full content down
> to our checkers.
>
> Signed-off-by: Mike Frysinger <vapier@gentoo.org>
> ---
> v3
> - use import codecs for escaping strings

It looks good to me, except for this one part, where we should let
SystemExit raise:

> +		try:
> +			# A normal line will end in the two bytes: <\> <\n>. So decoding
> +			# that will result in python thinking the <\n> is being escaped
> +			# and eat the single <\> which makes it hard for us to detect.
> +			# Instead, strip the newline (which we know all lines have), and
> +			# append a <0>. Then when python escapes it, if the line ended
> +			# in a <\>, we'll end up with a <\0> marker to key off of. This
> +			# shouldn't be a problem with any valid ebuild ...
> +			line_escaped = unicode_escape(line.rstrip('\n') + '0')

+		except SystemExit:
+			raise

> +		except:


--
Thanks,
Zac
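The reason for Zac's request is that a bare `except:` catches every exception, including the `SystemExit` raised by `sys.exit()`. A minimal sketch contrasts the two shapes. The function names are hypothetical and are not from the patch.

```python
import sys


def swallow_everything(fn):
    """The v3 shape: a bare except also catches SystemExit."""
    try:
        fn()
    except:  # deliberately bare, as in the patch under review
        return 'swallowed'
    return 'ok'


def let_exit_through(fn):
    """The requested shape: re-raise SystemExit, swallow everything else."""
    try:
        fn()
    except SystemExit:
        raise
    except:  # deliberately bare
        return 'swallowed'
    return 'ok'


if __name__ == '__main__':
    print(swallow_everything(lambda: sys.exit(1)))  # swallowed
    print(let_exit_through(lambda: 1 / 0))          # swallowed
    try:
        let_exit_through(lambda: sys.exit(1))
    except SystemExit:
        print('SystemExit propagated')
```

A design note: `KeyboardInterrupt` is also caught by the bare `except:` in both versions. Writing `except Exception:` instead would let both through, because `SystemExit` and `KeyboardInterrupt` derive from `BaseException` rather than `Exception`.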

Mike Frysinger 05-25-2012 04:18 PM

repoman: unroll escaped lines so we can check the entirety of it
 
On Friday 25 May 2012 04:47:24 Zac Medico wrote:
> On 05/24/2012 10:22 PM, Mike Frysinger wrote:
> > Sometimes people wrap long lines in their ebuilds to make it easier to
> > read, but this causes us issues when doing line-by-line checking. So
> > automatically unroll those lines before passing the full content down
> > to our checkers.
>
> It looks good to me, except for this one part, where we should let
> SystemExit raise:

OK, i pushed with that fix
-mike


All times are GMT.