05-20-2011, 03:39 PM
Jeroen Roovers

RFC: sed script redundancy

Hullo developers,


for a while now I've been wondering if all those sed scripts in all
those ebuilds are really effective.

To find out, I've tried a couple of angles on a sed hook that basically
dissects the sed command line provided, divides everything up into sed
scripts, files being processed and other options, and runs everything
through diff to get some meaningful QA output as to the effective use
of the sed scripts invoked.

Of course, a sed script will sometimes appear ineffective when it is not,
because it uses some variable or output that varies with the platform you
run it on, such as $(get_libdir).

I've looked into sed's internal solutions to no avail, but something
like -i[SUFFIX] might help, since it gives you a backup file to compare
with the file that's being streamed.

The idea is to pass the result to
| diff -u $file $file[SUFFIX]
to figure out what was changed, and what sed script changed it.
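
A minimal sketch of that backup-and-diff idea, assuming GNU sed (for -i with
an attached suffix). The function name sed_with_diff and the .sedorig suffix
are made up for this illustration and are not part of portage:

```shell
# Hypothetical helper: run sed in place, keep a backup, and diff the two.
# usage: sed_with_diff FILE SED-ARGS...
sed_with_diff() {
    local file=$1; shift
    local suffix=.sedorig   # arbitrary backup suffix chosen for this sketch

    # GNU sed writes the untouched original to FILE.sedorig
    sed -i"${suffix}" "$@" "${file}" || return

    # diff exits 0 when both files are identical, i.e. the script did nothing
    if diff -u "${file}${suffix}" "${file}"; then
        echo "no effect on ${file}: sed $*" >&2
    fi
    rm -f "${file}${suffix}"
}
```

Something like `sed_with_diff Makefile -e 's:-O3:-O2:'` would then print a
unified diff when the script changed the file, and a warning on stderr when
it did not.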

Any help?


jer


PS: Because the outcome may depend on the platform you run the scripts
on, this probably shouldn't make it into a QA test in portage, but it
could still help developers evaluate how effective their ebuilds' and
eclasses' sed scripts are.
 
05-20-2011, 03:56 PM
Fabian Groffen

RFC: sed script redundancy

On 20-05-2011 17:39:22 +0200, Jeroen Roovers wrote:
> I've looked into sed's internal solutions to no avail, but something
> like -i[SUFFIX] might help, since it gives you a backup file to compare
> with the file that's being streamed.
>
> The idea is to pass the result to
> | diff -u $file $file[SUFFIX]
> to figure out what was changed, and what sed script changed it.

I like your idea a lot.

I had to do a similar thing once in eapify, and back then I came up with
the following hack:

sed -e "<pattern>" "${file}" | diff "${file}" -

followed by the actual sed -i -e ...

This way I didn't need to write an intermediate file.
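
Wrapped in a function, that hack could look roughly like the sketch below.
The name sed_preview is hypothetical; the only assumption is a diff that
accepts "-" for standard input:

```shell
# Hypothetical helper: show what a sed script would change, without
# modifying the file and without writing an intermediate file.
# usage: sed_preview FILE SED-ARGS...
sed_preview() {
    local file=$1; shift
    # The exit status is that of diff: 0 means the script changes nothing.
    sed "$@" "${file}" | diff -u "${file}" -
}
```

A caller can test the return value first, for example
`sed_preview foo.c -e 's:a:b:' >/dev/null && echo "useless sed"`, and then
run the real `sed -i` with the same arguments.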


--
Fabian Groffen
Gentoo on a different level
 
05-21-2011, 05:34 PM
Jeroen Roovers

RFC: sed script redundancy

On Fri, 20 May 2011 17:56:00 +0200
Fabian Groffen <grobian@gentoo.org> wrote:

> sed -e "<pattern>" "${file}" | diff "${file}" -
>
> followed by the actual sed -i -e ...
>
> This way I didn't need to write an intermediate file.

The problem there is that sed might be called just once on any one file,
but in the tree it is often invoked with multiple scripts, so this
simple implementation lacks a way to evaluate which sed scripts are
useful.
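
One possible way to evaluate the scripts individually is to apply them one
after another to a scratch copy and compare after each pass. This is only a
sketch: sed_each_expr is a made-up name, and separate passes are equivalent
to a single multi-script invocation only for simple, stateless commands such
as plain s/// substitutions (not for scripts using branches, ranges or the
hold space):

```shell
# Hypothetical helper: report which of several sed expressions have no
# effect when applied in order. The original file is never modified.
# usage: sed_each_expr FILE EXPR...
sed_each_expr() {
    local file=$1; shift
    local cur next expr
    cur=$(mktemp) && next=$(mktemp) || return
    cp "${file}" "${cur}"

    for expr in "$@"; do
        sed -e "${expr}" "${cur}" > "${next}"
        # cmp -s is silent and exits 0 when both files are identical
        if cmp -s "${cur}" "${next}"; then
            echo "no effect on ${file}: ${expr}"
        fi
        # The output of this pass becomes the input of the next one
        mv "${next}" "${cur}"
    done
    rm -f "${cur}" "${next}"
}
```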

Also, how do I ensure the sed replacement works only on invocations
inside the ebuild, and not, say, in portage's internals?


jer
 
05-22-2011, 10:50 AM
Fabian Groffen

RFC: sed script redundancy

On 21-05-2011 19:34:34 +0200, Jeroen Roovers wrote:
> On Fri, 20 May 2011 17:56:00 +0200
> Fabian Groffen <grobian@gentoo.org> wrote:
>
> > sed -e "<pattern>" "${file}" | diff "${file}" -
> >
> > followed by the actual sed -i -e ...
> >
> > This way I didn't need to write an intermediate file.
>
> The problem there is that sed might be called just once on any one file,
> but in the tree it is often invoked with multiple scripts, so this
> simple implementation lacks a way to evaluate which sed scripts are
> useful.
>
> Also, how do I ensure the sed replacement works only on invocations
> inside the ebuild, and not, say, in portage's internals?

(not tested, but as proof of concept)

alias sed my_sed
my_sed() {
    local oargs="${@}"
    local arg
    local nargs=()
    local hadi=
    local hade=
    while [[ -n $1 ]] ; do
        case "$1" in
            -i)
                # ignore this flag
                hadi=yes
            ;;
            -e|-f)
                shift
                nargs+=( "-e$1" )
                hade=yes
            ;;
            -*)
                nargs+=( "$1" )
                hade=yes
            ;;
            *)
                if [[ -z ${hade} ]] ; then
                    nargs+=( "$1" )
                elif [[ -z ${hadi} ]] ; then
                    # there is no inline replacing, not much we can do
                    break
                else
                    sed "${nargs[@]}" "$1" | diff -q "$1" - > /dev/null &&
                        ewarn "sed ${oargs} has no effect on $1"
                fi
            ;;
        esac
        shift
    done

    sed "${oargs}"
}


--
Fabian Groffen
Gentoo on a different level
 
05-29-2011, 10:44 AM
Christopher Schwan

RFC: sed script redundancy

Thank you for that script. I experimented a bit with it and have a number of
corrections and suggestions:

- alias does not work because my_sed is not declared at this stage. I removed
the whole alias line because I want to selectively enable my_sed
- oargs must be an array in order to make quoting work:

local oargs=( "${@}" )

- In the ewarn line ${oargs} should be changed to ${nargs[@]} (!?)
- Is it correct to treat -e and -f alike? I am not sure about that, because
the latter expects a file
- If no "-e" is given, the first non-option argument is treated as the sed
script expression, therefore I added hade=yes in the if-branch

The new function now reads:

my_sed() {
    local oargs=( "$@" )
    local arg
    local nargs=()
    local hadi=
    local hade=

    while [[ -n $1 ]] ; do
        case "$1" in
            -i|--in-place)
                # ignore this flag
                hadi=yes
            ;;
            -e|--expression)
                shift
                nargs+=( "-e" "$1" )
                hade=yes
            ;;
            -f|--file)
                shift
                nargs+=( "-f" "$1" )
                hade=yes
            ;;
            -*)
                nargs+=( "$1" )
            ;;
            *)
                if [[ -z ${hade} ]] ; then
                    nargs+=( "-e" "$1" )
                    hade=yes
                elif [[ -z ${hadi} ]] ; then
                    # there is no inline replacing, not much we can do
                    break
                else
                    sed "${nargs[@]}" "$1" | diff -q "$1" - > /dev/null &&
                        ewarn "sed ${nargs[@]} has no effect on $1"
                fi
            ;;
        esac
        shift
    done

    sed "${oargs[@]}"
}

As you can see, I added support for long-options. However, testing the
individual sed commands remains to be done. This could be especially difficult
if input is taken from stdin (e.g. in cat foo | sed "s:a:b:g").

I tested my_sed within our sage ebuild[1]. This ebuild contains 39 sed
commands and I was able to spot one useless sed.



[1] https://github.com/cschwan/sage-on-gentoo/blob/master/sci-mathematics/sage/sage-4.7.ebuild


 
05-29-2011, 11:00 AM
Fabian Groffen

RFC: sed script redundancy

On 29-05-2011 12:44:46 +0200, Christopher Schwan wrote:
> Thank you for that script. I experimented a bit with it and have a number of
> corrections and suggestions:
>
> - alias does not work because my_sed is not declared at this stage. I removed
> the whole alias line because I want to selectively enable my_sed
> - oargs must be an array in order to make quoting work:
>
> local oargs=( "${@}" )
>
> - In the ewarn line ${oargs} should be changed to ${nargs[@]} (!?)
> - is it correct to treat -e and -f alike ? I am not sure about that, because
> the latter expects a file

Yes, because (also in your function) you always shift, and assume the
next argument is there. Hence, you have two identical cases in your
script now. I only distinguished between 1) being able to do something
(-i) and 2) having a pattern to work with (-e/-f or first non-option
argument as string pattern).

> - If no "-e" is given, the first non-option argument is treated as the sed
> script expression, therefore I added hade=yes in the if-branch

That one was missing indeed. I just quickly wrote the proof of concept.


> The new function now reads:
>
[snip improved function]
>
> As you can see, I added support for long-options. However, testing the
> individual sed commands remains to be done. This could be especially difficult
> if input is taken from stdin (e.g. in cat foo | sed "s:a:b:g").

You might be able to detect input is a pipe, and temporarily
write the input to some file, then perform the sed without the -i
requirement and remove the temp file after the real sed.
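
A rough sketch of that approach follows. The function name sed_stdin_check
is hypothetical, it assumes the sed arguments contain no file operands, and
it checks "stdin is not a terminal" rather than strictly "stdin is a pipe":

```shell
# Hypothetical wrapper for invocations where sed reads standard input.
# usage: ... | sed_stdin_check SED-ARGS...
sed_stdin_check() {
    local tmp
    if [ -t 0 ]; then
        # stdin is a terminal: nothing to buffer, just run sed
        sed "$@"
        return
    fi

    tmp=$(mktemp) || return
    cat > "${tmp}"          # buffer the piped input in a temporary file

    # Dry run: compare the sed output with the buffered input
    if sed "$@" "${tmp}" | cmp -s "${tmp}" -; then
        echo "sed $* has no effect on stdin" >&2
    fi

    # The real sed, fed from the buffered copy, writes to stdout as usual
    sed "$@" "${tmp}"
    local ret=$?
    rm -f "${tmp}"
    return "${ret}"
}
```

In bash, `[[ -p /dev/stdin ]]` would be a stricter test for a pipe, but it
would miss input redirected from a regular file.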

> I tested my_sed within our sage ebuild[1]. This ebuild contains 39 sed
> commands and I was able to spot one useless sed.

Cool, nice to see you've made it into something useful!

> [1] https://github.com/cschwan/sage-on-gentoo/blob/master/sci-mathematics/sage/sage-4.7.ebuild


--
Fabian Groffen
Gentoo on a different level
 
05-29-2011, 11:49 AM
Christopher Schwan

RFC: sed script redundancy

On Sunday 29 May 2011 13:00:32 Fabian Groffen wrote:
> On 29-05-2011 12:44:46 +0200, Christopher Schwan wrote:
> > Thank you for that script. I experimented a bit with it and have a number
> > of corrections and suggestions:
> >
> > - alias does not work because my_sed is not declared at this stage. I
> > removed the whole alias line because I want to selectively enable my_sed
> >
> > - oargs must be an array in order to make quoting work:
> > local oargs=( "${@}" )
> >
> > - In the ewarn line ${oargs} should be changed to ${nargs[@]} (!?)
> > - is it correct to treat -e and -f alike ? I am not sure about that,
> > because the latter expects a file
>
> Yes, because (also in your function) you always shift, and assume the
> next argument is there. Hence, you have two identical cases in your
> script now. I only distinguished between 1) being able to do something
> (-i) and 2) having a pattern to work with (-e/-f or first non-option
> argument as string pattern).

OK, sorry, I did not express myself clearly. Your script is replacing "-f"
with "-e" and I wasn't sure if this is correct. But of course, they can be
treated alike:

    -e|--expression|-f|--file)
        arg="$1"
        shift
        nargs+=( "${arg}" "$1" )
        hade=yes
    ;;

>
> > - If no "-e" is given, the first non-option argument is treated as the
> > sed script expression, therefore I added hade=yes in the if-branch
>
> That one was missing indeed. I just quickly wrote the proof of concept
>
>
> > The new function now reads:
> [snip improved function]
>
> > As you can see, I added support for long-options. However, testing the
> > individual sed commands remains to be done. This could be especially
> > difficult if input is taken from stdin (e.g. in cat foo | sed
> > "s:a:b:g").
>
> You might be able to detect input is a pipe, and temporarily
> write the input to some file, then perform the sed without the -i
> requirement and remove the temp file after the real sed.

Good idea, thanks!

>
> > I tested my_sed within our sage ebuild[1]. This ebuild contains 39 sed
> > commands and I was able to spot one useless sed.
>
> Cool, nice to see you've made it into something useful!
>
> > [1] https://github.com/cschwan/sage-on-gentoo/blob/master/sci-mathematics/sage/sage-4.7.ebuild
 
10-28-2011, 01:08 AM
Ryan Hill

RFC: sed script redundancy

On Fri, 20 May 2011 17:39:22 +0200
Jeroen Roovers <jer@gentoo.org> wrote:

> for a while now I've been wondering if all those sed scripts in all
> those ebuilds are really effective.
>
> To find out, I've tried a couple of angles on a sed hook that basically
> dissects the sed command line provided, divides everything up into sed
> scripts, files being processed and other options, and runs everything
> through diff to get some meaningful QA output as to the effective use
> of the sed scripts invoked.
>
> Of course, a sed script will sometimes appear ineffective when it is not,
> because it uses some variable or output that varies with the platform you
> run it on, such as $(get_libdir).
>
> I've looked into sed's internal solutions to no avail, but something
> like -i[SUFFIX] might help, since it gives you a backup file to compare
> with the file that's being streamed.
>
> The idea is to pass the result to
> | diff -u $file $file[SUFFIX]
> to figure out what was changed, and what sed script changed it.
>
> Any help?

Sorry, old thread. You can use the 'w' flag to write a log file of the lines
changed. This includes lines where the replacement ended up identical to the
text matched (e.g. lib -> $(get_libdir)). That means you can do something
like:

dirtyepic@tundra ~ $ cat test
foo
foobar
bar
foobarfoo
dirtyepic@tundra ~ $ sed -i -e 's:foo:foo:gw /dev/stdout' test | wc -l
3

I think only GNU sed can do the /dev/stdout thing.
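
A more portable variant might log to a temporary file instead of
/dev/stdout, and combine the 'w' count with a comparison to tell "never
matched" apart from "matched, but the result is identical". This is only a
sketch: sed_s_classify is a made-up name, and it hardcodes ':' as the
delimiter, so neither argument may contain ':':

```shell
# Hypothetical helper: classify the effect of one s:REGEX:REPLACEMENT:g.
# The input file is never modified.
# usage: sed_s_classify FILE REGEX REPLACEMENT
sed_s_classify() {
    local file=$1 regex=$2 repl=$3
    local log matched
    log=$(mktemp) || return

    # 'w' logs each line on which the substitution matched, changed or not;
    # -n suppresses the normal output.
    sed -n -e "s:${regex}:${repl}:gw ${log}" "${file}"
    matched=$(wc -l < "${log}")
    matched=$((matched + 0))    # strip padding some wc implementations add
    rm -f "${log}"

    if [ "${matched}" -eq 0 ]; then
        echo "never matched"
    elif sed -e "s:${regex}:${repl}:g" "${file}" | cmp -s "${file}" -; then
        echo "matched ${matched} line(s), but output is identical"
    else
        echo "matched ${matched} line(s), file would change"
    fi
}
```

On the test file above, `sed_s_classify test foo foo` reports three matched
lines with identical output, which is the lib -> $(get_libdir) situation.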

--
fonts, gcc-porting, it makes no sense how it makes no sense
toolchain, wxwidgets but i'll take it free anytime
@ gentoo.org EFFD 380E 047A 4B51 D2BD C64F 8AA8 8346 F9A4 0662
 
