Linux Archive

Linux Archive (http://www.linux-archive.org/)
-   Debian Development (http://www.linux-archive.org/debian-development/)
-   -   Discussion of uscan enhancement 1 (Was: uscan enhancement take 3: script hook) (http://www.linux-archive.org/debian-development/699306-discussion-uscan-enhancement-1-uscan-enhancement-take-3-script-hook.html)

Jonas Smedegaard 08-30-2012 11:01 PM

Discussion of uscan enhancement 1 (Was: uscan enhancement take 3: script hook)
 
On 12-08-30 at 11:44pm, Andreas Tille wrote:
> I repost some extract from some private discussion about the
> Files-Excluded enhancement of uscan where Nicolas Boulenguez found
> some issues. (Nicolas, I hope you don't mind if I quote some of your
> non-private remarks in public.)
>
> Nicolas problem is mainly that if you specify
>
> "Files-Excluded: foo"
>
> and foo is a directory nothing is removed because the algorithm
> requires to specify rather
>
> "Files-Excluded: foo/"
>
> I wonder what you might think about this.

My proposal was and still is to keep things simple by reusing the syntax
for Files paragraphs, which has this distinction between files and
directories.

Yes, even if that is not "simple" in every way.


Regards,

- Jonas

--
* Jonas Smedegaard - idealist & Internet-arkitekt
* Tlf.: +45 40843136 Website: http://dr.jones.dk/

[x] quote me freely [ ] ask before reusing [ ] keep private

Nicolas Boulenguez 08-31-2012 01:38 AM

Discussion of uscan enhancement 1 (Was: uscan enhancement take 3: script hook)
 
(apologizes for the previous empty mail)

On Thu, Aug 30, 2012 at 11:44:34PM +0200, Andreas Tille wrote:
> On Thu, Aug 30, 2012 at 02:32:56AM +0200, Nicolas Boulenguez wrote:

> > Assume that "a" and "b" are directories, if I understand well, the
> > current behaviour is to recursively remove "a/b/", "a/b" and "a/" but
> > to ignore "a". I do not find that intuitive, and would suggest to
> > document it clearly.

> Could you suggest a wording which fits the requirement of "clearly
> documented".

Here is the meaning of Files-Excluded patterns, as I understand from
the perl code at
http://anonscm.debian.org/gitweb/?p=users/tille/devscripts.git;a=blob;f=scripts/uscan.pl;hb=bb686d032543d8ba8c5cd9c36ff8c2d9c33107 61#l1493

If $pattern contains no slash, then
(A) execute `find "$main_source_dir" -type f -name "$pattern" -delete`
That is, remove all *files* in all subdirectories whose *base name* match.
Else
(B) remove the trailing slash from $pattern, if present
(C) execute `find "$main_source_dir" -path "$main_source_dir/$pattern"
-print0 | xargs -0 rm -rf`
That is, remove all *files or subdirectories* whose *full path* match.

Jonas says that this imitates Files patterns, but
http://dep.debian.net/deps/dep5/#files-field mentions

Patterns match pathnames that start at the root of the source tree.
Thus, "Makefile.in" matches only the file at the root of the tree,
but "*/Makefile.in" matches at any depth.

I understand that matching depends only on the existence of the path,
not on it being a file or a directory. So
- "foo" should match "foo", even if a directory.
- "foo" should not match "bar/foo", even if a file.
- "foo/" should never match, even if "foo" is a directory.
In short, I would only expect (C) as the implementation.

> It is required that there is distinction between files and
> directories (see the discussion on debian-devel, for instance [1] and
> [2]).
> [1] http://lists.debian.org/debian-devel/2012/08/msg00512.html
> [2] http://lists.debian.org/debian-devel/2012/08/msg00449.html

In [2], Jonas says "I believe it is better to...", but does not
explain why.

> If I would feed 'a' to the removal algorithm it could simply
> happen, that a file c/a would be removed as well but it should not.

With (C), the "a" pattern will not remove the "c/a" file.

> In this case you can not specify something like
> *.jar
> for jar files hanging around in lib/*.jar or so. The code which tries
> to distinguish between files and pathes enables removing files in all
> directories.

With (C), the "*.jar" pattern will remove "lib/foo.jar" and "usr/lib/bar.jar".

----------------------------------------------------------------------

This suggestion is unrelated, but I repeat is because it was hidden in
the huge quoted mail. If we keep the current meaning, I suggest that
the implementation at
http://anonscm.debian.org/gitweb/?p=users/tille/devscripts.git;a=blob;f=scripts/uscan.pl;hb=bb686d032543d8ba8c5cd9c36ff8c2d9c33107 61#l1493
avoid splitting and grepping twice.

foreach (split /s+/, $data->{"files-excluded"}) {
if (grep {///}) {
# delete trailing '/' because otherwise find -path will fail
s?/+$?? ;
# use rm -rf to enable deleting non-empty directories
`find "$main_source_dir" -path "$main_source_dir/$_" -print0 | xargs -0 rm -rf`;
} else {
`find "$main_source_dir" -type f -name "$_" -delete`;
};
};


--
To UNSUBSCRIBE, email to debian-devel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/20120831013853.GA7999@pegase

Jonas Smedegaard 08-31-2012 09:38 AM

Discussion of uscan enhancement 1 (Was: uscan enhancement take 3: script hook)
 
On 12-08-31 at 03:38am, Nicolas Boulenguez wrote:
> (apologizes for the previous empty mail)
>
> On Thu, Aug 30, 2012 at 11:44:34PM +0200, Andreas Tille wrote:
> > On Thu, Aug 30, 2012 at 02:32:56AM +0200, Nicolas Boulenguez wrote:
>
> > > Assume that "a" and "b" are directories, if I understand well, the
> > > current behaviour is to recursively remove "a/b/", "a/b" and "a/"
> > > but to ignore "a". I do not find that intuitive, and would suggest
> > > to document it clearly.
>
> > Could you suggest a wording which fits the requirement of "clearly
> > documented".
>
> Here is the meaning of Files-Excluded patterns, as I understand from
> the perl code at
> http://anonscm.debian.org/gitweb/?p=users/tille/devscripts.git;a=blob;f=scripts/uscan.pl;hb=bb686d032543d8ba8c5cd9c36ff8c2d9c33107 61#l1493
>
> If $pattern contains no slash, then
> (A) execute `find "$main_source_dir" -type f -name "$pattern" -delete`
> That is, remove all *files* in all subdirectories whose *base name* match.
> Else
> (B) remove the trailing slash from $pattern, if present
> (C) execute `find "$main_source_dir" -path "$main_source_dir/$pattern"
> -print0 | xargs -0 rm -rf`
> That is, remove all *files or subdirectories* whose *full path* match.
>
> Jonas says that this imitates Files patterns, but
> http://dep.debian.net/deps/dep5/#files-field mentions
>
> Patterns match pathnames that start at the root of the source tree.
> Thus, "Makefile.in" matches only the file at the root of the tree,
> but "*/Makefile.in" matches at any depth.
>
> I understand that matching depends only on the existence of the path,
> not on it being a file or a directory. So
> - "foo" should match "foo", even if a directory.
> - "foo" should not match "bar/foo", even if a file.
> - "foo/" should never match, even if "foo" is a directory.
> In short, I would only expect (C) as the implementation.
>
> > It is required that there is distinction between files and
> > directories (see the discussion on debian-devel, for instance [1] and
> > [2]).
> > [1] http://lists.debian.org/debian-devel/2012/08/msg00512.html
> > [2] http://lists.debian.org/debian-devel/2012/08/msg00449.html
>
> In [2], Jonas says "I believe it is better to...", but does not
> explain why.
>
> > If I would feed 'a' to the removal algorithm it could simply
> > happen, that a file c/a would be removed as well but it should not.
>
> With (C), the "a" pattern will not remove the "c/a" file.

Hmm. Seems my head was stock in a too old draft of DEP-5 - or
completely off track. Sorry!

I'll step back and let others figure out wat is proper interpretation.


- Jonas

--
* Jonas Smedegaard - idealist & Internet-arkitekt
* Tlf.: +45 40843136 Website: http://dr.jones.dk/

[x] quote me freely [ ] ask before reusing [ ] keep private

Andreas Tille 09-04-2012 05:14 PM

Discussion of uscan enhancement 1 (Was: uscan enhancement take 3: script hook)
 
On Fri, Aug 31, 2012 at 11:38:15AM +0200, Jonas Smedegaard wrote:
>
> Hmm. Seems my head was stock in a too old draft of DEP-5 - or
> completely off track. Sorry!
>
> I'll step back and let others figure out wat is proper interpretation.

You might like to check my last commit to

git://git.debian.org/git/users/tille/devscripts.git

and check whether it matches your expectations when injecting
Files-Excluded field into debian/copyright.

You might also like to test "--repack-compression {gz,bz2,xz,lzma}".

Kind regards

Andreas.

--
http://fam-tille.de


--
To UNSUBSCRIBE, email to debian-devel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: 20120904171421.GB23209@an3as.eu">http://lists.debian.org/20120904171421.GB23209@an3as.eu

Nicolas Boulenguez 09-04-2012 11:17 PM

Discussion of uscan enhancement 1 (Was: uscan enhancement take 3: script hook)
 
On Tue, Sep 04, 2012 at 07:14:21PM +0200, Andreas Tille wrote:

> You might like to check my last commit to
> git://git.debian.org/git/users/tille/devscripts.git

(currently 4b3a4a6310ff1ff80ac1498cf92a99817c75ffce)

> and check whether it matches your expectations when injecting
> Files-Excluded field into debian/copyright.

At first view, your changes correct
- "foo" should match "foo", even if a directory.
but do not correct
- "foo" should not match "bar/foo", even if a file.
- "foo/" should never match, even if "foo" is a directory.

DEP5 does not distinguish files from directories, or whether paths
contain a slash. Why should you do these tests in your code? I suggest
the following patch.

diff --git a/scripts/uscan.pl b/scripts/uscan.pl
index 649f822..34e31a9 100755
--- a/scripts/uscan.pl
+++ b/scripts/uscan.pl
@@ -1494,17 +1494,9 @@ EOF
print STDERR "Error: $main_source_dir is no directory";
}
my $nfiles_before = `find $main_source_dir | wc -l`;
- foreach (grep {///} split /s+/, $data->{"files-excluded"}) {
- # delete trailing '/' because otherwise find -path will fail
- s?/+$?? ;
- # use rm -rf to enable deleting non-empty directories
+ foreach (split /s+/, $data->{"files-excluded"}) {
`find "$main_source_dir" -path "$main_source_dir/$_" -print0 | xargs -0 rm -rf`;
};
- foreach (grep {/^[^/]+$/} split /s+/, $data->{"files-excluded"}) {
- `find "$main_source_dir" -type f -name "$_" -delete`;
- # the statement above does not delete directories - just do it now
- `rm -rf "$main_source_dir/$_" ` if ( -d "$main_source_dir/$_" ) ;
- };
my $nfiles_after = `find $main_source_dir | wc -l`;
if ( $nfiles_before == $nfiles_after && ! $exclude__MACOSX ) {
print "-- Source tree remains identical - no need for repacking.
" if $verbose;


--
To UNSUBSCRIBE, email to debian-devel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/20120904231747.GA3868@pegase

Charles Plessy 09-05-2012 12:04 AM

Discussion of uscan enhancement 1 (Was: uscan enhancement take 3: script hook)
 
Le Wed, Sep 05, 2012 at 01:17:47AM +0200, Nicolas Boulenguez a écrit :
>
> - "foo" should match "foo", even if a directory.
> but do not correct
> - "foo" should not match "bar/foo", even if a file.
> - "foo/" should never match, even if "foo" is a directory.
>
> DEP5 does not distinguish files from directories, or whether paths
> contain a slash. Why should you do these tests in your code? I suggest
> the following patch.

Hello everybody,

the machine-readable format does not mention trailing slashes at the end
of directory names, and refers to the -path test of the GNU find command,
which will fail with trailing slashes.

Also, out of the 1,129 machine-readable files version 1.0 present in
the packages-metadata archive (http://wiki.debian.org/UpstreamMetadata),
only four have a trailing slash after a directory name:

$ for file in $(grep http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/ */*.copyright -H | cut -f 1 -d ':' ) ; do grep-dctrl --eregex -FFiles -sFiles '/$' < $file 2> /dev/null && echo $file; done
Files: modules/nginx-dav-ext-module/
n/nginx.copyright
Files: signpost-core/src/main/java/com/
o/oauth-signpost.copyright
Files: game/LuaUI/
s/spring.copyright
Files: lib/yard/rubygems/backports/
y/yard.copyright

(No Files field matched '/ ')

Andreas, I think that you can consider such entries a bug and output a warning
when finding them, or rejecting the whole file as invalid.

If this is unclear in the current specification, it can be clarified in the
next revision.

Have a nice day,

--
Charles Plessy
Tsurumi, Kanagawa, Japan


--
To UNSUBSCRIBE, email to debian-devel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: 20120905000419.GA1967@falafel.plessy.net">http://lists.debian.org/20120905000419.GA1967@falafel.plessy.net

Jon Dowland 09-05-2012 10:52 AM

Discussion of uscan enhancement 1 (Was: uscan enhancement take 3: script hook)
 
On Wed, Sep 05, 2012 at 09:04:19AM +0900, Charles Plessy wrote:
> the machine-readable format does not mention trailing slashes at the end
> of directory names, and refers to the -path test of the GNU find command,

Good. Having a trailing-slash be meaningful is very confusing. I especially
hate this with rsync, where I now use trailing-slash on dirs exclusively,
since I've memorized the behaviour when you do this and not when you don't.
It frustrates me because foo and foo/ might differ but they are both names
for the same thing - and the behaviour should not differ if the name for
the same thing differs IMHO.

> which will fail with trailing slashes.

I can understand why it would fail, since the argument to path is a pattern
rather than a filename, and it is compared against find's list of paths
which just so happen not to have trailing slashes. Having said that I wonder
if this behaviour could be considered buggy.


--
To UNSUBSCRIBE, email to debian-devel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/20120905105227.GA31962@debian

Andreas Tille 09-06-2012 08:34 PM

Discussion of uscan enhancement 1 (Was: uscan enhancement take 3: script hook)
 
Hi Nicolas,

On Wed, Sep 05, 2012 at 01:17:47AM +0200, Nicolas Boulenguez wrote:
>
> diff --git a/scripts/uscan.pl b/scripts/uscan.pl
> index 649f822..34e31a9 100755
> --- a/scripts/uscan.pl
> +++ b/scripts/uscan.pl
> @@ -1494,17 +1494,9 @@ EOF
> print STDERR "Error: $main_source_dir is no directory";
> }
> my $nfiles_before = `find $main_source_dir | wc -l`;
> - foreach (grep {///} split /s+/, $data->{"files-excluded"}) {
> - # delete trailing '/' because otherwise find -path will fail
> - s?/+$?? ;
> - # use rm -rf to enable deleting non-empty directories
> + foreach (split /s+/, $data->{"files-excluded"}) {
> `find "$main_source_dir" -path "$main_source_dir/$_" -print0 | xargs -0 rm -rf`;
> };
> - foreach (grep {/^[^/]+$/} split /s+/, $data->{"files-excluded"}) {
> - `find "$main_source_dir" -type f -name "$_" -delete`;
> - # the statement above does not delete directories - just do it now
> - `rm -rf "$main_source_dir/$_" ` if ( -d "$main_source_dir/$_" ) ;
> - };
> my $nfiles_after = `find $main_source_dir | wc -l`;
> if ( $nfiles_before == $nfiles_after && ! $exclude__MACOSX ) {
> print "-- Source tree remains identical - no need for repacking.
" if $verbose;

I tested this patch and I have a problem with

svn://svn.debian.org/svn/debian-med/trunk/packages/rdp-classifier/trunk/debian

It specifies:

Files-Excluded:
__MACOSX
[a-z]*.jar

with the purpose to save ReadSeq.jar inside the source package. This works with the
old method:

$ find . -name "[a-z]*.jar"
./rdp_classifier_2.5/lib/junit.jar
./rdp_classifier_2.5/lib/commons-cli.jar
./rdp_classifier_2.5/rdp_classifier-2.5.jar

but failes when trying your patch:

$ find . -path "./[a-z]*.jar"
./rdp_classifier_2.5/lib/junit.jar
./rdp_classifier_2.5/lib/commons-cli.jar
./rdp_classifier_2.5/lib/ReadSeq.jar
./rdp_classifier_2.5/rdp_classifier-2.5.jar

I admit I did also not followed the DEP5 discussion very closely but the
current code could deal nicely with the specific removal which is not
the case with your proposal and I have no clue how to reasonably specify
the fact that all *.jars except one should be removed (besides
specifying every single file).

Kind regards

Andreas.

--
http://fam-tille.de


--
To UNSUBSCRIBE, email to debian-devel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: 20120906203438.GC13940@an3as.eu">http://lists.debian.org/20120906203438.GC13940@an3as.eu


All times are GMT. The time now is 02:40 AM.

VBulletin, Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO ©2007, Crawlability, Inc.