08-25-2012, 06:43 PM
Dave Reisner

makepkg: Alternate implementation of VCS URLs in sources array.

On Sat, Aug 25, 2012 at 01:36:41PM -0400, Luke Shumaker wrote:
> A while ago I started working on a derivative of makepkg to support
> having 'git://...' type urls in the sources=() array. When preparing
> to file this patch, I did a `git rebase`, and noticed that Allan McRae
> began working on a similar feature. Our implementations are in many
> ways similar. Hopefully mine will be useful.

An interesting approach. As you've noticed, I think we're fairly
committed to the implementation that Allan has provided. If you have
specific concerns, maybe we can work to fix those.

> My implementation makes minimal changes to makepkg itself (only adding
> glob expansion to DLAGENTS, allowing for things like
> "git+*::"). Instead I added a `vcsget` tool which generates a tarball
> from the VCS repo, in a very similar manner to the way Allan's
> implementation does so within makepkg.

I'm not thrilled with the shell I saw in that patch -- there's extensive
use of external commands when simple bash features would have sufficed.

> It looks as if Allan's download_*() functions are more verbose than
> mine about what failed when there is an error. His svn and hg handlers
> are likely more robust--though my git is pretty solid. I also have
> a half-written handler for bzr.
>
> An advantage of my design is that it does allow for integrity checks
> of VCS packages, rather than inserting 'SKIP' into the md5sums
> array. This is very important to the derivative distribution Parabola.
> (However, the 'SKIP' option is still valuable for URLs that track a
> branch)

I don't see this as an advantage so much as a duplication. The backing
VCS takes care of integrity checks. They're only necessary with tarballs
because there is no "builtin" checksum to reference.

> Happy hacking,
> ~ Luke Shumaker
>
> Luke Shumaker (3):
> Add a `vcsget` tool to download source from VCS repositories.
> makepkg: do glob expansion in DLAGENTS maps
> makepkg.conf: add vcsget DLAGENTS
>
> etc/makepkg.conf.in | 8 +-
> scripts/.gitignore | 1 +
> scripts/Makefile.am | 4 +-
> scripts/makepkg.sh.in | 13 ++-
> scripts/vcsget.sh.in | 294 ++++++++++++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 316 insertions(+), 4 deletions(-)
> create mode 100644 scripts/vcsget.sh.in
>
> --
> 1.7.12
>
>
 
08-27-2012, 04:12 AM
Allan McRae

makepkg: Alternate implementation of VCS URLs in sources array.

On 26/08/12 03:36, Luke Shumaker wrote:
> An advantage of my design is that it does allow for integrity checks
> of VCS packages, rather than inserting 'SKIP' into the md5sums
> array. This is very important to the derivative distribution Parabola.
> (However, the 'SKIP' option is still valuable for URLs that track a
> branch)

Can you explain why this is important? That would help me understand
what you are trying to achieve that cannot be done with the current system.

The only reason I can see to create a tarball is to distribute the
source on its own. Using "makepkg --allsource" creates a full source
tarball including the VCS sources. If you are worried about integrity
of those VCS sources in the source tarball, adding a checksum to the
PKGBUILD does nothing as the PKGBUILD can be edited too. You are best
to use "makepkg --allsource" and PGP sign the resulting tarball.

But perhaps I entirely missed the issue...


A comment that I need to make is about the need for a separate tool to
download the vcs sources. We used to have a script called "versionpkg"
that dealt with VCS packages. That got merged into makepkg and my
recent work was to fully integrate VCS packaging into makepkg. So going
back to using a separate script to deal with VCS sources is really a step
or two backwards.

Allan
 
08-27-2012, 02:53 PM
Dave Reisner

makepkg: Alternate implementation of VCS URLs in sources array.

On Mon, Aug 27, 2012 at 09:55:57AM -0400, Luke T.Shumaker wrote:
> At Sat, 25 Aug 2012 14:43:32 -0400,
> Dave Reisner wrote:
> >
> > On Sat, Aug 25, 2012 at 01:36:41PM -0400, Luke Shumaker wrote:
> > > A while ago I started working on a derivative of makepkg to support
> > > having 'git://...' type urls in the sources=() array. When preparing
> > > to file this patch, I did a `git rebase`, and noticed that Allan McRae
> > > began working on a similar feature. Our implementations are in many
> > > ways similar. Hopefully mine will be useful.
> >
> > An interesting approach. As you've noticed, I think we're fairly
> > committed to the implementation that Allan has provided. If you have
> > specific concerns, maybe we can work to fix those.
>
> * Allan's URL parsing has some issues, see point 2 below. This is an
> easy fix.

Great, let's fix those.

> * Composability. It's not really a "problem", but it seems to me that
> it's better to create a new general-purpose tool, instead of
> shoving a bunch of "special case" behavior into makepkg. Allan's
> implementation could even fairly easily be pulled out of makepkg.

-1 for moving it out of makepkg. Decoupling it means we would need to do
a lot of sharing so that the tool is sure to understand makepkg's
absurdly complicated environment. Not extensible, and there's a lot of
code that would be "duplicated".

> And really, does it make sense to have these URL schemes hardcoded
> into makepkg? Why have DLAGENTS at all, if we're going to hard-code
> schemes into makepkg? Again, I'm not against Allan's implementation,
> but I am for moving it out of makepkg.

Let's see what's already established and available for http downloading:

- curl
- wget
- aria2
- dog
- fetch
- bash tcp sockets
(I'm sure there's more)

Show me a popular reimplementation of the reference
git/svn/bzr/cvs/darcs/hg client, and then we'll talk about a way to make
it possible for the user to configure that in place of our "hardcoded
default".

The nerd in me wants to point out that there is no backing RFC for any
of the VCS protocols so it doesn't belong in DLAGENTS, but we've
violated that puritanism already by adding a DLAGENT for scp.
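For context, DLAGENTS in makepkg.conf maps a URL scheme to a download
command; the entries look roughly like this (%u is the URL, %o the output
file -- the exact flags here are illustrative, not copied from the shipped
makepkg.conf):

DLAGENTS=('ftp::/usr/bin/curl -fC - --retry 3 --retry-delay 3 -o %o %u'
          'http::/usr/bin/curl -fLC - --retry 3 --retry-delay 3 -o %o %u'
          'https::/usr/bin/curl -fLC - --retry 3 --retry-delay 3 -o %o %u'
          'scp::/usr/bin/scp -C %u %o')
# With the glob expansion Luke proposes, a wildcard entry such as
# 'git+*::/usr/bin/vcsget %u %o' could (hypothetically) route all
# git-style URLs to an external tool.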

> > > My implementation makes minimal changes to makepkg itself (only adding
> > > glob expansion to DLAGENTS, allowing for things like
> > > "git+*::"). Instead I added a `vcsget` tool which generates a tarball
> > > from the VCS repo, in a very similar manner to the way Allan's
> > > implementation does so within makepkg.
> >
> > I'm not thrilled with the shell I saw in that patch -- there's extensive
> > use of external commands when simple bash features would have sufficed.
>
> I assume you're speaking of my
> 1. use of cut in get_field
> 2. use of sed to parse URLs
> 3. use of sed to parse URL authorities
> 4. use of readlink to establish the tarball name
> 5. use of sed to create a basedir name from the tarball name
> 6. use of '[' with if statements

Probably a good laundry list to start with. I'm happy to give a proper
review of vcsget if you'd like, but I'll respond here in general terms.

> My defense of each:
>
> 1. `cut` in get_field
> ---------------------
>
> This was simply the simplest solution that was obviously
> correct. Another solution would have been to manipulate IFS, which has
> all kinds of funny side effects and fringe behaviors.
>
> That was a weak argument, and as a coder, I would be thrilled to find
> out if there is a better solution.

Cut is useless. IFS manipulation with read is the preferred solution
here. I'm curious what you think are the "fringe behaviors" in altering
IFS (I suspect you're merely misusing it -- it's a common point of
confusion). I've yet to come across any that make me want to use cut.
I'd sooner offload a larger portion of work to awk than use cut, for
several reasons, anyway.
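For what it's worth, the read-with-IFS idiom looks roughly like this
(get_field just mirrors the function name mentioned above; the record
layout is a placeholder, not taken from vcsget):

get_field() {
    local record=$1 want=$2
    local -a fields
    IFS=' ' read -r -a fields <<< "$record"   # split in-process, no fork
    printf '%s\n' "${fields[want]}"
}

get_field 'git https://example.com/repo.git master' 1   # -> https://example.com/repo.git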

> 2. `sed` to parse URLs
> ----------------------
>
> I wanted full URL parsing, for a few cases when the translation is
> less than straightforward. For example, including authentication
> info in svn URLs.
>
> Further, I wanted to make sure that *any* URL parsing done would be
> robust. Given that the regex used was taken straight from the relevant
> RFC, I knew I could count on it to work, even in fringe cases. For
> example a "fragment" *should* be able to contain a '#', but Allan's
> implementation discards anything before the last '#' in the
> fragment. This is a problem for (at least) git, as tags and branches
> may contain the symbol.

bash has regex matching which supports ERE with capture groups -- this
could easily be adopted to split the URL all at once, simply saving the
relevant pieces of BASH_REMATCH after the fact. See the =~ operator in
bash(1).
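A minimal sketch of that approach, using the URL-splitting regex from
RFC 3986 (Appendix B); the sample URL is made up:

url='git+https://example.com/repo.git#branch=my#branch'
re='^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'

if [[ $url =~ $re ]]; then
    scheme=${BASH_REMATCH[2]}      # git+https
    authority=${BASH_REMATCH[4]}   # example.com
    path=${BASH_REMATCH[5]}        # /repo.git
    fragment=${BASH_REMATCH[9]}    # branch=my#branch -- the '#' inside survives
fi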

> 3. `sed` to parse URL authorities
> ---------------------------------
>
> I believe that a robust (not having the same problems as fragments for
> URLs) implementation in bash would be non-trivial.

See above.

> 4. `readlink` to establish the tarball name
> -------------------------------------------
>
> I used `readlink -m` to turn the tarball name into an absolute
> path. This was not an absolute must, but it allowed me to avoid
> worrying about keeping track of the current directory. It would be
> fairly easy to remove this by using pushd/popd instead of cd. If you
> would like this changed, I can do that.

This is _not_ portable. We've refrained from using readlink at all in
our codebase except in the untainted form. The only thing you can
portably rely on readlink to do is to wrap the readlink syscall. Better
yet, figure out why a potential symlink isn't good enough and fix that.
You shouldn't need to know the actual target that a symlink points to,
since 90% of the time the tool will dereference it for you.

> 5. `sed` to create a basedir name from the tarball name
> -------------------------------------------------------
>
> I'll admit, this was me getting a little lazy. A pure bash
> implementation is:
>
> base=${tarball##*/}
> base=${base%%.tar.*}
> base=${base%.tar}

And with an extglob, you can more precisely cut that down to:

base=${tarball##*/}
base=${base%.tar?(.*)}

The unfortunately named foo.tar.bar.tar.gz wouldn't be "broken" in this way.
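One caveat worth adding: ?() is an extended glob, so that form only works
with extglob switched on (the path here is made up):

shopt -s extglob                 # required for the ?() pattern

tarball=/some/dir/foo.tar.bar.tar.gz
base=${tarball##*/}              # foo.tar.bar.tar.gz
base=${base%.tar?(.*)}           # foo.tar.bar -- only the final .tar[.*] is stripped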

> 6. `[` with if statements
> -------------------------
>
> Can I call stylistic choice on this one? It is trivial to replace this
> with bash's '[['.

Apologies for the incoming rant.

No, this isn't merely an issue of "style". [[ is far superior to [ in
several ways. It's a bash keyword, making it far faster than the [
builtin; it supports pattern matching with globs and regex; and it has
defined behavior when you pass more than 3 arguments. Quoting semantics
are far simpler (and safer), too. Consider a contrived example which,
actually, you see far too often:

[ -n $foo ]

This is always true, regardless of whether or not 'foo' is defined, since
'[' is a builtin, which means it behaves no differently than any other
standard command. In reality, it's the same as:

[ -n ]

This is _true_, because you've really only passed a single argument to
[. In the single-argument case, the existence test is performed. Since
'-n' always has a length, it's true.

By comparison, the preferred bash construction:

[[ -n $foo ]]

This always does the right thing because the lexer passes over it and
sees the variable before it's expanded. It understands that even though
foo might not be defined, you've still passed an argument, and the test
correctly returns false when that argument is the empty string. The only
time you need to quote arguments to tests is when dealing with comparisons:

[[ $foo = $bar ]]

is not the same as

[[ $foo = "$bar" ]]

Generally, you want the latter, since the unquoted RHS is treated as a
glob pattern. Quoting it ensures that any glob characters are matched
literally rather than expanded. The LHS is _never_ treated as a pattern,
and since [[ performs no word splitting, it is safe even when it contains
whitespace.
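A small demonstration of the behavior described above (the values are
made up):

unset foo
[ -n $foo ]   && echo true   # prints "true": $foo expands to nothing, so this
                             # is really the one-argument form '[ -n ]'
[[ -n $foo ]] && echo true   # prints nothing: correctly false

foo='release.tar.gz' bar='*.tar.gz'
[[ $foo = $bar ]]   && echo match   # true: unquoted RHS is treated as a glob
[[ $foo = "$bar" ]] && echo match   # false: literal comparison, strings differ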

> >
> > > It looks as if Allan's download_*() functions are more verbose than
> > > mine about what failed when there is an error. His svn and hg handlers
> > > are likely more robust--though my git is pretty solid. I also have
> > > a half-written handler for bzr.
> > >
> > > An advantage of my design is that it does allow for integrity checks
> > > of VCS packages, rather than inserting 'SKIP' into the md5sums
> > > array. This is very important to the derivative distribution Parabola.
> > > (However, the 'SKIP' option is still valuable for URLs that track a
> > > branch)
> >
> > I don't see this as an advantage so much as a duplication. The backing
> > VCS takes care of integrity checks. They're only necessary with tarballs
> > because there is no "builtin" checksum to reference.
>
> This is only partially true of the non-distributed VCSs.
>
> There was concern from other Parabola developers about being able to
> use the checksum to unambiguously be able to confirm that a source
> tree is the correct version, but Allan's implementation seems
> acceptable on that front.
>
> Happy hacking,
> ~ Luke Shumaker
>
 
08-28-2012, 12:37 AM
Dave Reisner

makepkg: Alternate implementation of VCS URLs in sources array.

On Mon, Aug 27, 2012 at 08:17:22PM -0400, Luke T.Shumaker wrote:

Ermagerd, snerping...

> I'll give you that awk might be better--but I was never comfortable
> with awk. The problem with IFS that stopped me from using it is that
> it always* will collapse multiple whitespace characters, so it is
> impossible to have an empty column in the middle if using a ' '
> delimiter. For some reason I was thinking that space was the only safe
> (printable ASCII) character to use--I now realize that both '<' and
> '>' would work fine.
>
> * perhaps there is a shopt option, but I am unaware of it

Yes, this is correct about collapsing empty columns. You can preserve
those empty columns with some hackery, but you cannot split on them. And
yes, this is one case where I would move to awk. It's a lovely little
language that I find to be extremely flexible, and oh gosh readable --
especially when matched up against the likes of sed.
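A quick illustration of the difference (the sample record is made up):

record='alice::wheel'            # the middle field is intentionally empty

IFS=: read -r user middle group <<< "$record"     # non-whitespace IFS: the empty
printf '%s|%s|%s\n' "$user" "$middle" "$group"    # field survives -> alice||wheel

IFS=' ' read -r user middle group <<< 'alice  wheel'   # whitespace IFS: the run of
printf '%s|%s|%s\n' "$user" "$middle" "$group"         # spaces collapses -> alice|wheel|

awk -F: '{ print $2 }' <<< "$record"    # awk with an explicit FS also keeps
                                        # the empty middle field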

> My use of `readlink` had nothing to do with symlinks; it is simply to
> turn a relative path into an absolute path. An 'unintended' use of
> readlink, but the UNIX authors would say that using `cat` to display a
> single file is, too.

And this is just a result of me not really reading vcsget too closely
when I crafted this reply. If you need an absolute path, just look at
the first character. If it's a slash, you're done. If it's not, then
prepend $PWD/.
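In bash, that check is a few lines (a sketch; the paths are examples):

abspath() {
    if [[ $1 = /* ]]; then
        printf '%s\n' "$1"          # already absolute
    else
        printf '%s\n' "$PWD/$1"     # relative: prefix the current directory
    fi
}

abspath src/foo.tar.gz    # -> $PWD/src/foo.tar.gz
abspath /tmp/foo.tar.gz   # -> /tmp/foo.tar.gz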

> I've seen this in several codebases, and it is the most reliable method
> of doing this in shell, assuming GNU coreutils.

Yup, but we don't assume that.

> As for quoting rules--I always disliked the 'simpler' rules for '[[',
> they seem like they were trying to compensate for bad programmers; (in
> the first example) I'd have to quote the variable if I were to do a
> similar task with any other command.

Sure... so many bugs in shell boil down to quoting problems, so I'm
actually happy to have one respite here. I've written a _lot_ of shell
over the past 2-3 years.

> ----
>
> Well, I have been taught a few features of Bash, and been made to feel
> like a noob. Thank you, this is learning!

It's not clear that anyone ever _really_ knows shell. Over time, you
just make fewer grievous errors.
 
08-28-2012, 03:17 AM
Allan McRae

makepkg: Alternate implementation of VCS URLs in sources array.

On 27/08/12 23:55, Luke T.Shumaker wrote:
> For
> example a "fragment" *should* be able to contain a '#', but Allan's
> implementation discards anything before the last '#' in the
> fragment. This is a problem for (at least) git, as tags and branches
> may contain the symbol.


Fix on its way.

Allan
 
08-28-2012, 03:18 AM
Allan McRae

makepkg: Alternate implementation of VCS URLs in sources array.

On 28/08/12 10:17, Luke T.Shumaker wrote:
> Another effect of Allan's current implementation is that adding
> another scheme requires modification to makepkg, even if it is just
> to pass it to DLAGENTS. I'd argue that even if we have the special
> cases, the correct behavior would be to have download_file (DLAGENTS)
> as a fallback, instead of only tripping on 'ftp|http|https|rsync|scp'.

Fix on its way.
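For illustration only, the fallback Luke asks for amounts to dispatching
along these lines (a sketch; the function names echo the discussion
rather than the actual makepkg code):

case $proto in
    git*)  download_git  "$url" ;;
    svn*)  download_svn  "$url" ;;
    hg*)   download_hg   "$url" ;;
    *)     download_file "$url" ;;   # anything else falls through to DLAGENTS
esac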

> Further, a use case I see is a user putting some funky stuff in
> DLAGENTS to either use an offline cache, or do some network stuff to
> deal with a contrived network setup.

This is the only thing I think needs to be dealt with... maybe... Can you
give a concrete example?

Thanks,
Allan
 
