Linux Archive

Linux Archive (http://www.linux-archive.org/)
-   ArchLinux Pacman Development (http://www.linux-archive.org/archlinux-pacman-development/)
-   -   Add a `vcsget` tool to download source from VCS repositories. (http://www.linux-archive.org/archlinux-pacman-development/698278-add-vcsget-tool-download-source-vcs-repositories.html)

Dave Reisner 08-28-2012 01:35 AM

Add a `vcsget` tool to download source from VCS repositories.
 
On Sat, Aug 25, 2012 at 01:36:42PM -0400, Luke Shumaker wrote:
> It would support
> * cvs
> * git
> * svn
> * bzr
> * hg
>
> But cvs support is yet to be written, and bzr support is half-way written.
>
> Git is the only mode that I'm thoroughly confident in, the others need
> testing.
>
> Signed-off-by: Luke Shumaker <LukeShu@sbcglobal.net>
> ---

Code review, as promised. There's a general lack of error checking on
important commands. Things that mutate the filesystem or change
directory should _always_ be checked for error. I haven't pointed it all
out, but you'll get the idea.

I still don't like the idea of creating a tarball out of everything,
just to unpack it after its completed. From a distribution packager
perspective, building from version control shouldn't be a regular thing.
Yes, I do understand your decision based on Parabola, but I don't have
to agree with it =P.

> scripts/.gitignore | 1 +
> scripts/Makefile.am | 4 +-
> scripts/vcsget.sh.in | 294 ++++++++++++++++++++++++++++++++++++++++++++++++++ +
> 3 files changed, 298 insertions(+), 1 deletion(-)
> create mode 100644 scripts/vcsget.sh.in
>
> diff --git a/scripts/.gitignore b/scripts/.gitignore
> index 9e403bf..e59551e 100644
> --- a/scripts/.gitignore
> +++ b/scripts/.gitignore
> @@ -6,3 +6,4 @@ pkgdelta
> repo-add
> repo-elephant

__
'.
'-
/ /_ .---.
/ | ,./--.// )
| // )/ /
' ^ ^ / )____.----.. 6
'.____. .___/ ._)
./. )
' /
_/ / ). ) (
/# .! | / /
C// # /'-----'/ # /
. 'C/ | | | | |mrf ,
), .. .'OOO-'. ..'OOO'OOO-'. ..(,

> repo-remove
> +vcsget
> diff --git a/scripts/Makefile.am b/scripts/Makefile.am
> index 29c81aa..8e6d037 100644
> --- a/scripts/Makefile.am
> +++ b/scripts/Makefile.am
> @@ -14,7 +14,8 @@ OURSCRIPTS =
> pacman-key
> pacman-optimize
> pkgdelta
> - repo-add
> + repo-add
> + vcsget
>
> EXTRA_DIST =
> makepkg.sh.in
> @@ -23,6 +24,7 @@ EXTRA_DIST =
> pacman-optimize.sh.in
> pkgdelta.sh.in
> repo-add.sh.in
> + vcsget.sh.in
> $(LIBRARY)
>
> LIBRARY =
> diff --git a/scripts/vcsget.sh.in b/scripts/vcsget.sh.in
> new file mode 100644
> index 0000000..f8f6e4b
> --- /dev/null
> +++ b/scripts/vcsget.sh.in
> @@ -0,0 +1,294 @@
> +#!/bin/bash
> +#
> +# vcsget - downloader agent that generates tarballs from VCS repos
> +# @configure_input@
> +#
> +# Copyright (c) 2012 Luke Shumaker <lukeshu@sbcglobal.net>
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program. If not, see <http://www.gnu.org/licenses/>.
> +#
> +
> +# gettext initialization
> +export TEXTDOMAIN='pacman-scripts'
> +export TEXTDOMAINDIR='@localedir@'
> +
> +unset CDPATH
> +
> +m4_include(library/output_format.sh)
> +
> +##
> +# usage : in_array( $needle, $haystack )
> +# return : 0 - found
> +# 1 - not found
> +##
> +in_array() {
> + local needle=$1; shift
> + local item
> + for item in "$@"; do
> + [[ $item = "$needle" ]] && return 0 # Found
> + done
> + return 1 # Not Found
> +}
> +
> +usage() {
> + echo "Usage: $0 <URL> <OUTPUT_TARBALL>"

You might consider stuffing $0 into a var sooner as "${0##*/}", similar to
program_invocation_short_name in C. Invoking your script as
../some/other/dir/vcsget won't horribly throw off your line wrapping and
it looks prettier in the trimmed version, damnit.

> + echo ""
> + echo "Talks with multiple version control systems (VCSs) to create a"
> + echo "tarball of a specific commit."
> + echo ""
> + echo "<OUTPUT_TARBALL> is a plain, uncompressed tarball. Given the same"
> + echo "commit, the tarball will always have the same checksum."
> + echo ""
> + echo "<URL> is the repository, and possibly commit to download and tar."
> + echo "The following schemes are recognized:"
> + echo " * cvs://"
> + echo " * git://"
> + echo " * git+PROTO://"
> + echo " * svn+PROTO://"
> + echo " * bzr://"
> + echo " * hg+PROTO://"
> + echo ""
> + echo "URLs to be consumed by $0 are not always in the format of the"
> + echo "relevent VCS program, but normalized to proper URLs. Simply, this"
> + echo "means:"
> + echo ""
> + echo " scheme://[authinfo@]host[:port]/path[#fragment]"
> + echo ""
> + echo "Where <authinfo> is in the format "username[:password]" and"
> + echo "<fragment> is the commit ID, branch, or tag to be returned."
> +}
> +
> +##
> +# Get the column "$2" from the space-delimited string "$1".
> +# This has the advantage over `set -- $1` and `arr=($1)` that it separates on
> +# a single space, instead of 's+', which allows it to have empty columns.
> +##
> +get_field() {
> + local str=$1
> + local i=$2
> + echo "$str"|cut -d' ' -f $i
> +}
> +
> +##
> +# Parse a URL into parts, according to RFC 3986, "URI Generic Syntax".
> +# Sets the variables `scheme`, `authority`, `path`, `query` and `fragment`.
> +##
> +parse_url() {
> + local url=$1
> +
> + # This regex is copied from RFC 3986
> + local regex='^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(?([^#]*))?(#(.*))?'
> + local parts=$(echo "$url"|sed -r "s@$regex@2 4 5 7 9@")
> +
> + scheme=$( get_field "$parts" 1)
> + authority=$(get_field "$parts" 2)
> + path=$( get_field "$parts" 3)
> + query=$( get_field "$parts" 4)
> + fragment=$( get_field "$parts" 5)
> +}

As mentioned, this should all be replaced with bash regex to avoid the
constant forking to sed.

> +
> +##
> +# Parse an authority from a URL.
> +# Sets the variables `user`, `pass`, `sock`, `host` and `port`.
> +# `sock` is socket, which is defined as `hostport` in the RFC.
> +##
> +parse_authority() {
> + local authority=$1
> +
> + local regex='^(([^:]*)(:([^@]*))?@)?(([^:]+)(:([0-9]*))?)'
> + # 12 3 4 56 7 8
> + # 1 = userinfo@
> + # 2 = user
> + # 3 = :pass
> + # 4 = pass
> + # 5 = hostport
> + # 6 = host
> + # 7 = :port
> + # 8 = port
> + local parts=$(echo "$authority"|sed -r "s%$regex%2 4 5 6 8%")
> + user=$(get_field "$parts" 1)
> + pass=$(get_field "$parts" 2)
> + sock=$(get_field "$parts" 3)
> + host=$(get_field "$parts" 4)
> + port=$(get_field "$parts" 5)
> +}

More regex. In general, stuffing code into a sed expression is frowned
upon (for fear of separator clashes). In this case, you know the value
of $regex ahead of time, so it's okay.

> +
> +download_cvs() {
> + # TODO
> + error "$(gettext 'CVS not implemented')"
> + exit 1
> +}
> +
> +download_git() {
> + msg "$(gettext 'Getting source from git')"
> + : ${fragment:=master} # default to 'master' if not set
> + send_url="${scheme#git+}://${authority}${path}"
> +
> + # Where to checkout the code to
> + dir="$basedir/git/$send_url"

always localize your variables when possible.

> + if [[ "${dir##*.}" != git ]]; then

Random proper bash test with quoting on the wrong side ;)

> + dir="${dir}/.git"
> + fi
> +
> + # Download
> + if [ ! -d "$dir" ]; then
> + msg2 "$(gettext 'doing initial clone')"
> + mkdir -p "$dir"
> + git clone --bare "$send_url" "$dir"
> + fi
> +
> + cd "$dir"
> + refs=($(git show-ref|sed 's@.*/@@'|sort -u))

This is the wrong way to read a command into an array. It'll break on
legit whitespace in a ref, and its subject to glob expansion.

Addtionally, it breaks refs with slashes in them. Right now, using this
with Allan's working-split/binary-path branch will break. I'd use awk
and read for this, if you'd like to keep this avenue...

IFS=$'
' read -rd ' -a refs <
<(git show-ref | awk '{ sub(/refs/[^/]+/?/, "", $2); print $2 }')

I'll point out that Allan's implementation has the advantage here of
forcing that the user provide the _type_ of ref as the URI fragment. In
this case, we know what kind of ref that translates to and git-show-ref
can do the verification for us via the --verify -q flags.

> + if in_array "$fragment" "${refs[@]}"; then
> + # The ref is symbolic, so it may have changed upstream
> + msg2 "$(gettext 'fetching updates')"
> + git fetch --all

that --all flag seems like you're doing more work than you need to. A
repo like linux.git is an extreme case, but could add a fair bit of
weight to the call.

No error checking?

> + fi
> +
> + # Generate tarball
> + archive=(git archive --format=tar --prefix="$base/"
> + "$fragment" -o "$tarball")

localize this.

> + msg2 "$(gettext 'trying to generate tarball')"
> + if "${archive[@]}" 2>/dev/null; then
> + # success
> + msg2 "$(gettext 'success')"
> + exit 0

exit 0 here is redundant, unless you expect that msg2 will fail.

> + else
> + # the relevent commit is probably newer than we have
> + msg2 "$(gettext 'failure, forcing an update')"
> + git fetch --all
> + msg2 "$(gettext 'generating tarball')"
> + "${archive[@]}"

Don't hide the command being executed. You've picked an appropriately
descriptive name, but I think it'd better to file only the args into an
array. It's easier to tell what's being executed at a glance, and it
looks more "natural" in terms of command execution, e.g.

bsdtar "${archive_args[@]}"

In addition, this is the most important part of the function, but you're
not doing any error checking.

> + local s=$?
> + exit $?

I can only assume there's a typo somewhere in here. You're throwing away
your return value from the archive command and exiting (invariably) 0.

> + fi
> +}
> +
> +download_svn() {
> + msg "$(gettext 'Getting source from Subversion')"
> + parse_authority "$authority"
> + send_url="${scheme#svn+}://${sock}${path}"
> + args=('--config-dir' "$basedir/svn/.config")

localize this.

> + if [ -n "$user" ]; then
> + args+=('--username' "$user")

I just think it's weird seeing a combination of POSIX (single brace) and
non-POSIX (arrays) syntax right next to each other =P.

> + fi
> + if [ -n "$pass" ]; then
> + args+=('--password' "$pass")
> + fi
> + if [ -n "$fragment" ]; then
> + args+=('-r' "$fragment")
> + fi
> +
> + # Where to checkout the code to
> + dir="$basedir/svn/$send_url#$fragment"

And localize this... There's more below, I won't point them out.

> +
> + # Download
> + if [ ! -d "$dir" ]; then
> + mkdir -p "$dir"
> + svn co "${args[@]}" "$send_url" "$dir"
> + else
> + cd "$dir"
> + svn up "${args[@]}"
> + fi
> +
> + # Generate tarball
> + mkdir "$dir.tmp$$"

If you really need to make a temporary directory, I suggest using
mktemp, which can be made to honor the user's TMPDIR var. They'll thank
you. but... see below.

> + cp -r "$dir" "$dir.tmp$$/$base"
> + find "$dir.tmp$$/$base" -name .svn -exec rm -rf '{}' +
> + cd "$dir.tmp$$"
> + bsdtar cf "$tarball" "$base"

bsdtar's --exclude flag makes this all unnecessary.

> + rm -rf "$dir.tmp$$"
> +}
> +
> +download_bzr() {
> + # TODO finish bzr support
> + error "$(gettext 'Bazaar not implemented')"
> + exit 1
> +
> + send_url="${url%%#*}"
> + if [ "$scheme" != 'bzr+ssh' ]; then
> + send_url="${send_url#bzr+}"
> + fi
> +
> + dir="$basedir/bzr/$send_url"
> +
> + # Download
> + if [ ! -d "$dir" ]; then
> + msg2 "$(gettext 'doing initial checkout')"
> + mkdir -p "$dir"
> + bzr checkout "$send_url" "$dir"
> + fi
> +
> + cd "$dir"
> + bzr update
> +
> + # Generate tarball
> +}
> +
> +download_hg() {
> + msg "$(gettext 'Getting source from Mercurial')"
> + send_url="${url#hg+}"
> + hg clone -U "$send_url"
> + dir="$basedir/hg/${send_url%%#*}"
> +
> + # Download
> + if [ ! -d "$dir" ]; then
> + mkdir -p "$dir"
> + hg clone "$send_url" "$dir"
> + else
> + cd "$dir"
> + hg update -r "$fragment"
> + fi
> +
> + # Generate tarball
> + mkdir "$dir.tmp$$"
> + cp -r "$dir" "$dir.tmp$$/$base"
> + rm -rf "$dir.tmp$$/$base/.hg"
> + cd "$dir.tmp$$"
> + bsdtar cf "$tarball" "$base"
> + rm -rf "$dir.tmp$$"

Needs more error checking... and again with the --exclude.

> +}
> +
> +main() {
> + url=$1
> + tarball=$(readlink -m "$2")

As mentioned, readlink with flags is bad.

> +
> + base="$(echo "$tarball"|sed -r 's@.*/(.*).tar(..*)?$@1@')"

Augh... Who can read that crap? The PE version is so much nicer.

> + basedir="${tarball%/*}/vcs-cache"
> +
> + if ( in_array '-h' "$@" || in_array '--help' "$@" ); then

The subshell isn't at all useful here except for increasing your PID
count. Just get rid of it.

> + usage
> + exit 0
> + fi
> + if [ "$#" -ne 2 ]; then
> + usage
> + exit 1
> + fi
> +
> + msg "$(printf "$(gettext "Saving '%s' to '%s'")" "$url" "$tarball")"

I don't think you need that printf there since msg() expects a format
string.

> +
> + parse_url "$url"
> + case "$scheme" in
> + cvs) download_cvs;;
> + git+*|git) download_git;;
> + svn+*) download_svn;;
> + bzr) download_bzr;;
> + hg+*) download_hg;;
> + *)
> + error "$(gettext 'Unable to match a VCS with scheme'): $scheme"
> + exit 1;;
> + esac
> +}
> +
> +main "$@" >&2

This dumps _everything_ in the script to stderr, putting you on par with
gnupg. Don't be like that.

> --
> 1.7.12
>
>


All times are GMT. The time now is 06:32 PM.

VBulletin, Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO ©2007, Crawlability, Inc.