11-14-2008, 11:39 AM
"Henning Garus"

delta support in libalpm

2008/11/14 Nagy Gabor <ngaba@bibl.u-szeged.hu>:
> Quoting Xavier <shiningxc@gmail.com>:
>
>> On Fri, Nov 14, 2008 at 1:02 PM, Nagy Gabor <ngaba@bibl.u-szeged.hu>
>> wrote:
>>>>
>>>> I guess I spoke too soon. I was right that xdelta3 uses gzip for
>>>> handling gzipped files; however, it doesn't use the -n flag. This
>>>> gives us the same behaviour as xdelta1, with one minor difference:
>>>> method 1 stops working.
>>>
>>> Wait. I don't understand something. If it doesn't use the -n flag, how
>>> can we produce an md5sum-identical patched file? (The mtime is
>>> unpredictable.) This poses an extra problem, doesn't it? I just did an
>>> actual test xdelta3-diffing .tar.gz files and saw that the md5sum of
>>> the patched file does indeed differ from the original one :-( Maybe
>>> we should look for a gzip header manipulation tool...
>>>
>>
>> One of the proposals was to use gzip -n, which should fix this problem.
>
> xdelta1 used(?) "gzip -n" for the _patched_ .tar.gz file; xdelta3 doesn't
> use "-n", so setting "-n" for the original .tar.gz file won't help any
> more. Probably xdelta3 can also be configured or patched to use "-n"...
> Or I may have completely misunderstood something...
>

xdelta1 uses zlib; my proposal was to take the uncompressed output of
xdelta1 and compress it with gzip -n. xdelta3 uses gzip, but without
-n. I can't find any options for configuring the external compression,
but the easiest thing to do would be to deactivate external compression
in xdelta3 and apply it manually by piping the output through gzip -n.
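As an illustration of why the header timestamp alone changes the md5sum, here is a small Python sketch (the language and the helper names `header_mtime` and `recompress_without_timestamp` are choices made for this example, not part of pacman). It uses the standard `gzip` module, whose `mtime=0` argument zeroes the MTIME header field, which is the header effect of `gzip -n`. Python's module deflates with zlib, so its bytes are not guaranteed to equal what GNU gzip produces; only the header behaviour is being shown.

```python
import gzip
import hashlib


def header_mtime(gz: bytes) -> int:
    """Return the MTIME field of a gzip member (bytes 4-7, little-endian, RFC 1952)."""
    if gz[:2] != b"\x1f\x8b":
        raise ValueError("not gzip data")
    return int.from_bytes(gz[4:8], "little")


def recompress_without_timestamp(tar_bytes: bytes) -> bytes:
    """Hypothetical helper: compress with MTIME set to 0, as `gzip -n` does."""
    return gzip.compress(tar_bytes, compresslevel=9, mtime=0)


if __name__ == "__main__":
    tar_bytes = b"stand-in for an uncompressed .tar stream\n" * 200
    # Same payload, two arbitrary timestamps one second apart: only the header differs...
    a = gzip.compress(tar_bytes, mtime=1226662740)
    b = gzip.compress(tar_bytes, mtime=1226662741)
    print(hashlib.md5(a).hexdigest() == hashlib.md5(b).hexdigest())  # False
    # ...while a zeroed timestamp makes repeated compression reproducible.
    c = recompress_without_timestamp(tar_bytes)
    d = recompress_without_timestamp(tar_bytes)
    print(header_mtime(c), hashlib.md5(c).hexdigest() == hashlib.md5(d).hexdigest())  # 0 True
```

Recompressing the same uncompressed tar with a fixed timestamp therefore gives a reproducible md5sum, provided the same compressor and compression level are used on both sides.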
_______________________________________________
pacman-dev mailing list
pacman-dev@archlinux.org
http://archlinux.org/mailman/listinfo/pacman-dev
 
02-10-2009, 01:41 PM
Xavier

delta support in libalpm

On Sat, Nov 8, 2008 at 12:47 AM, Henning Garus
<henning.garus@googlemail.com> wrote:
> Hi,
>
> I have been looking through the current delta implementation in
> libalpm and have put some thought into changing makepkg/repo-add to
> support delta creation. However, I'm running into some problems,
> mostly due to md5sums and gzip.
>
> The current implementation works as follows. On a sync operation,
> libalpm checks whether a valid delta path exists and whether the
> summed file size of the deltas is smaller than the size of the whole
> download. If so, the deltas are downloaded and applied to the old
> file. After that, the patched file is treated as if it had been
> downloaded normally, which includes a check of the md5sum. Gzip files
> have a header containing a timestamp, which will break this md5sum
> check. When xdelta applies a patch to a gzipped file, it unzips the
> file, applies the patch and then rezips the file. The author of xdelta
> was obviously aware of the problem with the timestamp, because he
> decided to leave it empty. The same can be achieved with the -n option
> of gzip. But then comes the next problem: xdelta uses zlib for
> compression, while gzip implements compression itself, and files
> created by gzip can differ from files created by zlib. Bsdtar uses
> zlib as well, but writes the timestamp, and there is no option to
> prevent this (at least none that I can see).
>
> There are four ways around this that I can think of:
>
> 1. create the package, then create the delta, apply the delta to the
> old version, remove the original new package and present the patched
> package as output
>
> I think this sucks: it ties delta creation to makepkg (more about
> that later) and has an incredibly huge and useless overhead (countless
> unzips and rezips, plus applying the patch).
>
> 2. create the package, but don't compress it with bsdtar, use gzip -n
> instead. This means we have to use gzip again, in libalpm, when we
> apply the delta.
>
> Seems better than 1, but makes makepkg and libalpm rely on gzip. Not
> sure if this is a good thing, especially for libalpm.
>
> 3. save the md5sums of the unzipped tars in the sync db and change
> libalpm to check those
>
> Seems reasonable, but I don't see a way to do this with libarchive, so
> this would require using zlib directly and pacman would lose the
> ability to handle tar.bz2.
>
> 4. Skip checking the md5sum for deltas
>
> OK during the initial sync, as long as we trust xdelta to do its job
> (the md5sums of both the old and the new file are in the delta file).
> But the resulting package will have the wrong md5sum and can't be used
> to reinstall etc., which makes this look like a bad idea.
>
>
> In a previous mail Xavier toyed with the idea of putting delta creation
> into repo-add. I have given this some thought; it seems nice in
> principle, but there are drawbacks. For Arch this would mean creating
> deltas on Gerolde, which, according to the dev list, already seems
> fairly strained. Furthermore, this introduces some new variables to
> repo-add (at least a repo location and an output location); this would
> be manageable, but doesn't look very nice.
>
> Delta creation in makepkg seems more or less OK (it's already in there,
> after all). But what I would really like is a separate tool for delta
> creation, which would allow separating package building from delta
> creation and setting up a separate delta server. This leaves us with
> options 2 and 3, and I am not really sure which way to go.
>
>
> looking forward to your comments

A very small bump on this

1) gzip -n usage

But first, in the last discussion we had, which started with the mail
above, it seems we were more in favor of option 2:
> 2. create the package, but don't compress it with bsdtar, use gzip -n
> instead. This means we have to use gzip again, in libalpm, when we
> apply the delta.

In fact, Nathan already made a patch for that. I think this patch looks fine:
http://archive.netbsd.se/?ml=pacman-dev&a=2008-02&m=6427986

2) repo-add vs makepkg support

Nathan even made one to add support to repo-add too, but this patch
looked a bit scarier:
http://archive.netbsd.se/?ml=pacman-dev&a=2008-02&m=6427987
It was more complex than I had hoped. But the simpler way I was thinking
about was to get delta support only in repo-add, instead of in both
makepkg and repo-add:
http://archive.netbsd.se/?ml=pacman-dev&a=2008-02&m=6601225
Dan seemed to think it was better in repo-add, and Henning seems to
think it is better in makepkg. We need more discussion on this, and
then we need to finally make a decision.

2.1) About Nathan's patch to support both
If we do want to have the functionality in both makepkg and repo-add,
it would be cool to try to clean up the code a bit, for example this:
+# create_xdelta_file - will create a delta for the package filename given.
+#
+# params:
+# $1 - the filename of the package
+# $2 - the arch of the package
+# $3 - the version and release of the package
+# $4 - the directory where the package is located
+# $5 - the extension of packages
+# $6 - 0 if an existing delta file should not be overwritten
+# $7 - the filename of the previous package (blank if not known)
+# $8 - the version of the previous package (blank if not known)

That's a lot of params

3) format of delta in the database

However, I don't think there is any repo-add / makepkg patch to support
the new format. Henning also made a comment about the format:
http://bugs.archlinux.org/task/12000#comment34162
"So basically the current delta implementation is working. Only the
support in makepkg/repo-add is wrong. I am not exactly sure, though,
why libalpm expects the md5sums of the old and the new package. I am
not sure if these are even used anywhere. I would feel safe enough
with xdelta checking those and then libalpm checking the md5sum of the
final patched package."

I guess Dan added these two md5sums for safety, but yes, they might not
be needed. I would also be fine with dropping them, even if they don't
hurt.
 
02-18-2009, 10:33 PM
Xavier

delta support in libalpm

On Sat, Nov 8, 2008 at 12:47 AM, Henning Garus
<henning.garus@googlemail.com> wrote:

>
> 2. create the package, but don't compress it with bsdtar, use gzip -n
> instead. This means we have to use gzip again, in libalpm, when we
> apply the delta.
>
> Seems better than 1, but makes makepkg and libalpm rely on gzip. Not
> sure if this is a good thing, especially for libalpm.
>

That sounds alright. I just noticed that xdelta3 has an option to
disable the external recompression, -R, so we don't even have extra
decompression/recompression steps; nothing is lost.

+ snprintf(command, PATH_MAX, "xdelta3 -d -R -c -s %s %s | gzip -n > %s", from, delta, to);
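For comparison, here is a sketch of the same `xdelta3 ... | gzip -n` pipeline driven from Python without a shell. It is not pacman's code: the function names are hypothetical, it assumes `xdelta3` and `gzip` are on PATH, and it uses only the flags from the patch line above. Passing argument lists instead of a formatted command string avoids having to quote package and delta file names.

```python
from __future__ import annotations

import subprocess


def delta_pipeline(old_pkg: str, delta: str) -> tuple[list[str], list[str]]:
    """Return argv lists for the two stages of `xdelta3 ... | gzip -n`."""
    # -d decode, -R skip xdelta3's own recompression, -c write to stdout, -s source file
    decode = ["xdelta3", "-d", "-R", "-c", "-s", old_pkg, delta]
    # -n leaves the original name and timestamp out of the gzip header
    recompress = ["gzip", "-n"]
    return decode, recompress


def apply_delta(old_pkg: str, delta: str, new_pkg: str) -> None:
    """Patch old_pkg with delta and write the gzip -n compressed result to new_pkg."""
    decode, recompress = delta_pipeline(old_pkg, delta)
    with open(new_pkg, "wb") as out:
        xd = subprocess.Popen(decode, stdout=subprocess.PIPE)
        gz = subprocess.Popen(recompress, stdin=xd.stdout, stdout=out)
        xd.stdout.close()  # only gzip holds the read end now, so xdelta3 sees it close
        gz.wait()
        xd.wait()
    if xd.returncode != 0 or gz.returncode != 0:
        raise RuntimeError(
            f"applying {delta} failed (xdelta3 exit {xd.returncode}, gzip exit {gz.returncode})"
        )
```

Keeping the argv construction in its own function makes it easy to check the exact flags without running either tool.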


>
> In a previous mail Xavier toyed with the idea of putting delta creation
> into repo-add. I have given this some thought; it seems nice in
> principle, but there are drawbacks. For Arch this would mean creating
> deltas on Gerolde, which, according to the dev list, already seems
> fairly strained. Furthermore, this introduces some new variables to
> repo-add (at least a repo location and an output location); this would
> be manageable, but doesn't look very nice.
>
> Delta creation in makepkg seems more or less OK (it's already in there,
> after all). But what I would really like is a separate tool for delta
> creation, which would allow separating package building from delta
> creation and setting up a separate delta server. This leaves us with
> options 2 and 3, and I am not really sure which way to go.
>
>
> looking forward to your comments

I just went further than I ever did on this task: I seriously
considered a separate tool and spent some time thinking about all the
possibilities. I am still not sure what is best.
My thought was that it was very easy to generate the database info
during the delta creation. However, how do we keep this info?
Originally it was stored in the delta filename, but that was before
the database change. Now we need two filenames and two md5sums (old
pkg and new pkg), and it does not seem realistic to store all this in
the delta filename.
Here are the options I considered:

1) delta support only in repo-add
No need to store the info temporarily here; it goes directly into the
database. But maybe not flexible enough.

2) embed the .delta files into another format, e.g. a delta.tar.gz
archive = delta file + DELTA metafile
Might be overkill? And we would lose the ability to use xdelta directly.

3) a separate tool creates the delta, generates the delta metainfo and
stores it in a file
This file can then be given to repo-add, which basically just adds its
contents to $pkgname-*/deltas
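To make option 3 concrete, here is a rough Python sketch of the kind of record such a tool could append to a `$pkgname.pacdelta` file, one line per delta. The field order, the space-separated layout and all names here are assumptions made for illustration; they are not the format of the attached script or of libalpm's `deltas` file.

```python
import hashlib
import os
from dataclasses import astuple, dataclass, fields


@dataclass(frozen=True)
class DeltaInfo:
    """Hypothetical one-line record for a .pacdelta file."""
    delta: str
    delta_md5: str
    delta_size: int
    old_pkg: str
    old_md5: str
    new_pkg: str
    new_md5: str

    def to_line(self) -> str:
        # Assumes package and delta filenames never contain spaces.
        return " ".join(str(v) for v in astuple(self))

    @classmethod
    def from_line(cls, line: str) -> "DeltaInfo":
        parts = line.split()
        if len(parts) != len(fields(cls)):
            raise ValueError(f"expected {len(fields(cls))} fields, got {len(parts)}")
        parts[2] = int(parts[2])  # delta_size
        return cls(*parts)


def md5_file(path: str) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def record_delta(pacdelta: str, delta: str, old_pkg: str, new_pkg: str) -> DeltaInfo:
    """Append one line describing `delta` (old_pkg to new_pkg) to the pacdelta file."""
    info = DeltaInfo(
        os.path.basename(delta), md5_file(delta), os.path.getsize(delta),
        os.path.basename(old_pkg), md5_file(old_pkg),
        os.path.basename(new_pkg), md5_file(new_pkg),
    )
    with open(pacdelta, "a") as f:
        f.write(info.to_line() + "\n")
    return info
```

Under these assumptions, repo-add would only need to parse such lines and copy them into the package's database entry.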

I gave that third option a try. It's clearly not finished yet, but I
am attaching the script in its current state to give an idea and to
find out whether I should move forward.
Example usage:

$ create-xdelta libxml2-2.7.2-1-x86_64.pkg.tar.gz libxml2-2.7.3-1-x86_64.pkg.tar.gz
$ create-xdelta libxml2-2.7.3-1-x86_64.pkg.tar.gz libxml2-2.7.3-1.1-x86_64.pkg.tar.gz
(these two commands each added one line to a libxml2.pacdelta file)
$ repo-add db.tar.gz libxml2-2.7.3-1.1-x86_64.pkg.tar.gz
$ repo-add db.tar.gz libxml2.pacdelta
(pkg.tar.gz and pacdelta can be added together, but the order is
important: we need a package entry for libxml2 before adding deltas)

Now we can upgrade from 2.7.2-1 or 2.7.3-1 to 2.7.3-1.1 using deltas.
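With several deltas per package, as in this example, the client has to pick a chain. Henning's original mail describes the rule: use deltas only when their summed size is smaller than the full download. Below is a minimal Python sketch of that rule, treating versions as graph nodes and deltas as edges weighted by size, and using Dijkstra's algorithm to find the cheapest chain. The `Delta` class, the function names and the sizes are hypothetical; this is not libalpm's C code.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Delta:
    """Hypothetical delta entry: turns from_version into to_version."""
    from_version: str
    to_version: str
    size: int  # download size in bytes


def cheapest_delta_path(
    deltas: List[Delta], installed: str, target: str
) -> Optional[Tuple[int, List[Delta]]]:
    """Dijkstra over package versions; returns (total size, chain) or None if unreachable."""
    edges: Dict[str, List[Delta]] = {}
    for d in deltas:
        edges.setdefault(d.from_version, []).append(d)
    best = {installed: 0}
    order = itertools.count()  # tie-breaker so the heap never compares paths
    heap = [(0, next(order), installed, [])]
    while heap:
        cost, _, version, path = heapq.heappop(heap)
        if version == target:
            return cost, path
        if cost > best[version]:
            continue  # stale entry: a cheaper route to this version was found later
        for d in edges.get(version, []):
            new_cost = cost + d.size
            if new_cost < best.get(d.to_version, float("inf")):
                best[d.to_version] = new_cost
                heapq.heappush(heap, (new_cost, next(order), d.to_version, path + [d]))
    return None


def choose_deltas(
    deltas: List[Delta], installed: str, target: str, full_size: int
) -> Optional[List[Delta]]:
    """Return the delta chain to download, or None to download the full package."""
    found = cheapest_delta_path(deltas, installed, target)
    if found is None or found[0] >= full_size:
        return None
    return found[1]
```

For example, with hypothetical delta sizes of 300 (2.7.2-1 to 2.7.3-1), 200 (2.7.3-1 to 2.7.3-1.1) and 600 (2.7.2-1 to 2.7.3-1.1), a user on 2.7.2-1 would fetch the two-step chain (500) rather than the direct delta, and would still download the full package if it were no larger than 500.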
 
02-19-2009, 08:06 AM
Brendan Hide

delta support in libalpm

Hi guys

I'm new here, so I'm asking in advance that you forgive my ignorance.
Even before having clicked send, I feel like I'm spamming... o.O


Xavier wrote:

> On Sat, Nov 8, 2008 at 12:47 AM, Henning Garus
> <henning.garus@googlemail.com> wrote:
>
>> 2. create the package, but don't compress it with bsdtar, use gzip -n
>> instead. This means we have to use gzip again, in libalpm, when we
>> apply the delta.
>
> That sounds alright. I just noticed that xdelta3 has an option to
> disable the external recompression, -R, so we don't even have extra
> decompression/recompression steps; nothing is lost.
>
> + snprintf(command, PATH_MAX, "xdelta3 -d -R -c -s %s %s | gzip -n > %s", from, delta, to);


The way I understand xdelta3's -R and -D options:
-D disable external decompression (encode/decode)
When applying a delta, same behaviour as -R.
When creating a delta, even when given 2 compressed files, do not
check whether the files are compressed, i.e. given 2 .tar.gz files,
pretend they're .bin files.


-R disable external recompression (decode)
When applying a delta to a compressed file, decompress it *if* the
delta's metadata indicates the file was decompressed during encoding,
apply the delta and, if decompression occurred while applying the
delta, do not bother to recompress, i.e. when given a .tar.gz and a
.xd3, create a .tar.


Unless my understanding above is completely wrong, using -R is going to
help but not without -D in the encoding process.


Also, since we're computing md5s of the .tar.gz instead of the .tar, we'd
need to change some of the housekeeping too, perhaps computing md5s of the
.tar as well as (or instead of) the .tar.gz.
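Checksumming the .tar rather than the .tar.gz can be sketched in a few lines of Python (the function names are hypothetical, and nothing here says how libalpm or libarchive would do it). Decompression discards the gzip header, so the result no longer depends on the timestamp, and the same code handles .tar.bz2, which was one of Henning's concerns with option 3.

```python
import bz2
import gzip
import hashlib
import io
from typing import BinaryIO

# Suffix -> opener; both gzip.open and bz2.open accept an already-open binary file object.
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open}


def stream_md5(compressed: BinaryIO, suffix: str, chunk_size: int = 1 << 16) -> str:
    """md5 of the *decompressed* data read from a .gz or .bz2 stream."""
    if suffix not in _OPENERS:
        raise ValueError(f"unsupported compression suffix: {suffix!r}")
    h = hashlib.md5()
    with _OPENERS[suffix](compressed, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tar_md5(path: str) -> str:
    """md5 of the uncompressed .tar inside a .pkg.tar.gz or .pkg.tar.bz2 file."""
    suffix = ".bz2" if path.endswith(".bz2") else ".gz"
    with open(path, "rb") as raw:
        return stream_md5(raw, suffix)


def blob_md5(blob: bytes, suffix: str) -> str:
    """Same as tar_md5, for compressed data already held in memory."""
    return stream_md5(io.BytesIO(blob), suffix)
```

Two .tar.gz files that differ only in their header timestamp get different results from `md5sum`, but the same value from `tar_md5`.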


There was also a bit of recent discussion on the Arch forum about this.
Some statistics indicate that vanilla -D isn't really worth it:
http://bbs.archlinux.org/viewtopic.php?pid=496539#p496539 shows 10%
bandwidth savings with -D versus 85% without. I mentioned a kluge
workaround there, gzip --rsyncable, which gives a 77% saving. The
kluge probably isn't the right way to go anyway.


So, um... how does this change the way forward? Or is my understanding
of the -R parameter completely wrong?


--
__________
Brendan Hide

 
02-19-2009, 09:06 AM
Brendan Hide

delta support in libalpm

Xavier wrote:
> So we don't even have extra decompression/recompression steps; nothing
> is lost.
>
> + snprintf(command, PATH_MAX, "xdelta3 -d -R -c -s %s %s | gzip -n > %s", from, delta, to);

Sorry, only now do I see the logic. :/ The md5s are originally generated
after compressing with "gzip -n", so Xavier is specifically using
"gzip -n" to avoid producing a "different" md5sum after applying the
delta.


--
__________
Brendan Hide

 
02-19-2009, 12:20 PM
Xavier

delta support in libalpm

On Thu, Feb 19, 2009 at 11:06 AM, Brendan Hide
<brendan@swiftspirit.co.za> wrote:
> Xavier wrote:
>>
>> So we don't even have extra decompression/recompression steps; nothing
>> is lost.
>>
>> + snprintf(command, PATH_MAX, "xdelta3 -d -R -c -s %s %s | gzip -n > %s", from, delta, to);
>
> Sorry, only now do I see the logic. :/ The md5s are originally generated
> after compressing with "gzip -n", so Xavier is specifically using
> "gzip -n" to avoid producing a "different" md5sum after applying the
> delta.
>

Ah, no problem. To be honest, I wasn't sure how to answer your previous
mail: every statement you made seemed correct, but you drew the wrong
conclusion from them.

So yup, this will only work with packages generated by a patched
makepkg that uses gzip -n for compression. Or well, existing packages
could also be decompressed and recompressed with gzip -n.
 
02-19-2009, 06:38 PM
Xavier

delta support in libalpm

On Sat, Nov 8, 2008 at 12:47 AM, Henning Garus
<henning.garus@googlemail.com> wrote:
>
> Delta creation in makepkg seems more or less OK (it's already in there,
> after all). But what I would really like is a separate tool for delta
> creation, which would allow separating package building from delta
> creation and setting up a separate delta server. This leaves us with
> options 2 and 3, and I am not really sure which way to go.
>
>
> looking forward to your comments

Sorry for answering this mail for the 10th time, but I always have
different points to discuss.
Now I am very curious about "a separate delta server". What does this
mean exactly?
A different delta database (delta.tar.gz) + corresponding tools to
deal with it (delta-add / delta-remove)?

I am not so happy about adding support in repo-add, for several
reasons; I always run into many problems and issues. For example,
deltas are not tied to one particular package version, but rather to
a package name. And when we remove or upgrade a package entry, the
data index gets lost.

So I liked the idea of a separate delta database. The problem is that
it might lead to a lot of code duplication in pacman if we need to
handle pmdeltadb_t besides pmdb_t. So I am not so happy about that
either.
 
02-20-2009, 06:04 AM
Allan McRae

delta support in libalpm

Xavier wrote:

> On Sat, Nov 8, 2008 at 12:47 AM, Henning Garus
> <henning.garus@googlemail.com> wrote:
>>
>> Delta creation in makepkg seems more or less OK (it's already in there,
>> after all). But what I would really like is a separate tool for delta
>> creation, which would allow separating package building from delta
>> creation and setting up a separate delta server. This leaves us with
>> options 2 and 3, and I am not really sure which way to go.
>>
>>
>> looking forward to your comments
>
> Sorry for answering this mail for the 10th time, but I always have
> different points to discuss.
> Now I am very curious about "a separate delta server". What does this
> mean exactly?
> A different delta database (delta.tar.gz) + corresponding tools to
> deal with it (delta-add / delta-remove)?
>
> I am not so happy about adding support in repo-add, for several
> reasons; I always run into many problems and issues. For example,
> deltas are not tied to one particular package version, but rather to
> a package name. And when we remove or upgrade a package entry, the
> data index gets lost.
>
> So I liked the idea of a separate delta database. The problem is that
> it might lead to a lot of code duplication in pacman if we need to
> handle pmdeltadb_t besides pmdb_t. So I am not so happy about that
> either.


Would it be useful if I put xdelta3 into the repos to help testing
things out for this?


Allan



 
02-20-2009, 11:45 AM
Xavier

delta support in libalpm

On Fri, Feb 20, 2009 at 8:04 AM, Allan McRae <allan@archlinux.org> wrote:
>
> Would it be useful if I put xdelta3 into the repos to help testing things
> out for this?
>

Short answer: yes, I believe it would be nice to have xdelta3 in the repos.
Long answer:
Well, I did have something running, but I realized I have way too many
problems and questions to move to a testing phase.
I have no idea where this is going or where it should go.
There has never been any real official interest in deltas, which seems
to make the ability to run a separate delta server a requirement.
This seems to require a separate delta database, which implies a new
level of complexity and code bloat in pacman. Maybe it is worth it, I
don't know; it still makes me wonder why we put all this delta stuff in
pacman to begin with. What was the problem with XferCommand? It seemed
like a great idea. Now, that wget-xdelta.sh script is just a toy, but a
much more powerful Python script could be written that has basically
the same logic as pacman currently has, plus the ability to fetch and
parse a separate delta database.
 
02-23-2009, 07:57 AM
Brendan Hide

delta support in libalpm

Xavier wrote:

> There has never been any real official interest in deltas, which seems
> to make the ability to run a separate delta server a requirement.
> This seems to require a separate delta database, which implies a new
> level of complexity and code bloat in pacman. Maybe it is worth it, I
> don't know; it still makes me wonder why we put all this delta stuff in
> pacman to begin with. What was the problem with XferCommand? It seemed
> like a great idea. Now, that wget-xdelta.sh script is just a toy, but a
> much more powerful Python script could be written that has basically
> the same logic as pacman currently has, plus the ability to fetch and
> parse a separate delta database.

Unless the server is out of disk space, I'm not too sure exactly why
there's a requirement for a separate server. If pacman is distributed
with the delta option turned on by default, the server doing the actual
"serving" of the updates is probably going to have 60 to 85% less work
to do.


I will grant that there would be a new level of complexity involved. For
example, if I've missed 4 updates, we'd have to "chain link" the tar.gz
in my cache through 4 delta patches to get the current tar.gz.


I believe that the following would be the simplest implementation, both
in terms of how much implementation work is needed and of probable
effectiveness:
Put delta files into a separate folder (which also keeps the deltas out
of any snapshot):

http://archlinux.mirror.ac.za/delta/core/os/x86_64/kernel26-2.6.28.4-1-x86_64.kernel26-2.6.28.5-1.pkg.xd3.tar.gz
Thus, I could do the following (bash pseudocode)
mirror=http://archlinux.mirror.ac.za/delta/core/os/x86_64
curfile=$pkgname-$currentpkgversion-$pkgarch.pkg.tar.gz
target=$pkgname-$newpkgversion-$pkgarch.pkg.tar.gz
start=false
# deltas for this package from the directory index, in version order
curl -s "$mirror/" | grep -o "$pkgname-[^\"<>]*\.xd3\.tar\.gz" | sort -u -V > listing
while read -r delta
do
    # begin at the delta whose "from" part is the version in my cache
    [[ $delta == "$pkgname-$currentpkgversion-$pkgarch."* ]] && start=true
    if [[ $start == true ]]
    then
        # applydelta is a placeholder for whatever applies one delta
        wget -q "$mirror/$delta" && applydelta "$delta" "$curfile" || exit 1
        curfile=$(ls -rt | tail -n 1)   # newest file = output of applydelta
        [[ $curfile == "$target" ]] && break
    fi
done < listing

The above requires no db implementation at all and can work well even
using the above very simple logic.

And yes, by my own standards, the above is very bad bash pseudo-code. :P

Of the above, what is already implemented in pacman?

__________
Brendan Hide

 

All times are GMT.