11-07-2008, 10:47 PM
Henning Garus

delta support in libalpm

Hi,

I have been looking through the current delta implementation in
libalpm and have put some thought into changing makepkg/repo-add to
support delta creation. However, I'm running into some problems,
mostly due to md5sums and gzip.

The current implementation works as follows. On a sync operation,
libalpm checks whether a valid delta path exists and whether the summed
file size of the deltas is smaller than the file size of the whole
download. If so, the deltas are downloaded and applied to the old
file. After that, the patched file is treated as if it had been
downloaded normally, which includes a check of the md5sum. Gzip files
have a header that contains a timestamp, which will screw with this
md5sum. When xdelta applies a patch to a gzipped file, it unzips the
file, applies the patch and then rezips the result. The author of
xdelta was obviously aware of the problem with the timestamp, because
he decided to leave it empty. The same can be achieved with the -n
option of gzip. But then comes the next problem: xdelta uses zlib for
compression, while gzip implements compression itself, and files
created by gzip can differ from files created by zlib. Bsdtar uses zlib
as well, but writes the timestamp, and there is no option to prevent
this (at least none that I can see).
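
To make the timestamp problem concrete, here is a minimal sketch in Python (not libalpm or xdelta code). It assumes nothing beyond the standard library; the payload and the two timestamps are made up for illustration. It compresses the same data twice with different header timestamps and twice with the timestamp zeroed, which is the effect of gzip -n on that field.

```python
import gzip
import hashlib
import io
import struct


def gzip_bytes(payload: bytes, mtime: int) -> bytes:
    """Compress payload into a gzip member whose header carries the given mtime."""
    buf = io.BytesIO()
    # filename="" keeps the optional FNAME field out of the header.
    with gzip.GzipFile(filename="", fileobj=buf, mode="wb", mtime=mtime) as gz:
        gz.write(payload)
    return buf.getvalue()


def header_mtime(blob: bytes) -> int:
    """Read the 4-byte little-endian MTIME field at offset 4 of a gzip header."""
    return struct.unpack("<I", blob[4:8])[0]


def md5(blob: bytes) -> str:
    return hashlib.md5(blob).hexdigest()


# Hypothetical stand-in for the contents of a package tarball.
payload = b"stand-in for the contents of a package tarball\n" * 100

stamped_a = gzip_bytes(payload, mtime=1226000000)
stamped_b = gzip_bytes(payload, mtime=1226086400)
no_stamp_a = gzip_bytes(payload, mtime=0)
no_stamp_b = gzip_bytes(payload, mtime=0)

# Same payload, different header timestamp: the md5sums of the .gz files differ.
print("stamped files match:  ", md5(stamped_a) == md5(stamped_b))
# With the timestamp zeroed, the same compressor gives a reproducible file.
print("unstamped files match:", md5(no_stamp_a) == md5(no_stamp_b))
```

Note that this only covers the header. It uses one compressor (zlib, via Python) for every file, so it says nothing about the second problem, the zlib versus gzip output differences.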

There are four ways around this that I can think of:

1. create the package, then create the delta, apply the delta to the
old version, remove the original new package and present the patched
package as output

I think this sucks: it ties delta creation to makepkg (more about
that later) and has an incredibly huge and useless overhead (countless
unzips and rezips, plus applying the patch).

2. create the package, but don't compress it with bsdtar, use gzip -n
instead. This means we have to use gzip again, in libalpm, when we
apply the delta.

Seems better than 1, but makes makepkg and libalpm rely on gzip. Not
sure if this is a good thing, especially for libalpm.

3. save the md5sums of the unzipped tars in the sync db and change
libalpm to check those

Seems reasonable, but I don't see a way to do this with libarchive, so
this would require using zlib directly, and pacman would lose the
ability to handle tar.bz2.
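
As an illustration of what option 3 is checking, here is a rough Python sketch that hashes the decompressed stream instead of the compressed file. It is not libalpm code and does not touch libarchive. The helper names, file names and payload are hypothetical, and it only knows gzip and bzip2.

```python
import bz2
import gzip
import hashlib
import os
import tempfile


def open_decompressed(path):
    """Pick a decompressor by file extension (gzip or bzip2 only in this sketch)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    if path.endswith(".bz2"):
        return bz2.open(path, "rb")
    raise ValueError("unsupported compression: " + path)


def md5_of_uncompressed(path, chunk_size=64 * 1024):
    """Hash the decompressed stream, so compressor and header details drop out."""
    digest = hashlib.md5()
    with open_decompressed(path) as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_gzip(path, data, mtime):
    """Write data as gzip with an explicit header timestamp."""
    with open(path, "wb") as raw:
        with gzip.GzipFile(filename="", fileobj=raw, mode="wb", mtime=mtime) as gz:
            gz.write(data)


# Hypothetical stand-in for an uncompressed package tarball.
payload = b"pretend this is an uncompressed package tarball\n" * 50
workdir = tempfile.mkdtemp()

gz_old = os.path.join(workdir, "a.pkg.tar.gz")
gz_new = os.path.join(workdir, "b.pkg.tar.gz")
bz = os.path.join(workdir, "c.pkg.tar.bz2")
write_gzip(gz_old, payload, 1226000000)
write_gzip(gz_new, payload, 1226086400)
with bz2.open(bz, "wb") as f:
    f.write(payload)

# All three files wrap the same tar data, so they share one digest.
digests = {md5_of_uncompressed(p) for p in (gz_old, gz_new, bz)}
print(len(digests))
```

The point of the design is that the checksum no longer depends on which tool recompressed the package or what it wrote into the header, at the cost of decompressing the file once more for the check.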

4. Skip checking the md5sum for deltas

OK during the initial sync, as long as we trust xdelta to do its job
(the md5sums of both the old and the new file are in the delta file).
But the resulting package will have the wrong md5sum and can't be used
to reinstall, etc., which makes this look like a bad idea.


In a previous mail Xavier toyed with the idea of putting delta creation
into repo-add. I have given this some thought, as it seems nice in
principle, but there are drawbacks. For Arch this would mean creating
deltas on Gerolde, which seems to be fairly strained already,
according to the dev list. Furthermore, this introduces some new
variables to repo-add (at least a repo location and an output
location). This would be manageable, but doesn't look very nice.

Delta creation in makepkg seems somewhat OK (it's already in there,
after all). But what I would really like is a separate tool for delta
creation, which would allow separating package building from delta
creation and setting up a separate delta server. This leaves us with
options 2 and 3, and I am not really sure which way to go.


looking forward to your comments
_______________________________________________
pacman-dev mailing list
pacman-dev@archlinux.org
http://archlinux.org/mailman/listinfo/pacman-dev
 
11-08-2008, 12:38 PM
Nagy Gabor

delta support in libalpm



First of all, our current delta implementation doesn't work at all
atm, see FS#12000. So any maintainer is welcome ;-)

> There are four ways around this, that I can think of:
>
> 1. create the package, then create the delta, apply the delta to the
> old version, remove the original new package and present the patched
> package as output
>
> I think this sucks, this ties delta creation to makepkg (more about
> that later) and has an incredibly huge and useless overhead (countless
> unzips and rezips and applying the patch).

-1

> 2. create the package, but don't compress it with bsdtar, use gzip -n
> instead. This means we have to use gzip again, in libalpm, when we
> apply the delta.
>
> Seems better than 1, but makes makepkg and libalpm rely on gzip. Not
> sure if this is a good thing, especially for libalpm.

Maybe. But I don't see why we should use gzip in libalpm. IIRC we
never compress things in alpm.

> 3. save the md5sums of the unzipped tars in the synchdb and change
> libalpm to check those
>
> Seems reasonable, but I don't see a way to do this with libarchive, so
> this would require using zlib directly and pacman would lose the
> ability to handle tar.bz2

-1

> 4. Skip checking the md5sum for deltas
>
> OK during the initial synch, as long as we trust xdelta to do its job
> (the md5sums of both the old and the new file are in the delta file).
> But the created package will have the wrong md5sum and can't be used
> to reinstall, etc. which makes this look like a bad idea.

-1
Although xdelta has its own md5sum mechanism, it won't help here, as
you said.

> In a previous mail Xavier toyed with the idea to put delta creation
> into repo-add, I have given this some thought, as it seems nice in
> principle, but there are drawbacks. For Arch this would mean creating
> deltas on Gerolde, which seems to be fairly strained already,
> according to the dev list. Furthermore this introduces some new
> variables to repo-add (at least repo location and an output location)
> this would be manageable, but doesn't look very nice.

I don't even understand why we create deltas in makepkg. But if we
create deltas with repo-add, makepkg should be changed as well (the
resulting pkg.tar.gz should not contain a timestamp).






Bye


------------------------------------------------------
SZTE Egyetemi Konyvtar - http://www.bibl.u-szeged.hu
This message was sent using IMP: http://horde.org/imp/


 
11-08-2008, 02:27 PM
Henning Garus

delta support in libalpm

On Sat, Nov 8, 2008 at 1:38 PM, Nagy Gabor <ngaba@bibl.u-szeged.hu> wrote:
> First of all, our current delta implementation doesn't work at all atm, see
> FS#12000. So any maintainer are welcome ;-)
>

I have yet to test this, but I think this comes down to repo-add not being in
line with the current implementation, as you already pointed out in
the discussion.


>> 2. create the package, but don't compress it with bsdtar, use gzip -n
>> instead. This means we have to use gzip again, in libalpm, when we
>> apply the delta.
>>
>> Seems better than 1, but makes makepkg and libalpm rely on gzip. Not
>> sure if this is a good thing, especially for libalpm.
>
> maybe. But I don't see why we should use gzip in libalpm. Iirc we never
> compress things in alpm.
>

Yes you do. libalpm uses system() to execute:

    xdelta patch [deltafile] [oldpkg] [newpkg]

xdelta will unzip the old package, apply the patch and rezip the new
package. Due to the zlib/gzip inconsistencies, the md5sum of the patched
package can differ from the md5sum of the new package, which was
zipped with gzip, unless that line is changed to something like:

    xdelta patch -0 [deltafile] [oldpkg] - | gzip -cn > [newpkg]
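
For illustration only, here is how that pipeline could be wired up without going through a shell, sketched in Python rather than in libalpm's C. The function names are hypothetical, the xdelta arguments are copied from the command above and have not been tested here, and gzip is assumed to be on the PATH.

```python
import subprocess


def patch_and_regzip(patch_cmd, new_pkg_path):
    """Run patch_cmd, which must write the uncompressed result to stdout,
    and pipe that output through `gzip -cn` into new_pkg_path."""
    with open(new_pkg_path, "wb") as out:
        patcher = subprocess.Popen(patch_cmd, stdout=subprocess.PIPE)
        gzipper = subprocess.Popen(["gzip", "-cn"], stdin=patcher.stdout, stdout=out)
        # Drop our copy of the pipe so the patcher notices if gzip exits early.
        patcher.stdout.close()
        gzip_status = gzipper.wait()
        patch_status = patcher.wait()
    if patch_status != 0 or gzip_status != 0:
        raise RuntimeError("patch or gzip step failed")


def xdelta_patch_cmd(delta_file, old_pkg):
    """Argument list for the xdelta call quoted in the mail (flags taken from there)."""
    return ["xdelta", "patch", "-0", delta_file, old_pkg, "-"]


# Hypothetical usage, with made-up file names:
# patch_and_regzip(xdelta_patch_cmd("foo.delta", "foo-1.0.pkg.tar.gz"),
#                  "foo-1.1.pkg.tar.gz")
```

Passing argument lists instead of one string for system() avoids shell quoting issues with odd file names, and checking both exit codes keeps a failed patch from silently producing a truncated package.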
 
11-10-2008, 07:38 AM
Allan McRae

delta support in libalpm

Henning Garus wrote:

> Yes you do. libalpm uses system() to execute:
> xdelta patch [deltafile] [oldpkg] [newpkg]
>
> xdelta will unzip the old package, apply the patch and rezip the new package.
> Due to the zlib/gzip inconsistencies the md5sum for the patched
> package can differ from the md5sum of the new package, which was
> zipped with gzip. Unless that line is changed to something like
> xdelta patch -0 [deltafile] [oldpkg] - | gzip -cn > [newpkg]


Is any of this fixed by using the xdelta3 branch? From memory that does
not use gzip/bzip2.


Allan



 
11-10-2008, 08:30 AM
Xavier

delta support in libalpm


I am very glad you looked into this, you seem to have a very good
understanding of the situation, possibly better than me, so it would
be great if you could fix and maintain this part.

I would just go with option 2. When deltas are used, libalpm already
relies on xdelta, so why not on gzip as well.
 
11-13-2008, 08:31 PM
Henning Garus

delta support in libalpm

On Mon, Nov 10, 2008 at 8:38 AM, Allan McRae <allan@archlinux.org> wrote:
> Henning Garus wrote:
>> Yes you do. libalpm uses system() to execute:
>> xdelta patch [deltafile] [oldpkg] [newpkg]
>>
>> xdelta will unzip the old package, apply the patch and rezip the new
>> package.
>> Due to the zlib/gzip inconsistencies the md5sum for the patched
>> package can differ from the md5sum of the new package, which was
>> zipped with gzip. Unless that line is changed to something like
>> xdelta patch -0 [deltafile] [oldpkg] - | gzip -cn > [newpkg]
>
> Is any of this fixed by using the xdelta3 branch? From memory that does not
> use gzip/bzip2.

According to http://xdelta.org/xdelta3.html, xdelta3 uses a built-in
compression to compress the delta files (xdelta1 uses zlib). However,
you won't get around decompression and recompression when using deltas
with compressed files. When xdelta3 gets compressed files as input, it
will use the appropriate external compression engine to decompress the
inputs and compute a delta. It does the same again to compress the
output after patching. So basically it does the same thing as my
original proposal 2, only internally. It could be interesting
nonetheless, because with xdelta3, deltas would probably work for
bzip2-compressed packages without any further changes in pacman.

Henning
 
11-13-2008, 11:03 PM
Henning Garus

delta support in libalpm

On Thu, Nov 13, 2008 at 9:31 PM, Henning Garus
<henning.garus@googlemail.com> wrote:
> According to http://xdelta.org/xdelta3.html xdelta3 uses a builtin
> compression to compress the delta files (xdelta1 uses zlib). However,
> you won't get around decompression and recompression when using deltas
> with compressed files. When xdelta3 gets compressed files as input it
> will use the appropriate external compression engine to decompress the
> inputs and compute a delta. It do that again to compress the output
> after patching. So basically it will do the same, as my original
> proposal 2, only internally. It could be interesting nonetheless,
> because with xdelta3 deltas would probably work for bzip2 compressed
> packages, without any further changes in pacman.
>
> Henning
>

I guess I spoke too soon. I was right about xdelta3 using gzip for
handling gzipped files; however, it doesn't use the -n flag. This gives
us the same behaviour as xdelta1, with one minor difference: method 1
stops working.
 
11-14-2008, 11:02 AM
Nagy Gabor

delta support in libalpm

> I guess I spoke to soon. I was right concerning xdelta3 using gzip for
> handling gzipped files, however it doesn't use the -n flag. This gives
> us the same behaviour as xdelta1, with one minor difference: Method 1.
> stops working.


Wait. I don't understand something. If it doesn't use the -n flag, how
can we produce an md5sum-identical patched file? (The mtime is
unpredictable.) Doesn't this pose an extra problem? I just ran an
actual test, xdelta3-diffing .tar.gz files, and saw that the patched
md5sum indeed differs from the original one :-( Maybe we should
search for a gzip header manipulation tool...
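
As a rough idea of what such a header manipulation tool would do, here is a small Python sketch that overwrites the MTIME field of an existing gzip file with zero. It is not an existing tool. The function name is made up and the field offsets follow the gzip file format (RFC 1952). It only touches the timestamp, so it would not help where the compressed data itself differs between gzip and zlib.

```python
import gzip
import io
import struct

GZIP_MAGIC = b"\x1f\x8b"
FHCRC = 0x02  # header flag bit: a CRC16 of the header follows the header


def zero_gzip_mtime(blob: bytes) -> bytes:
    """Return a copy of a gzip file with the header MTIME field set to zero."""
    if len(blob) < 10 or blob[:2] != GZIP_MAGIC:
        raise ValueError("not a gzip file")
    if blob[3] & FHCRC:
        # Changing the header would invalidate the header CRC.
        raise ValueError("header CRC present; not handled in this sketch")
    # Bytes 0-1 magic, 2 method, 3 flags, 4-7 MTIME (little endian).
    return blob[:4] + struct.pack("<I", 0) + blob[8:]


def gzip_with_mtime(payload: bytes, mtime: int) -> bytes:
    """Helper for the demo: compress with an explicit header timestamp."""
    buf = io.BytesIO()
    with gzip.GzipFile(filename="", fileobj=buf, mode="wb", mtime=mtime) as gz:
        gz.write(payload)
    return buf.getvalue()


# Hypothetical stand-in for a patched package tarball.
payload = b"patched package contents\n" * 20
first = gzip_with_mtime(payload, 1226660000)
second = gzip_with_mtime(payload, 1226663600)

# The files differ only in their timestamps; after zeroing they are identical.
print(first == second, zero_gzip_mtime(first) == zero_gzip_mtime(second))
```

The CRC32 in the gzip trailer covers only the uncompressed data, so rewriting the timestamp leaves the file valid.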


Bye


 
11-14-2008, 11:23 AM
Xavier

delta support in libalpm

On Fri, Nov 14, 2008 at 1:02 PM, Nagy Gabor <ngaba@bibl.u-szeged.hu> wrote:
>>
>> I guess I spoke to soon. I was right concerning xdelta3 using gzip for
>> handling gzipped files, however it doesn't use the -n flag. This gives
>> us the same behaviour as xdelta1, with one minor difference: Method 1.
>> stops working.
>
> Wait. I don't understand something. If it doesn't use the -n flag, how can
> we produce an md5sum-identical patched file? (The mtime is unpredictable.)
> This poses an extra problem, or not? I just did an effective test on
> xdelta3-diffing .tar.gz files and I saw that the patched md5sum indeed
> differ from the original one :-( Maybe we should search for a gzip header
> manipulation tool...
>

One of the proposals was to use gzip -n, which should fix this problem.
 
11-14-2008, 11:30 AM
Nagy Gabor

delta support in libalpm

Quoting Xavier <shiningxc@gmail.com>:

> On Fri, Nov 14, 2008 at 1:02 PM, Nagy Gabor <ngaba@bibl.u-szeged.hu> wrote:
>>> I guess I spoke to soon. I was right concerning xdelta3 using gzip for
>>> handling gzipped files, however it doesn't use the -n flag. This gives
>>> us the same behaviour as xdelta1, with one minor difference: Method 1.
>>> stops working.
>>
>> Wait. I don't understand something. If it doesn't use the -n flag, how can
>> we produce an md5sum-identical patched file? (The mtime is unpredictable.)
>> This poses an extra problem, or not? I just did an effective test on
>> xdelta3-diffing .tar.gz files and I saw that the patched md5sum indeed
>> differ from the original one :-( Maybe we should search for a gzip header
>> manipulation tool...
>
> One of the proposal was to use gzip -n, which should fix this problem.


xdelta1 used(?) "gzip -n" for the _patched_ .tar.gz file; xdelta3
doesn't use "-n", so setting "-n" for the original .tar.gz file won't
help any more. Probably xdelta3 can also be configured or patched to
use "-n"... Or I may have completely misunderstood something...


Bye

