Linux Archive

Michael Mol 09-26-2012 08:30 PM

ship app-arch/pbzip2 instead of app-arch/bzip2
 
A few months ago, I filed bug 423651 to ask that bzip2 on the install
media be replaced with pbzip2. It was closed a short while later,
telling me that it'd involve changing what's kept in @system, and that
had to be discussed here, rather than in a bug report.

Here's a detailed description of how pbzip2 operates, as described by
a friend of mine:

> pbzip2's compression routine splits the input into blocks (with a default of 900,000
> bytes), which it then feeds into the standard bzip2 compression routine. The output
> of the various calls to the bzip2 compression routine are then concatenated together.
> The end result is the same as if you had first used the "split" command on the input,
> run individual bzip2 commands on the split pieces, then recombined the individual
> bz2 files using cat.
>
> The downside to this is that you have multiple file headers, footers, and byte-alignment
> padding, plus the fact that bzip2 does a RLE compression stage to fill the buffer it
> feeds to the BWT, the main part of the compression routine. If you happen to have a
> section with 1MiB of the same byte, the pbzip2 front-end will split that into two blocks
> (at the default settings) and feed them to separate bzip2 compressors. bzip2 will
> then compress the first block down to a buffer of about 17kiB before passing it on
> to be compressed further, and the rest of the data would have fit within this block, if
> pbzip2 hadn't split it the way it had.
>
> As for decompression, pbzip2 can only really do parallel decompression of files that it
> created, since it seeks for the bz2 file header in order to split it to different workers. One
> reason for this is that the bz2 block header is not byte aligned.
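
[Editor's sketch] The split/compress/concatenate behaviour described above can be imitated with Python's standard bz2 module. This is only an illustration of the idea, not pbzip2's actual code: split_compress and BLOCK_SIZE are made-up names, and the 900,000-byte figure is simply the default quoted above. It also compresses 1 MiB of identical bytes both ways, to show the overhead of splitting a run that a single compressor could have handled in one block.

```python
import bz2

BLOCK_SIZE = 900_000  # default block size quoted in the description above


def split_compress(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Compress each block as its own bzip2 stream, then concatenate them.

    Hypothetical helper that mimics "split, bzip2 each piece, cat".
    """
    pieces = [
        bz2.compress(data[i:i + block_size], compresslevel=9)
        for i in range(0, len(data), block_size)
    ]
    return b"".join(pieces)


data = bytes(1024 * 1024)                     # 1 MiB of the same byte
single = bz2.compress(data, compresslevel=9)  # one stream, as plain bzip2 would do
chunked = split_compress(data)                # two streams, split at 900,000 bytes

# Concatenated streams still decompress to the original input.
assert bz2.decompress(chunked) == data

print("single stream:", len(single), "bytes")
print("split streams:", len(chunked), "bytes")
```

The split output is expected to be larger, since it carries two sets of headers, footers and coding tables for data that fits in a single block. On real-world input with several blocks' worth of data the relative overhead is much smaller.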

I really don't know how to carry this discussion any further than
this; I'll answer any questions I can.

--
:wq

Matt Turner 09-26-2012 08:43 PM

On Wed, Sep 26, 2012 at 1:30 PM, Michael Mol <mikemol@gmail.com> wrote:
> A few months ago, I filed bug 423651 to ask that bzip2 on the install
> media be replaced with
> pbzip2. It was closed a short while later, telling me that it'd
> involve changing what's kept in @system, and that had to be discussed
> here, rather than in a bug report.

If we're going to ship a parallel bzip2 implementation, it should be
lbzip2 and not pbzip2.

lbzip2 can decompress bz2 archives with multiple threads that haven't
been compressed with lbzip2/pbzip2.
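
[Editor's sketch] A rough illustration of why this matters, assuming (as described earlier in the thread) that a splitter looks for byte-aligned bz2 stream headers to hand out work. STREAM_START and stream_offsets are illustrative names, not part of pbzip2 or lbzip2. An ordinary single-stream file shows only one such header even when it holds several blocks, while concatenated per-chunk streams show one per chunk.

```python
import bz2

# Byte-aligned signature at the start of a level-9 stream: "BZh9" followed
# directly by the 48-bit block magic 0x314159265359 of its first block.
STREAM_START = b"BZh9" + bytes.fromhex("314159265359")


def stream_offsets(blob: bytes) -> list[int]:
    """Return byte offsets where a stream header plus first block begins."""
    offsets = []
    pos = blob.find(STREAM_START)
    while pos != -1:
        offsets.append(pos)
        pos = blob.find(STREAM_START, pos + 1)
    return offsets


# 2 MiB of input, i.e. more than two 900,000-byte blocks.
data = bytes(range(256)) * 8192

# What plain bzip2 would produce: one stream containing several blocks.
whole = bz2.compress(data, compresslevel=9)

# What a split-and-concatenate compressor would produce: one stream per chunk.
chunked = b"".join(
    bz2.compress(data[i:i + 900_000], compresslevel=9)
    for i in range(0, len(data), 900_000)
)

print(len(stream_offsets(whole)), "stream header(s) in the plain file")
print(len(stream_offsets(chunked)), "stream header(s) in the concatenated file")
```

Splitting the plain file further means locating block boundaries inside the stream, and those are not guaranteed to fall on byte boundaries, so a byte-wise search like this one cannot be relied on to find them.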

Florian Philipp 09-26-2012 09:27 PM

Am 26.09.2012 22:43, schrieb Matt Turner:
> On Wed, Sep 26, 2012 at 1:30 PM, Michael Mol <mikemol@gmail.com> wrote:
>> A few months ago, I filed bug 423651 to ask that bzip2 on the install
>> media be replaced with
>> pbzip2. It was closed a short while later, telling me that it'd
>> involve changing what's kept in @system, and that had to be discussed
>> here, rather than in a bug report.
>
> If we're going to ship a parallel bzip2 implementation, it should be
> lbzip2 and not pbzip2.
>
> lbzip2 can decompress bz2 archives with multiple threads that haven't
> been compressed with lbzip2/pbzip2.
>

This seems relevant, especially comment 12ff:
https://bugs.gentoo.org/show_bug.cgi?id=309683

For further anecdotal evidence: I've used pbzip2 with USE="symlink" for
several months now and never had trouble with it. Checking out lbzip2
now. I noticed it doesn't install a bunzip2 symlink.

Regards,
Florian Philipp

Michael Mol 09-26-2012 09:53 PM

On Wed, Sep 26, 2012 at 5:27 PM, Florian Philipp <lists@binarywings.net> wrote:
> Am 26.09.2012 22:43, schrieb Matt Turner:
>> On Wed, Sep 26, 2012 at 1:30 PM, Michael Mol <mikemol@gmail.com> wrote:
>>> A few months ago, I filed bug 423651 to ask that bzip2 on the install
>>> media be replaced with
>>> pbzip2. It was closed a short while later, telling me that it'd
>>> involve changing what's kept in @system, and that had to be discussed
>>> here, rather than in a bug report.
>>
>> If we're going to ship a parallel bzip2 implementation, it should be
>> lbzip2 and not pbzip2.
>>
>> lbzip2 can decompress bz2 archives with multiple threads that haven't
>> been compressed with lbzip2/pbzip2.
>>
>
> This seems relevant, especially comment 12ff:
> https://bugs.gentoo.org/show_bug.cgi?id=309683
>
> For further anecdotal evidence: I've used pbzip2 with USE="symlink" for
> several months now and never had trouble with it. Checking out lbzip2
> now. I noticed it doesn't install a bunzip2 symlink.

Piotr Szymaniak asked me about lbzip2, and I bounced the question over
to my friend. He didn't investigate it deeply; it crashed (OOM or
something else, I don't know) when he tried it on a large file. Could
have been from 2GB to 2TB, from what he has lying around. I don't
know; I didn't get that one in writing. :)

But if it proves to be stable for small and very large files, I'd have
no complaint. :)

--
:wq

Michael Mol 09-26-2012 09:59 PM

On Wed, Sep 26, 2012 at 5:49 PM, Chí-Thanh Christopher Nguyễn
<chithanh@gentoo.org> wrote:
> Michael Mol schrieb:
>> A few months ago, I filed bug 423651 to ask that bzip2 on the install
>> media be replaced with
>> pbzip2.
>
> If I understand correctly, pbzip2 depends on bzip2. So what you are
> asking is that pbzip2 is preferred over bzip2 when both are installed,
> and that pbzip2 is installed by default?

pbzip2 uses libbzip2, which I understand bzip2 to also be a wrapper around.

>
> I have so far encountered only one anecdotal case in #gentoo IRC where
> pbzip2[symlink] caused problems in emerging a package. Disabling the
> symlink flag made the problem go away. However I can't point to the
> report right now, maybe someone with searchable backlog can uncover it.

pbzip2[symlink] is more or less the scenario I'd like to see.

>
> A different question is whether in the cases where parallel bzip2 makes
> sense, is it really the best solution? xz is outperforming bzip2's
> compression ratio for large files (for an informal comparison, see bug
> 434350). And xz is faster at decompression, which offsets the parallel
> advantage to some degree.

xz is faster for decompression, but my inspiration case was during
system installation, so compression. Last I looked, xz was still very
slow for threaded compression. And it's not block-oriented, so I don't
think that's really possible without loss of compression efficiency,
anyway...and my use cases range from 4-8 physical cores.

--
:wq

Mike Gilbert 09-26-2012 10:31 PM

On Wed, Sep 26, 2012 at 5:59 PM, Michael Mol <mikemol@gmail.com> wrote:
> On Wed, Sep 26, 2012 at 5:49 PM, Chí-Thanh Christopher Nguyễn
> <chithanh@gentoo.org> wrote:
>> Michael Mol schrieb:
>>> A few months ago, I filed bug 423651 to ask that bzip2 on the install
>>> media be replaced with
>>> pbzip2.
>>
>> If I understand correctly, pbzip2 depends on bzip2. So what you are
>> asking is that pbzip2 is preferred over bzip2 when both are installed,
>> and that pbzip2 is installed by default?
>
> pbzip2 uses libbzip2, which I understand bzip2 to also be a wrapper around.
>

libbz2 is built and installed by the app-arch/bzip2 package. Thus,
app-arch/pbzip2 depends on app-arch/bzip2, unless someone rips libbz2
out into a separate ebuild.

Christoph Junghans 09-26-2012 10:57 PM

2012/9/26 Mike Gilbert <floppym@gentoo.org>:
> On Wed, Sep 26, 2012 at 5:59 PM, Michael Mol <mikemol@gmail.com> wrote:
>> On Wed, Sep 26, 2012 at 5:49 PM, Chí-Thanh Christopher Nguyễn
>> <chithanh@gentoo.org> wrote:
>>> Michael Mol schrieb:
>>>> A few months ago, I filed bug 423651 to ask that bzip2 on the install
>>>> media be replaced with
>>>> pbzip2.
>>>
>>> If I understand correctly, pbzip2 depends on bzip2. So what you are
>>> asking is that pbzip2 is preferred over bzip2 when both are installed,
>>> and that pbzip2 is installed by default?
>>
>> pbzip2 uses libbzip2, which I understand bzip2 to also be a wrapper around.
>>
>
> libbz2 is built and installed by the app-arch/bzip2 package. Thus,
> app-arch/pbzip2 depends on app-arch/bzip2, unless someone rips libbz2
> out into a separate ebuild.
That sounds like a plan. Maybe bzip2 should become a virtual as busybox
also provides an implementation.



--
Christoph Junghans
http://dev.gentoo.org/~ottxor/

Michael Mol 09-26-2012 11:57 PM

On Wed, Sep 26, 2012 at 6:57 PM, Christoph Junghans <ottxor@gentoo.org> wrote:
> 2012/9/26 Mike Gilbert <floppym@gentoo.org>:
>> On Wed, Sep 26, 2012 at 5:59 PM, Michael Mol <mikemol@gmail.com> wrote:
>>> On Wed, Sep 26, 2012 at 5:49 PM, Chí-Thanh Christopher Nguyễn
>>> <chithanh@gentoo.org> wrote:
>>>> Michael Mol schrieb:
>>>>> A few months ago, I filed bug 423651 to ask that bzip2 on the install
>>>>> media be replaced with
>>>>> pbzip2.
>>>>
>>>> If I understand correctly, pbzip2 depends on bzip2. So what you are
>>>> asking is that pbzip2 is preferred over bzip2 when both are installed,
>>>> and that pbzip2 is installed by default?
>>>
>>> pbzip2 uses libbzip2, which I understand bzip2 to also be a wrapper around.
>>>
>>
>> libbz2 is built and installed by the app-arch/bzip2 package. Thus,
>> app-arch/pbzip2 depends on app-arch/bzip2, unless someone rips libbz2
>> out into a separate ebuild.
> That sounds like a plan. Maybe bzip2 should become a virtual as busybox
> also provides an implementation.

This makes sense. And going back to my initial issue, I don't really
care which implementation gets used on the bootable media, so long as
it supports scaling to use my CPU cores.

--
:wq

Diego Elio Pettenò 09-27-2012 12:55 AM

On 26/09/2012 15:57, Christoph Junghans wrote:
> That sounds like a plan. Maybe bzip2 should become a virtual as busybox
> also provides an implementation.

No, just, no.

--
Diego Elio Pettenò — Flameeyes
flameeyes@flameeyes.eu — http://blog.flameeyes.eu/

Florian Philipp 09-27-2012 07:22 AM

Am 26.09.2012 23:53, schrieb Michael Mol:
> On Wed, Sep 26, 2012 at 5:27 PM, Florian Philipp <lists@binarywings.net> wrote:
>> Am 26.09.2012 22:43, schrieb Matt Turner:
>>> On Wed, Sep 26, 2012 at 1:30 PM, Michael Mol <mikemol@gmail.com> wrote:
>>>> A few months ago, I filed bug 423651 to ask that bzip2 on the install
>>>> media be replaced with
>>>> pbzip2. It was closed a short while later, telling me that it'd
>>>> involve changing what's kept in @system, and that had to be discussed
>>>> here, rather than in a bug report.
>>>
>>> If we're going to ship a parallel bzip2 implementation, it should be
>>> lbzip2 and not pbzip2.
>>>
>>> lbzip2 can decompress bz2 archives with multiple threads that haven't
>>> been compressed with lbzip2/pbzip2.
>>>
>>
>> This seems relevant, especially comment 12ff:
>> https://bugs.gentoo.org/show_bug.cgi?id=309683
>>
>> For further anecdotal evidence: I've used pbzip2 with USE="symlink" for
>> several months now and never had trouble with it. Checking out lbzip2
>> now. I noticed it doesn't install a bunzip2 symlink.
>
> Piotr Szymaniak asked me about lbzip2, and I bounced the question over
> to my friend. He didn't investigate it deeply; it crashed (OOM or
> something else, I don't know) when he tried it on a large file. Could
> have been from 2GB to 2TB, from what he has lying around. I don't
> know; I didn't get that one in writing. :)
>
> But if it proves to be stable for small and very large files, I'd have
> no complaint. :)
>

I just encountered this:

bzip2 -c </srv/qemu/hpwin.img >/dev/null
bzip2: /var/tmp/portage/app-arch/lbzip2-2.2/work/lbzip2-2.2/src/encode.c:794: generate_initial_trees: Assertion `a < b' failed.

Something in that file is upsetting lbzip2. I'm investigating.

