Old 09-26-2012, 08:30 PM
Michael Mol
 
Default ship app-arch/pbzip2 instead of app-arch/bzip2

A few months ago, I filed bug 423651 to ask that bzip2 on the install
media be replaced with pbzip2. It was closed a short while later, with
a note that it would involve changing what's kept in @system, and that
this had to be discussed here rather than in a bug report.

Here's a detailed description of how pbzip2 operates, as described by
a friend of mine:

> pbzip2's compression routine splits the input into blocks (with a default of 900,000
> bytes), which it then feeds into the standard bzip2 compression routine. The outputs
> of the various calls to the bzip2 compression routine are then concatenated.
> The end result is the same as if you had first used the "split" command on the input,
> run individual bzip2 commands on the split pieces, then recombined the individual
> bz2 files using cat.
>
> The downside to this is that you have multiple file headers, footers, and byte-alignment
> padding. In addition, bzip2 runs an RLE compression stage to fill the buffer it
> feeds to the BWT, the main part of the compression routine. If you happen to have a
> section with 1 MiB of the same byte, the pbzip2 front end will split that into two blocks
> (at the default settings) and feed them to separate bzip2 compressors. bzip2 will
> then compress the first block down to a buffer of about 17 KiB before passing it on
> to be compressed further; the rest of the data would have fit within this block if
> pbzip2 had not split the input.
>
> As for decompression, pbzip2 can only really do parallel decompression of files that it
> created, since it searches for the bz2 file header in order to split the input among workers. One
> reason for this is that the bz2 block header is not byte aligned.
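
The split-and-concatenate behaviour described in the quote can be tried out with Python's standard bz2 module. This is only a sketch and not pbzip2's own code: the helper name split_compress is made up for illustration, and the 900,000-byte chunk size is simply taken from the description above. It compresses 1 MiB of a single repeated byte once as a single stream and once as independently compressed chunks, so the extra headers and footers show up in the output size.

```python
import bz2

CHUNK = 900_000  # chunk size taken from the description above


def split_compress(data: bytes, chunk: int = CHUNK) -> bytes:
    """Compress each chunk as its own bz2 stream and concatenate the streams.

    Hypothetical helper for illustration; assumes non-empty input.
    """
    return b"".join(
        bz2.compress(data[i:i + chunk]) for i in range(0, len(data), chunk)
    )


if __name__ == "__main__":
    data = b"\x00" * (1024 * 1024)  # 1 MiB of the same byte
    single = bz2.compress(data)     # one stream
    split = split_compress(data)    # two concatenated streams

    # Concatenated streams still decompress to the original input.
    assert bz2.decompress(split) == data
    print("single stream:", len(single), "bytes")
    print("split streams:", len(split), "bytes")
```

Python's bz2.decompress accepts concatenated streams, which is why the joined output reads back like an ordinary .bz2 file.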

I really don't know how to carry this discussion any further than
this; I'll answer any questions I can.

--
:wq
 
Old 09-26-2012, 08:43 PM
Matt Turner
 
Default ship app-arch/pbzip2 instead of app-arch/bzip2

On Wed, Sep 26, 2012 at 1:30 PM, Michael Mol <mikemol@gmail.com> wrote:
> A few months ago, I filed bug 423651 to ask that bzip2 on the install
> media be replaced with
> pbzip2. It was closed a short while later, telling me that it'd
> involve changing what's kept in @system, and that had to be discussed
> here, rather than in a bug report.

If we're going to ship a parallel bzip2 implementation, it should be
lbzip2 and not pbzip2.

lbzip2 can decompress bz2 archives with multiple threads that haven't
been compressed with lbzip2/pbzip2.
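
To decompress an ordinary single-stream archive in parallel, a tool has to locate the individual blocks inside the stream, and as noted earlier in the thread, bz2 block headers are not byte aligned. The sketch below shows what such a search could look like: it scans a compressed buffer bit by bit for the 48-bit block magic 0x314159265359. It is not lbzip2's code, the function name find_block_bit_offsets is made up for illustration, and a real tool would also have to handle chance matches of the magic inside compressed data.

```python
import bz2
import random

BLOCK_MAGIC = 0x314159265359  # 48-bit marker at the start of every bz2 block
MASK_48 = (1 << 48) - 1


def find_block_bit_offsets(data: bytes) -> list:
    """Return the bit offsets at which the block magic occurs (hypothetical helper)."""
    offsets = []
    window = 0
    consumed = 0  # number of bits read so far
    for byte in data:
        for shift in range(7, -1, -1):  # bz2 packs bits most significant first
            window = ((window << 1) | ((byte >> shift) & 1)) & MASK_48
            consumed += 1
            if consumed >= 48 and window == BLOCK_MAGIC:
                offsets.append(consumed - 48)
    return offsets


if __name__ == "__main__":
    n = 250_000
    payload = random.Random(0).getrandbits(8 * n).to_bytes(n, "big")
    # compresslevel=1 uses 100 kB blocks, so this input spans several blocks.
    compressed = bz2.compress(payload, compresslevel=1)
    for off in find_block_bit_offsets(compressed):
        print("block at bit", off, "(byte aligned)" if off % 8 == 0 else "(not byte aligned)")
```

The first block always starts at bit 32, right after the 4-byte stream header; later blocks can start at any bit position, which is why a byte-wise search is not enough.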
 
Old 09-26-2012, 09:27 PM
Florian Philipp
 
Default ship app-arch/pbzip2 instead of app-arch/bzip2

On 26.09.2012 22:43, Matt Turner wrote:
> On Wed, Sep 26, 2012 at 1:30 PM, Michael Mol <mikemol@gmail.com> wrote:
>> A few months ago, I filed bug 423651 to ask that bzip2 on the install
>> media be replaced with
>> pbzip2. It was closed a short while later, telling me that it'd
>> involve changing what's kept in @system, and that had to be discussed
>> here, rather than in a bug report.
>
> If we're going to ship a parallel bzip2 implementation, it should be
> lbzip2 and not pbzip2.
>
> lbzip2 can decompress bz2 archives with multiple threads that haven't
> been compressed with lbzip2/pbzip2.
>

This seems relevant, especially comments 12 and following:
https://bugs.gentoo.org/show_bug.cgi?id=309683

For further anecdotal evidence: I've used pbzip2 with USE="symlink" for
several months now and never had trouble with it. Checking out lbzip2
now. I noticed it doesn't install a bunzip2 symlink.

Regards,
Florian Philipp
 
Old 09-26-2012, 09:53 PM
Michael Mol
 
Default ship app-arch/pbzip2 instead of app-arch/bzip2

On Wed, Sep 26, 2012 at 5:27 PM, Florian Philipp <lists@binarywings.net> wrote:
> On 26.09.2012 22:43, Matt Turner wrote:
>> On Wed, Sep 26, 2012 at 1:30 PM, Michael Mol <mikemol@gmail.com> wrote:
>>> A few months ago, I filed bug 423651 to ask that bzip2 on the install
>>> media be replaced with
>>> pbzip2. It was closed a short while later, telling me that it'd
>>> involve changing what's kept in @system, and that had to be discussed
>>> here, rather than in a bug report.
>>
>> If we're going to ship a parallel bzip2 implementation, it should be
>> lbzip2 and not pbzip2.
>>
>> lbzip2 can decompress bz2 archives with multiple threads that haven't
>> been compressed with lbzip2/pbzip2.
>>
>
> This seems relevant, especially comment 12ff:
> https://bugs.gentoo.org/show_bug.cgi?id=309683
>
> For further anecdotal evidence: I've used pbzip2 with USE="symlink" for
> several months now and never had trouble with it. Checking out lbzip2
> now. I noticed it doesn't install a bunzip2 symlink.

Piotr Szymaniak asked me about lbzip2, and I bounced the question over
to my friend. He didn't investigate it deeply; it crashed (OOM or
something else, I don't know) when he tried it on a large file. Could
have been from 2GB to 2TB, from what he has lying around. I don't
know; I didn't get that one in writing.

But if it proves to be stable for small and very large files, I'd have
no complaint.

--
:wq
 
Old 09-26-2012, 09:59 PM
Michael Mol
 
Default ship app-arch/pbzip2 instead of app-arch/bzip2

On Wed, Sep 26, 2012 at 5:49 PM, Chí-Thanh Christopher Nguyễn
<chithanh@gentoo.org> wrote:
> Michael Mol wrote:
>> A few months ago, I filed bug 423651 to ask that bzip2 on the install
>> media be replaced with
>> pbzip2.
>
> If I understand correctly, pbzip2 depends on bzip2. So what you are
> asking is that pbzip2 is preferred over bzip2 when both are installed,
> and that pbzip2 is installed by default?

pbzip2 uses libbzip2, which I understand bzip2 to also be a wrapper around.

>
> I have so far encountered only one anecdotal case in #gentoo IRC where
> pbzip2[symlink] caused problems in emerging a package. Disabling the
> symlink flag made the problem go away. However I can't point to the
> report right now, maybe someone with searchable backlog can uncover it.

pbzip2[symlink] is more or less the scenario I'd like to see.

>
> A different question is whether in the cases where parallel bzip2 makes
> sense, is it really the best solution? xz is outperforming bzip2's
> compression ratio for large files (for an informal comparison, see bug
> 434350). And xz is faster at decompression, which offsets the parallel
> advantage to some degree.

xz is faster for decompression, but my inspiration case was during
system installation, so compression. Last I looked, xz was still very
slow for threaded compression. And it's not block-oriented, so I don't
think that's really possible without loss of compression efficiency,
anyway...and my use cases range from 4-8 physical cores.
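
Because bzip2 is block-oriented, independent chunks can be handed to separate workers, which is how the multi-core scaling discussed here comes about. The following is a minimal sketch of that idea using Python's standard library, not pbzip2's or lbzip2's implementation: the function name parallel_compress, the 900,000-byte chunk size, and the thread pool are all assumptions made for illustration. It relies on CPython's bz2 module releasing the GIL while compressing, which as far as I know it does.

```python
import bz2
import os
from concurrent.futures import ThreadPoolExecutor


def parallel_compress(data: bytes, chunk: int = 900_000, workers=None) -> bytes:
    """Compress fixed-size chunks on several threads and join the streams in order.

    Hypothetical helper for illustration; assumes non-empty input.
    """
    workers = workers or os.cpu_count() or 1
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the streams stay in sequence.
        return b"".join(pool.map(bz2.compress, chunks))


if __name__ == "__main__":
    payload = bytes(range(256)) * 20_000  # about 5 MB of sample data
    packed = parallel_compress(payload, workers=4)
    assert bz2.decompress(packed) == payload
    print(len(payload), "->", len(packed), "bytes")
```

The output is a sequence of complete bz2 streams, so any standard bzip2 implementation that accepts concatenated streams can read it back, at the cost of the per-stream overhead mentioned at the start of the thread.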

--
:wq
 
Old 09-26-2012, 10:31 PM
Mike Gilbert
 
Default ship app-arch/pbzip2 instead of app-arch/bzip2

On Wed, Sep 26, 2012 at 5:59 PM, Michael Mol <mikemol@gmail.com> wrote:
> On Wed, Sep 26, 2012 at 5:49 PM, Chí-Thanh Christopher Nguyễn
> <chithanh@gentoo.org> wrote:
>> Michael Mol wrote:
>>> A few months ago, I filed bug 423651 to ask that bzip2 on the install
>>> media be replaced with
>>> pbzip2.
>>
>> If I understand correctly, pbzip2 depends on bzip2. So what you are
>> asking is that pbzip2 is preferred over bzip2 when both are installed,
>> and that pbzip2 is installed by default?
>
> pbzip2 uses libbzip2, which I understand bzip2 to also be a wrapper around.
>

libbz2 is built and installed by the app-arch/bzip2 package. Thus,
app-arch/pbzip2 depends on app-arch/bzip2, unless someone rips libbz2
out into a separate ebuild.
 
Old 09-26-2012, 10:57 PM
Christoph Junghans
 
Default ship app-arch/pbzip2 instead of app-arch/bzip2

2012/9/26 Mike Gilbert <floppym@gentoo.org>:
> On Wed, Sep 26, 2012 at 5:59 PM, Michael Mol <mikemol@gmail.com> wrote:
>> On Wed, Sep 26, 2012 at 5:49 PM, Chí-Thanh Christopher Nguyễn
>> <chithanh@gentoo.org> wrote:
>>> Michael Mol wrote:
>>>> A few months ago, I filed bug 423651 to ask that bzip2 on the install
>>>> media be replaced with
>>>> pbzip2.
>>>
>>> If I understand correctly, pbzip2 depends on bzip2. So what you are
>>> asking is that pbzip2 is preferred over bzip2 when both are installed,
>>> and that pbzip2 is installed by default?
>>
>> pbzip2 uses libbzip2, which I understand bzip2 to also be a wrapper around.
>>
>
> libbz2 is built and installed by the app-arch/bzip2 package. Thus,
> app-arch/pbzip2 depends on app-arch/bzip2, unless someone rips libbz2
> out into a separate ebuild.
That sounds like a plan. Maybe bzip2 should become a virtual as busybox
also provides an implementation.



--
Christoph Junghans
http://dev.gentoo.org/~ottxor/
 
Old 09-26-2012, 11:57 PM
Michael Mol
 
Default ship app-arch/pbzip2 instead of app-arch/bzip2

On Wed, Sep 26, 2012 at 6:57 PM, Christoph Junghans <ottxor@gentoo.org> wrote:
> 2012/9/26 Mike Gilbert <floppym@gentoo.org>:
>> On Wed, Sep 26, 2012 at 5:59 PM, Michael Mol <mikemol@gmail.com> wrote:
>>> On Wed, Sep 26, 2012 at 5:49 PM, Chí-Thanh Christopher Nguyễn
>>> <chithanh@gentoo.org> wrote:
>>>> Michael Mol wrote:
>>>>> A few months ago, I filed bug 423651 to ask that bzip2 on the install
>>>>> media be replaced with
>>>>> pbzip2.
>>>>
>>>> If I understand correctly, pbzip2 depends on bzip2. So what you are
>>>> asking is that pbzip2 is preferred over bzip2 when both are installed,
>>>> and that pbzip2 is installed by default?
>>>
>>> pbzip2 uses libbzip2, which I understand bzip2 to also be a wrapper around.
>>>
>>
>> libbz2 is built and installed by the app-arch/bzip2 package. Thus,
>> app-arch/pbzip2 depends on app-arch/bzip2, unless someone rips libbz2
>> out into a separate ebuild.
> That sounds like a plan. Maybe bzip2 should become a virtual as busybox
> also provides an implementation.

This makes sense. And going back to my initial issue, I don't really
care which implementation gets used on the bootable media, so long as
it supports scaling to use my CPU cores.

--
:wq
 
Old 09-27-2012, 12:55 AM
Diego Elio Pettenò
 
Default ship app-arch/pbzip2 instead of app-arch/bzip2

On 26/09/2012 15:57, Christoph Junghans wrote:
> That sounds like a plan. Maybe bzip2 should become a virtual as busybox
> also provides an implementation.

No, just, no.

--
Diego Elio Pettenò — Flameeyes
flameeyes@flameeyes.eu — http://blog.flameeyes.eu/
 
Old 09-27-2012, 07:22 AM
Florian Philipp
 
Default ship app-arch/pbzip2 instead of app-arch/bzip2

On 26.09.2012 23:53, Michael Mol wrote:
> On Wed, Sep 26, 2012 at 5:27 PM, Florian Philipp <lists@binarywings.net> wrote:
>> On 26.09.2012 22:43, Matt Turner wrote:
>>> On Wed, Sep 26, 2012 at 1:30 PM, Michael Mol <mikemol@gmail.com> wrote:
>>>> A few months ago, I filed bug 423651 to ask that bzip2 on the install
>>>> media be replaced with
>>>> pbzip2. It was closed a short while later, telling me that it'd
>>>> involve changing what's kept in @system, and that had to be discussed
>>>> here, rather than in a bug report.
>>>
>>> If we're going to ship a parallel bzip2 implementation, it should be
>>> lbzip2 and not pbzip2.
>>>
>>> lbzip2 can decompress bz2 archives with multiple threads that haven't
>>> been compressed with lbzip2/pbzip2.
>>>
>>
>> This seems relevant, especially comment 12ff:
>> https://bugs.gentoo.org/show_bug.cgi?id=309683
>>
>> For further anecdotal evidence: I've used pbzip2 with USE="symlink" for
>> several months now and never had trouble with it. Checking out lbzip2
>> now. I noticed it doesn't install a bunzip2 symlink.
>
> Piotr Szymaniak asked me about lbzip2, and I bounced the question over
> to my friend. He didn't investigate it deeply; it crashed (OOM or
> something else, I don't know) when he tried it on a large file. Could
> have been from 2GB to 2TB, from what he has lying around. I don't
> know; I didn't get that one in writing.
>
> But if it proves to be stable for small and very large files, I'd have
> no complaint.
>

I just encountered this:

bzip2 -c </srv/qemu/hpwin.img >/dev/null
bzip2: /var/tmp/portage/app-arch/lbzip2-2.2/work/lbzip2-2.2/src/encode.c:794: generate_initial_trees: Assertion `a < b' failed.

Something in that file is upsetting lbzip2. I'm investigating.
 
