Linux-archive is a website aiming to archive linux email lists and to make them easily accessible for linux users/developers.


08-24-2012, 09:04 AM
Gaël DONVAL
 
Default compressor

On Thursday 23 August 2012 at 20:24 +0800, lina wrote:
>
> Sorry, here you mean,
>
> once tar -Jcf a.tar.xz a
>
> again
> tar -Jcf a.tar.xz a.tar.xz
> ?
No, I think this was a joke

In most programs, there is a "depth" or "pass number" parameter that
does just this already. If you try to compress again, the overhead
induced by the container (headers and such) will ultimately increase the
file size.
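A minimal sketch of the point above, using Python's standard `lzma` module (which writes the .xz format) as a stand-in for `tar -J`; the sample text is made up. The first pass removes the redundancy, so the second pass has nothing left to squeeze and the result comes out slightly larger than its input.

```python
import lzma

# Made-up, highly repetitive sample text (a stand-in for a real file).
data = b"".join(
    f"line {i % 100}: the quick brown fox jumps over the lazy dog\n".encode()
    for i in range(10_000)
)

once = lzma.compress(data)   # first pass: large gain on repetitive text
twice = lzma.compress(once)  # second pass: input is already high-entropy

print(f"original: {len(data)} bytes")
print(f"xz once:  {len(once)} bytes")
print(f"xz twice: {len(twice)} bytes")

# Both layers must be undone to get the original back.
assert lzma.decompress(lzma.decompress(twice)) == data
```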

Oh, BTW, if you don't need file permissions, just use 7zip directly on
the directories you want to compress: you will avoid tar format
overhead.
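To see where the tar overhead comes from, here is a sketch using Python's `tarfile` module; the three tiny files are made up for illustration. Each member gets a 512-byte header and its data is padded to a multiple of 512 bytes, the archive ends with zero blocks, and `tarfile` also pads the whole archive to a 10240-byte record (like `tar -b20`). Most of that padding is zeros, so it largely compresses away once the archive goes through xz.

```python
import io
import lzma
import tarfile

def tar_bytes(files: dict[str, bytes]) -> bytes:
    """Build an uncompressed tar archive in memory from {name: contents}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, contents in files.items():
            info = tarfile.TarInfo(name)  # 512-byte header per member
            info.size = len(contents)
            tar.addfile(info, io.BytesIO(contents))  # data padded to 512-byte blocks
    # Leaving the `with` block closed the archive, which appended zero
    # blocks and padded the output to tarfile.RECORDSIZE.
    return buf.getvalue()

# Three tiny made-up files: 18 bytes of actual content in total.
files = {f"file{i}.txt": b"hello\n" for i in range(3)}
raw = tar_bytes(files)

print(f"content: {sum(len(c) for c in files.values())} bytes")
print(f"tar:     {len(raw)} bytes")
print(f"tar.xz:  {len(lzma.compress(raw))} bytes")
```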


--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/1345799096.4875.5.camel@p76-nom-gd.cnrs-imn.fr
 
08-24-2012, 09:10 AM
Jon Dowland
 
Default compressor

On Fri, Aug 24, 2012 at 11:04:56AM +0200, Gaël DONVAL wrote:
> On Thursday 23 August 2012 at 20:24 +0800, lina wrote:
> >
> > Sorry, here you mean,
> >
> > once tar -Jcf a.tar.xz a
> >
> > again
> > tar -Jcf a.tar.xz a.tar.xz
> > ?
> No, I think this was a joke

Yes it was a joke but it was based on a recent article where someone
expressed surprise that multiple manual passes of a compressor (I think
gz) resulted in smaller file sizes. (I couldn't find a copy of the article
to link to)

> In most programs, there is a "depth" or "pass number" parameter that
> does just this already. If you try to compress again, the overhead
> induced by the container (headers and such) will ultimately increase the
> file size.

Most compressors work on a block-based model in order to support stream
operation, so the compressor doesn't have a global view of the data being
compressed. That's why subsequent manual passes can (sometimes) have a good
effect, especially with e.g. enormous log files with a lot of repetition: local
areas of the file being compressed are treated in isolation, but the resulting
compressed blocks have a lot of (compressed!) repetition. In practice it's
almost certainly very rarely worth bothering.
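One concrete, admittedly pathological, case where a second manual pass pays off is DEFLATE (gzip/zlib). A single DEFLATE match is capped at 258 bytes, so a long run of identical bytes compresses to thousands of identical codes, and that output is itself very repetitive. A sketch with Python's `zlib`, using 10 MB of zero bytes as a made-up input:

```python
import zlib

# Made-up pathological input: 10 MB of zero bytes.
data = bytes(10_000_000)

once = zlib.compress(data, 9)   # output is long runs of the same match code
twice = zlib.compress(once, 9)  # those runs compress well again

print(f"original:      {len(data)} bytes")
print(f"deflate once:  {len(once)} bytes")
print(f"deflate twice: {len(twice)} bytes")

# Both layers must be undone to get the original back.
assert zlib.decompress(zlib.decompress(twice)) == data
```

Ordinary text or log files do not behave this extremely, which fits the "rarely worth bothering" conclusion.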


--
Archive: http://lists.debian.org/20120824091035.GE19780@debian
 
08-24-2012, 09:11 AM
Jon Dowland
 
Default compressor

On Thu, Aug 23, 2012 at 02:26:25PM -0600, Bob Proulx wrote:
> There is a problem with the mashing and reformatting. It makes lzip
> appear to be 66M against xz being 65M and so xz is better, right?
[snip]
> It would be better to look at the long byte counts for this type of
> comparison.

You're right, that would be necessary to be accurate. An exercise left
for another reader!
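A sketch of why the rounded figures can mislead. The two helpers below are hypothetical: one truncates to whole MiB and one rounds up (GNU `ls -h` rounds up by default, as far as I know), and the byte counts are invented, not the real kernel tarball sizes. A size that truncates to 99M shows as 100M when rounded up, and two files 800 kB apart both show as 66M.

```python
MIB = 1024 * 1024

def size_floor(n: int) -> str:
    """Hypothetical helper: truncate a byte count to whole MiB."""
    return f"{n // MIB}M"

def size_ceil(n: int) -> str:
    """Hypothetical helper: round a byte count up to whole MiB."""
    return f"{-(-n // MIB)}M"

# Invented byte counts, chosen only to show the rounding effects.
for n in (104_000_000, 68_200_000, 69_000_000):
    print(f"{n:>11,} bytes  floor={size_floor(n):>5}  ceil={size_ceil(n):>5}")

# The exact counts make the real difference obvious.
a, b = 68_200_000, 69_000_000
print(f"difference: {b - a:,} bytes ({(b - a) / a:.2%})")
```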


--
Archive: http://lists.debian.org/20120824091145.GF19780@debian
 
08-24-2012, 09:15 AM
Gaël DONVAL
 
Default compressor

On Thursday 23 August 2012 at 14:26 -0600, Bob Proulx wrote:
> Jon Dowland wrote:
> > Bob Proulx wrote:
> > > Jon Dowland wrote:
> > > > linux-3.6-rc2.tar.bz2 78M
> > > > linux-3.6-rc2.tar.gz 99M
> > > > linux-3.6-rc2.tar.xz 65M
> > > linux-3.6-rc2.tar.lz 66M
> > >
> > > I think lzip is worthy enough that it should have a mention too. It
> > > has gotten less attention than xz and that is sad since it is a nice
> > > free software tool. I recompressed that file using lzip for this
> > > comparison.
> >
> > Thanks for the data (mashed/reformatted into quote above). I copied the
> > listings from the kernel.org archives, so the choice of compression types
> > was theirs (although I hadn't heard of lzip, thanks!)
>
> There is a problem with the mashing and reformatting. It makes lzip
> appear to be 66M against xz being 65M and so xz is better, right? But
> wait the above says that gz is 99M. But ls says 100M. So the listed
> sizes are not 100% correct. So 66M is true if 100M is true. But it
> seems that something was truncating down to 99M and so perhaps that
> 65M is actually 66M? In which case xz and lz were actually the same
> for that sample. Or perhaps if they count 65M as true for xz then
> perhaps it should be 65M for lz too?
>
> I think you see the problem. I don't really know from the above data
> whether xz or lz is the same or worse or better.
>
> I didn't go and download the linux-3.6-rc2.tar.xz file to see what
> size it actually should be listed as. I probably should have. But I
> didn't have the time.
>
> It would be better to look at the long byte counts for this type of
> comparison.
>
> Bob

Even if you are perfectly right, I wouldn't look at the long byte count.
A MB today downloads in about 1 s on most internet connections, and if you
took the linux-2.6 archive or an archive of your whole / partition, you might
find that either lz or xz comes out ahead on file size.

From my point of view, I see two programs performing almost equally well
on a big bunch of ASCII files on this hardware.

So the next question would be "which one is faster?" and even before
that, I would wonder "Are these programs available on my cluster?"

But once again you are perfectly right to ask for more precision; I'm just
saying that there is a good chance you won't be able to conclude
anything.
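For the "which one is faster?" question, a rough timing harness is easy to sketch with Python's standard `zlib`, `bz2` and `lzma` modules (DEFLATE as in gzip, bzip2, and the xz format respectively; lzip has no standard-library binding, so it is left out). The sample input is made up; for a meaningful answer you would feed it the actual files, and timings will vary with hardware and presets.

```python
import bz2
import lzma
import time
import zlib

def measure(name, compress, decompress, data):
    """Time one compression run and check that decompression restores the input."""
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    assert decompress(packed) == data  # lossless round trip
    return name, len(packed), elapsed

# Made-up sample input; substitute the real files you care about.
sample = b"".join(
    f"{i}: some moderately repetitive log text\n".encode() for i in range(20_000)
)

results = [
    measure("deflate", lambda d: zlib.compress(d, 9), zlib.decompress, sample),
    measure("bzip2", lambda d: bz2.compress(d, 9), bz2.decompress, sample),
    measure("xz", lambda d: lzma.compress(d, preset=6), lzma.decompress, sample),
]

for name, size, secs in results:
    print(f"{name:8} {size:>9} bytes {secs:8.3f} s")
```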


--
Archive: http://lists.debian.org/1345799759.4875.13.camel@p76-nom-gd.cnrs-imn.fr
 
08-24-2012, 09:31 AM
Gaël DONVAL
 
Default compressor

On Friday 24 August 2012 at 10:10 +0100, Jon Dowland wrote:
>
> Most compressors work on a block-based model in order to support stream
> operation and so the compressor doesn't have a global view of the data being
> compressed.
At least with 7zip and xz, you can tweak the block size directly, and at
least LZMA, Deflate and PPMd are able to do multiple passes.

> That's why subsequent manual passes can (sometimes) have a good
> effect, especially with e.g. enormous log files with a lot of repetition: local
> areas of the file being compressed are treated in isolation, but the resulting
> compressed blocks have a lot of (compressed!) repetition. In practice it's
> almost certainly very rarely worth bothering.
That makes sense. AFAIK, you can't manually set the block size with gzip,
which is a shame for non-streamed files.
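To illustrate the "global view" point with a knob Python can actually turn: as far as I know the standard `lzma` module does not expose xz's `--block-size`, so this sketch varies the LZMA2 dictionary size (the window in which repeats can be found) instead. The made-up input is 512 KiB of pseudo-random bytes followed by an exact copy, so the only redundancy sits 512 KiB back. DEFLATE's window is fixed at 32 KiB, so gzip could not exploit this repeat at all.

```python
import lzma
import random

# Made-up input: 512 KiB of pseudo-random (incompressible) bytes, then an exact copy.
chunk = random.Random(0).randbytes(512 * 1024)
data = chunk + chunk

def xz_size(dict_size: int) -> int:
    """Compress `data` to .xz with the given LZMA2 dictionary size; return the output length."""
    filters = [{"id": lzma.FILTER_LZMA2, "dict_size": dict_size}]
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters))

small = xz_size(64 * 1024)        # window too small to reach the copy 512 KiB back
large = xz_size(2 * 1024 * 1024)  # window large enough to see the whole input

print(f"input:         {len(data)} bytes")
print(f"64 KiB window: {small} bytes")
print(f"2 MiB window:  {large} bytes")
```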


--
Archive: http://lists.debian.org/1345800669.4875.24.camel@p76-nom-gd.cnrs-imn.fr
 
08-24-2012, 07:28 PM
Bob Proulx
 
Default compressor

Gaël DONVAL wrote:
> Bob Proulx wrote:
> > There is a problem with the mashing and reformatting. It makes lzip
> > appear to be 66M against xz being 65M and so xz is better, right? But
> > wait the above says that gz is 99M. But ls says 100M. So the listed
> > sizes are not 100% correct. So 66M is true if 100M is true. But it
> > seems that something was truncating down to 99M and so perhaps that
> > 65M is actually 66M? In which case xz and lz were actually the same
> > for that sample. Or perhaps if they count 65M as true for xz then
> > perhaps it should be 65M for lz too?
> >
> > I think you see the problem. I don't really know from the above data
> > whether xz or lz is the same or worse or better.
> > ...

> Even if you are perfectly right, I wouldn't look at the long byte count.
> A MB today downloads in about 1 s on most internet connections, and if you

You have a faster network connection than I do. Or rather, I do not
have as fast a connection as "most people" do these days. :-) I
would like something faster, but in my area, while it is possible, it
is many times more expensive. I must wait.

> took the linux-2.6 archive or an archive of your whole / partition, you might
> find that either lz or xz comes out ahead on file size.

Agreed.

> From my point of view, I see two programs performing almost equally well
> on a big bunch of ASCII files on this hardware.

You are very observant! And for that reason you are not in the target
audience I was talking about. I know that many people will see
66M versus 65M as a strong indicator when it should not be taken as
significant at all. These people would see a 0.0000001% difference as
meaningful, simply because one is measured as larger than the other,
and draw a conclusion they should not draw. That you observed this
correctly shows that you are smarter than the people I
worry about. :-)

> So the next question would be "which one is faster?" and even before
> that, I would wonder "Are these programs available on my cluster?"
>
> But once again you are perfectly right to ask for more precision; I'm just
> saying that there is a good chance you won't be able to conclude
> anything.

Agreed. My comment was directed toward the human element. :-)

Bob
 
08-25-2012, 06:43 PM
Gaël DONVAL
 
Default compressor

On Friday 24 August 2012 at 13:28 -0600, Bob Proulx wrote:
> You are very observant! And for that reason you are not in the target
> audience I was talking about. I know that many people will see
> 66M versus 65M as a strong indicator when it should not be taken as
> significant at all. These people would see a 0.0000001% difference as
> meaningful, simply because one is measured as larger than the other,
> and draw a conclusion they should not draw. That you observed this
> correctly shows that you are smarter than the people I
> worry about. :-)
Seems not: I did not understand the true meaning of your remark. I've
just reread it and it seems I skipped half of it.

The 1 MB/s connection speed was just an order-of-magnitude figure. I
hope you are not struggling with a 56k modem.
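For scale, the arithmetic behind "1 MB in about 1 s" versus a modem: transfer time is the size in bits divided by the link speed, ignoring protocol overhead. 1 MB/s is 8 Mbit/s; at a nominal 56 kbit/s the same megabyte takes about 143 s, roughly 2.4 minutes. A small sketch:

```python
def download_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Ideal transfer time: convert bytes to bits, divide by link speed (no protocol overhead)."""
    return size_bytes * 8 / link_bits_per_second

MB = 1_000_000  # decimal megabyte

for label, bps in [("8 Mbit/s (1 MB/s)", 8_000_000), ("56k modem", 56_000)]:
    print(f"1 MB over {label}: {download_seconds(MB, bps):.1f} s")
```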



--
Archive: http://lists.debian.org/1345920221.9884.13.camel@p76-nom-gd.cnrs-imn.fr
 
