So I was asked to look at compression on log servers and to see if
changing to xz would save us some space. My test is not comprehensive
but showed what might happen.
Basic summary: XZ may save us up to 2% more space than our current
compression, but its real advantage over bzip2 is how much faster it
decompresses files. [Compression may also be faster for some files.]
--
Stephen J Smoogen.
“The core skill of innovators is error recovery, not failure avoidance.”
Randy Nelson, President of Pixar University.
"We have a strategic plan. It's called doing things."
— Herb Kelleher, founder Southwest Airlines
_______________________________________________
infrastructure mailing list
infrastructure@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/infrastructure
08-26-2010, 11:44 PM
Mike McGrath
Compressing files (gz versus bz2 versus xz)
On Thu, 26 Aug 2010, Stephen John Smoogen wrote:
> So I was asked to look at compression on log servers and to see if
> changing to xz would save us some space. My test is not comprehensive
> but showed what might happen.
>
> Basic summary: XZ may save us up to 2% more space than our current
> compression, but its real advantage over bzip2 is how much faster it
> decompresses files. [Compression may also be faster for some files.]
>
> Sizes are in KiB as reported by du -s; percentages are space saved.
>
> File         | Size    | Gzip  | G%   | Bzip2 | B%   | XZ    | X%
> messages.log |  644568 | 10992 | 98.3 |  4856 | 99.2 |  5940 | 99.1
> mail.log     |  610816 | 65060 | 89.3 | 40836 | 93.3 | 35536 | 94.2
> TOTAL        | 1255384 | 76052 | 93.9 | 45692 | 96.4 | 41476 | 96.7
>
> Program | Compression Time | Decompression Time
> GZIP    | 0m43.416s        | 0m10.033s
> BZIP2   | 10m42.296s       | 1m2.525s
> XZ      | 10m15.937s       | 0m12.565s
>
>
> Raw data below
>
> [root@log01 smooge-b]# du -s messages.log mail.log
> 644568 messages.log
> 610816 mail.log
> [root@log01 smooge-b]# time gzip -v -9 messages.log mail.log
> messages.log: 98.3% -- replaced with messages.log.gz
> mail.log: 89.3% -- replaced with mail.log.gz
>
> real 0m43.416s
> user 0m41.335s
> sys 0m1.736s
> [root@log01 smooge-b]# du -s messages.log.gz mail.log.gz
> 10992 messages.log.gz
> 65060 mail.log.gz
> [root@log01 smooge-b]# time gunzip -v messages.log.gz mail.log.gz
> messages.log.gz: 98.3% -- replaced with messages.log
> mail.log.gz: 89.3% -- replaced with mail.log
>
> real 0m10.033s
> user 0m6.948s
> sys 0m3.004s
>
> [root@log01 smooge-b]# time bzip2 -v -9 messages.log mail.log
> messages.log: 133.043:1, 0.060 bits/byte, 99.25% saved, 659381328 in, 4956148 out.
> mail.log: 14.961:1, 0.535 bits/byte, 93.32% saved, 624854215 in, 41766136 out.
>
> real 10m42.296s
> user 10m36.948s
> sys 0m1.608s
> [root@log01 smooge-b]# du -sc messages.log.bz2 mail.log.bz2
> 4856 messages.log.bz2
> 40836 mail.log.bz2
> 45692 total
> [root@log01 smooge-b]# time bunzip2 -v messages.log.bz2 mail.log.bz2
> messages.log.bz2: done
> mail.log.bz2: done
>
> real 1m2.525s
> user 0m44.779s
> sys 0m4.956s
>
> [root@log01 smooge-b]# time xz -v -9 messages.log mail.log
> messages.log (1/2)
> 100.0 % 5,923.6 KiB / 628.8 MiB = 0.009 3.1 MiB/s 3:21
>
> mail.log (2/2)
> 100.0 % 34.7 MiB / 595.9 MiB = 0.058 1.4 MiB/s 6:53
>
> real 10m15.937s
> user 10m8.550s
> sys 0m3.552s
> [root@log01 smooge-b]# du -s messages.log.xz mail.log.xz
> 5940 messages.log.xz
> 35536 mail.log.xz
> [root@log01 smooge-b]# time unxz -v messages.log.xz mail.log.xz
> messages.log.xz (1/2)
> 100.0 % 5,923.6 KiB / 628.8 MiB = 0.009 140 MiB/s 0:04
>
> mail.log.xz (2/2)
> 100.0 % 34.7 MiB / 595.9 MiB = 0.058 74 MiB/s 0:08
>
> real 0m12.565s
> user 0m8.709s
> sys 0m3.636s
>
>
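For anyone re-checking the table, the savings percentages can be
recomputed from the du -s sizes in the raw data. This is only a sketch:
the names SIZES_KIB, percent_saved and summarize are made up for
illustration, and it assumes the du figures are in KiB and that
"percent saved" means 1 minus compressed/original.

```python
# Sizes in KiB as reported by `du -s` in the raw data:
# (original, gzip -9, bzip2 -9, xz -9). Names here are illustrative only.
SIZES_KIB = {
    "messages.log": (644568, 10992, 4856, 5940),
    "mail.log": (610816, 65060, 40836, 35536),
}


def percent_saved(original, compressed):
    """Return the space saved by compression as a percentage of the original."""
    return 100.0 * (1 - compressed / original)


def summarize(sizes):
    """Return {name: (gzip%, bzip2%, xz%)}, including a TOTAL row."""
    rows = dict(sizes)
    # Sum each column across files to get the TOTAL row.
    rows["TOTAL"] = tuple(sum(column) for column in zip(*sizes.values()))
    return {
        name: tuple(round(percent_saved(orig, c), 1) for c in compressed)
        for name, (orig, *compressed) in rows.items()
    }


if __name__ == "__main__":
    for name, percentages in summarize(SIZES_KIB).items():
        print(name, *percentages)
```

On the sizes above, the TOTAL row works out to roughly 93.9, 96.4 and
96.7 percent saved for gzip, bzip2 and xz respectively.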
It does take a while to grep through the bzipped logs. If you want to
re-compress them all, I say have at it.
-Mike
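
Since the time spent searching is dominated by decompression, one way
to compare formats on real queries is to run the same search against
each compressed copy. A minimal sketch using only the Python standard
library follows; grep_compressed and OPENERS are hypothetical names, and
it assumes the format can be told from the file suffix.

```python
import bz2
import gzip
import lzma
import re
from pathlib import Path

# Map a file suffix to the standard-library opener for that format.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def grep_compressed(path, pattern):
    """Yield lines matching a regex from a compressed or plain log file."""
    path = Path(path)
    # Fall back to the built-in open() for uncompressed files.
    opener = OPENERS.get(path.suffix, open)
    regex = re.compile(pattern)
    # Text mode decompresses on the fly; nothing is written to disk.
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if regex.search(line):
                yield line.rstrip("\n")
```

For example, list(grep_compressed("mail.log.xz", r"status=bounced"))
would collect matching lines; the pattern is just an illustration.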
08-27-2010, 03:58 AM
Stephen John Smoogen
Compressing files (gz versus bz2 versus xz)
On Thu, Aug 26, 2010 at 17:44, Mike McGrath <mmcgrath@redhat.com> wrote:
> On Thu, 26 Aug 2010, Stephen John Smoogen wrote:
>
>> So I was asked to look at compression on log servers and to see if
>> changing to xz would save us some space. My test is not comprehensive
>> but showed what might happen.
>>
>> Basic summary: XZ may save us up to 2% more space than our current
>> compression, but its real advantage over bzip2 is how much faster it
>> decompresses files. [Compression may also be faster for some files.]
>>
>
> It does take a while to grep through the bzipped logs. If you want to
> re-compress them all, I say have at it.
OK, I will look at it after I get the hardware call in tomorrow.
--
Stephen J Smoogen.
“The core skill of innovators is error recovery, not failure avoidance.”
Randy Nelson, President of Pixar University.
"We have a strategic plan. It's called doing things."
— Herb Kelleher, founder Southwest Airlines
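
The re-compression discussed above could be done by streaming each
.bz2 file into a new .xz file, so the uncompressed log never has to be
written to disk. This is a rough sketch, not what was actually run on
the log servers: recompress_bz2_to_xz is a hypothetical helper, and the
preset and the choice to keep the original by default are assumptions.

```python
import bz2
import lzma
import shutil
from pathlib import Path


def recompress_bz2_to_xz(src, preset=6, keep_original=True):
    """Stream a .bz2 file into a .xz file beside it and return the new path."""
    src = Path(src)
    if src.suffix != ".bz2":
        raise ValueError(f"expected a .bz2 file, got {src.name}")
    dst = src.with_suffix(".xz")  # messages.log.bz2 -> messages.log.xz
    # Mode "xb" refuses to overwrite an existing .xz file.
    with bz2.open(src, "rb") as reader, lzma.open(dst, "xb", preset=preset) as writer:
        shutil.copyfileobj(reader, writer, length=1024 * 1024)
    if not keep_original:
        src.unlink()
    return dst
```

Keeping the original until the new file has been verified is the
cautious choice; preset=9 would match the xz -9 runs in the test, at
the cost of more memory and time.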