Linux Archive

Linux Archive (http://www.linux-archive.org/)
-   Gentoo Development (http://www.linux-archive.org/gentoo-development/)
-   -   GLEP61 - Manifest2 compression (http://www.linux-archive.org/gentoo-development/317861-glep61-manifest2-compression.html)

"Robin H. Johnson" 01-31-2010 09:04 AM

GLEP61 - Manifest2 compression
 
Changes:
- This GLEP can stand independently of GLEP58.
- Add XZ to compression types list.
- Move cutoff to 32KiB. Provide size example w/ 32KiB+gzip.
- Split specification into generation and validation.

--
Robin Hugh Johnson
Gentoo Linux: Developer, Trustee & Infrastructure Lead
E-Mail : robbat2@gentoo.org
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85
GLEP: 61
Title: Manifest2 compression
Version: $Revision: 1.6 $
Last-Modified: $Date: 2010/01/31 09:55:43 $
Author: Robin Hugh Johnson <robbat2@gentoo.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Requires: 44
Created: July 2008
Updated: October 2008, January 2010
Updates: 44
Post-History: December 2009, January 2010

Abstract
========
Deals with compression of large Manifest2 files.

Motivation
==========
With the introduction of MetaManifest, and full-tree Manifest coverage,
we are faced with the possibility of having very large Manifests.

Preliminary experiments with MetaManifest, show that with just the
existing per-package Manifests, the full MetaManifest (top-level only,
no first-level sub directories), for a tree including metadata/, exceeds
8MiB in size. Applying common compression can achieve a 50-60% reduction
in this size.

Additionally, some of the larger already-existing Manifests in the tree
can also be reduced.

This GLEP is not mandatory for the tree-signing specification, but
instead helps to cut down the size impact of large Manifest2 files, some
of which are already present in the tree. As such, it is also able to
stand on it's own.

Specification
=============
Creation of compressed Manifests:
---------------------------------
32KiB is suggested as a arbitrary cut-off point to start generating
compressed Manifest2 files.

The compression must only applied during the creation of a tree intended
for end users. No Manifests stored in a VCS should be compressed in the
VCS. For the main gentoo-portage tree, this means that the compressed
Manifests should be generated using the CVS to Rsync process.

The Manifest compression process is required to ensure that inconsistent
compressed versions do not exist.

Validation of Manifests:
------------------------
When searching for a Manifest2 file, if the basename form does not
exist, the package manager should search in the same location using
common compressed suffixes, and use the compressed file in place of the
Manifest2.

gzip, bzip2, lzma, xz should all be supported if available on the given
platform. In the case that multiple versions exist, the package manager
should simply pick one - they should be identical, differing only in
compression.

Example Results with a 32KiB cut-off, gzip algorithm
================================================== ==
As of 2010/01/30, the suggested cut-off would impact the following 21 existing
Manifests, for a saving of nearly 900KiB::

Size Path
65788 app-doc/linux-gazette/Manifest
75739 app-office/openoffice-bin/Manifest
40534 app-text/texlive-core/Manifest
41710 dev-texlive/texlive-bibtexextra/Manifest
38197 dev-texlive/texlive-documentation-english/Manifest
129610 dev-texlive/texlive-fontsextra/Manifest
36022 dev-texlive/texlive-humanities/Manifest
686118 dev-texlive/texlive-latexextra/Manifest
43392 dev-texlive/texlive-latexrecommended/Manifest
33375 dev-texlive/texlive-mathextra/Manifest
39781 dev-texlive/texlive-pictures/Manifest
69567 dev-texlive/texlive-pstricks/Manifest
75460 dev-texlive/texlive-publishers/Manifest
50879 dev-texlive/texlive-science/Manifest
36711 kde-base/kde-l10n/Manifest
36539 media-gfx/bootsplash-themes/Manifest
33058 net-fs/autofs/Manifest
39781 www-client/firefox-bin/Manifest
48983 www-client/icecat/Manifest
60213 www-client/mozilla-firefox/Manifest
39065 x11-themes/gkrellm-themes/Manifest


Additionally, with the MetaManifest proposal, the following new manifests would
also be compressed, for a saving of nearly 4MiB::

Size Path
33442 app-admin/Manifest
71073 app-dicts/Manifest
35923 app-emacs/Manifest
45808 app-misc/Manifest
50169 app-text/Manifest
112786 dev-java/Manifest
65581 dev-libs/Manifest
42619 dev-lisp/Manifest
182163 dev-perl/Manifest
96198 dev-python/Manifest
58963 dev-ruby/Manifest
59736 dev-util/Manifest
58338 eclass/Manifest
55749 kde-base/Manifest
110064 licenses/Manifest
35262 media-gfx/Manifest
53995 media-libs/Manifest
55607 media-plugins/Manifest
71911 media-sound/Manifest
34835 media-video/Manifest
5747849 metadata/Manifest
47452 net-analyzer/Manifest
65989 net-misc/Manifest
316787 profiles/Manifest
67784 sys-apps/Manifest
48971 x11-misc/Manifest
41475 x11-plugins/Manifest


Backwards Compatibility
=======================
The package Manifests should also be maintained as ONLY uncompressed in
CVS.

For processing of all existing per-package Manifests, if compression is
used, it should be done in parallel to the existing Manifests, to
provide for a changeover period. Newer versions of Portage may later
choose to exclude all non-compressed Manifests during emerge --sync if
compressed versions are guaranteed to exist on the servers.

MetaManifests may come into existence as compressed from the start, as
do not have an backwards compatibility issues.

As a side note, this breaks all manual interaction with Manifests
such as grep, and so should only be applied to large Manifest2 files,
such as the MetaManifest.

References
==========
.. [#GLEP44] Mauch, M. (2005) GLEP44 - Manifest2 format.
http://www.gentoo.org/proj/en/glep/glep-0044.html

Copyright
=========
Copyright (c) 2008-2010 by Robin Hugh Johnson. This material may be
distributed only subject to the terms and conditions set forth in the
Open Publication License, v1.0.

vim: tw=72 ts=2 expandtab:

Brian Harring 02-08-2010 12:02 AM

GLEP61 - Manifest2 compression
 
On Sun, Jan 31, 2010 at 10:04:40AM +0000, Robin H. Johnson wrote:
> Changes:
> - This GLEP can stand independently of GLEP58.
> - Add XZ to compression types list.
> - Move cutoff to 32KiB. Provide size example w/ 32KiB+gzip.
> - Split specification into generation and validation.
>

One concern w/ this glep- the intention seems to be to reduce on disk
space requirements but the addition of compression raises questions
for rsync transferance of the manifests.

Have you done any testing to quantify how much of an increase in rsync
bandwidth this will add? Specifically thinking about the metamanifest
on this one.

~Brian

"Robin H. Johnson" 02-08-2010 04:23 AM

GLEP61 - Manifest2 compression
 
On Sun, Feb 07, 2010 at 05:02:22PM -0800, Brian Harring wrote:
> On Sun, Jan 31, 2010 at 10:04:40AM +0000, Robin H. Johnson wrote:
> > Changes:
> > - This GLEP can stand independently of GLEP58.
> > - Add XZ to compression types list.
> > - Move cutoff to 32KiB. Provide size example w/ 32KiB+gzip.
> > - Split specification into generation and validation.
> One concern w/ this glep- the intention seems to be to reduce on disk
> space requirements but the addition of compression raises questions
> for rsync transferance of the manifests.
>
> Have you done any testing to quantify how much of an increase in rsync
> bandwidth this will add? Specifically thinking about the metamanifest
> on this one.
The top-level MetaManifest, in the case of fully split (eg a Manifest in
every 1st-level directory $CAT/Manifest and the other dirs), is only
33KiB.

21 existing packages have Manifests larger than 32KiB, texlive stuff
come in the worst here.

I do agree, that depending on the block alignment, there is an increase
in transfer size in some cases, but I have not conducted rigorous tests
to work out long-term statistics on changes.

The more aggressively that Manifests are added to each subdirectory
tree, the less Manifest2 compression is actually required, as the
individual Manifests are more likely to fit within the size limit.

With the 1st-level case again, here's a size count breakdown for the new
(Meta)Manifests:
>=32KiB: 27
>=64KiB: 10
>=128KiB: 3 (dev-perl @ 179KiB, metadata/ @ 5.2MiB, profiles/ @ 300KiB)

I think the best course of action is to end up generating the compressed
MetaManifests when we start generated the MetaManifests themselves, but
not placing them into the tree yet. Instead simply use them to measure
rsync transfer size impact on the generation server and produce
statistics to see if the cutoff could benefit from being altered, or if
the disk space should be wasted in favour of smaller transfer size.

--
Robin Hugh Johnson
Gentoo Linux: Developer, Trustee & Infrastructure Lead
E-Mail : robbat2@gentoo.org
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85

Brian Harring 02-08-2010 04:50 AM

GLEP61 - Manifest2 compression
 
On Mon, Feb 08, 2010 at 05:23:03AM +0000, Robin H. Johnson wrote:
> On Sun, Feb 07, 2010 at 05:02:22PM -0800, Brian Harring wrote:
> > On Sun, Jan 31, 2010 at 10:04:40AM +0000, Robin H. Johnson wrote:
> > > Changes:
> > > - This GLEP can stand independently of GLEP58.
> > > - Add XZ to compression types list.
> > > - Move cutoff to 32KiB. Provide size example w/ 32KiB+gzip.
> > > - Split specification into generation and validation.
> > One concern w/ this glep- the intention seems to be to reduce on disk
> > space requirements but the addition of compression raises questions
> > for rsync transferance of the manifests.
> >
> > Have you done any testing to quantify how much of an increase in rsync
> > bandwidth this will add? Specifically thinking about the metamanifest
> > on this one.
>
> I think the best course of action is to end up generating the compressed
> MetaManifests when we start generated the MetaManifests themselves, but
> not placing them into the tree yet. Instead simply use them to measure
> rsync transfer size impact on the generation server and produce
> statistics to see if the cutoff could benefit from being altered, or if
> the disk space should be wasted in favour of smaller transfer size.

Works for me- I do strongly suspect that if we use compression we'll
be purely trading disk space decrease for the cost of transfering each
changed compressed manifest in full though... rsync + compression do
not get along at all.

Either way, stats would be useful when you've got time.
~brian


All times are GMT. The time now is 01:25 AM.

VBulletin, Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO ©2007, Crawlability, Inc.