Diffstat (limited to 'glep-0044.rst')
-rw-r--r-- glep-0044.rst 327
1 files changed, 327 insertions, 0 deletions
diff --git a/glep-0044.rst b/glep-0044.rst
new file mode 100644
index 0000000..17c4867
--- /dev/null
+++ b/glep-0044.rst
@@ -0,0 +1,327 @@
+GLEP: 44
+Title: Manifest2 format
+Version: $Revision$
+Last-Modified: $Date$
+Author: Marius Mauch <genone@gentoo.org>
+Status: Final
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 04-Dec-2005
+Post-History: 06-Dec-2005, 23-Jan-2006, 3-Sep-2006
+
+
+Abstract
+========
+
+This GLEP proposes a new format for the Portage Manifest and digest file system
+by unifying both filetypes into one to improve functional and non-functional
+aspects of the Portage Tree.
+
+
+Motivation
+==========
+
+Please see [#reorg-thread]_ for a general overview.
+The main long term goals of this proposal are to:
+
+- Remove the tiny digest files from the tree. They are a major annoyance: on a
 typical configuration they waste a lot of disk space, and merely transmitting
 the names of all digest files during an ``emerge --sync`` needs a substantial
 amount of bandwidth.
+- Reduce redundancy when multiple hash functions are used
+- Remove potential for checksum collisions if a file is recorded in more than one
+ digest file
+- Differentiate between filetypes for a more flexible verification system
+
+
+Specification
+=============
+
+The new Manifest format would change the existing format in the following ways:
+
+- Addition of a filetype specifier. The currently planned types are:
+
+ * ``AUX`` for files directly used by ebuilds (e.g. patches or initscripts),
+ located in the ``files/`` subdirectory
+
+ * ``EBUILD`` for all ebuilds
+
+ * ``MISC`` for files not directly used by ebuilds like ``ChangeLog`` or
+ ``metadata.xml`` files
+
 * ``DIST`` for release tarballs recorded in the ``SRC_URI`` variable of an ebuild;
 these were previously recorded in the digest files
+
+ Future portage improvements might extend this list (for example with types
+ relevant for eclasses or profiles)
+
+- Only have one line per file listing all information instead of one line per
+ file and checksum type
+
+- Remove the separated digest-* files in the ``files/`` subdirectory
+
+Each entry in the new format has the following layout:
+
+::
+
+ <filetype> <filename> <filesize> <chksumtype1> <chksum1> ... <chksumtypen> <chksumn>
+
+
+These entries will, however, be stored in the existing Manifest files.
+
+An `actual example`__ of a (pure) Manifest2 file is available.
+
+.. __: glep-0044-extras/manifest2-example.txt
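+
+The entry layout described above can be read with a few lines of Python. This
+is a minimal sketch, not the portage implementation: the names ``Entry`` and
+``parse_manifest2_line`` are hypothetical, and it assumes that fields are
+separated by whitespace and that filenames contain no whitespace. Any line
+whose first field is not one of the four filetypes is treated as "not a
+Manifest2 entry" and skipped.
+

```python
from typing import Dict, NamedTuple, Optional

# The four filetypes listed in the Specification section.
FILETYPES = {"AUX", "EBUILD", "MISC", "DIST"}


class Entry(NamedTuple):
    """One parsed Manifest2 entry (hypothetical container type)."""
    filetype: str
    filename: str
    size: int
    checksums: Dict[str, str]


def parse_manifest2_line(line: str) -> Optional[Entry]:
    """Parse '<filetype> <filename> <filesize> <type1> <sum1> ...'.

    Returns None for lines that do not start with a known filetype,
    so other entries mixed into the same file are simply skipped.
    """
    fields = line.split()
    if len(fields) < 3 or fields[0] not in FILETYPES:
        return None
    pairs = fields[3:]
    if len(pairs) % 2 != 0:
        raise ValueError("checksum type without a checksum value: %r" % line)
    # Even positions are checksum types, odd positions are the values.
    checksums = dict(zip(pairs[0::2], pairs[1::2]))
    return Entry(fields[0], fields[1], int(fields[2]), checksums)


# Illustrative entry; the checksum values are placeholders, not real hashes.
entry = parse_manifest2_line("DIST foo-1.0.tar.gz 1024 SHA1 abc SHA256 def")
print(entry)
```

+
+Returning ``None`` instead of raising for unknown leading fields keeps the
+sketch usable on files that contain more than just Manifest2 entries.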
+
+
+Compatibility Entries
+---------------------
+
+To maintain compatibility with existing portage versions, a transition period
+after the introduction of the Manifest2 format is required, during which portage
+must not only be capable of using existing Manifest and digest files but must
+also generate them in addition to the new entries.
+Fortunately this can be accomplished for the Manifest files by simply mixing old
+and new style entries in one file: existing portage versions will simply ignore
+the new style entries. For the digest files there are no new entries to care
+about.
+
+Scope
+-----
+
+It is important to note that this proposal only deals with a change of the
+format of the digest and Manifest system.
+
+It does not expand the scope of it to cover eclasses, profiles or anything
+else not already covered by the Manifest system. It also doesn't affect
+the Manifest signing efforts in any way (though the implementations of both
+might be coupled).
+
+Also while multiple hash functions will become standard with the proposed
+implementation they are not a specific feature of this format [#multi-hash-thread]_.
+
+Number of hashes
+----------------
+
+While using multiple hashes for each file is a major feature of this proposal,
+we have to make sure that the number of hashes listed is limited to avoid
+an explosion of the Manifest size that would negate the main benefit of this proposal
+(reducing tree size). Therefore the number of hashes that will be generated
+will be limited to three different hash functions. For compatibility, though, we
+have to rely on at least one hash function always being present. This proposal
+suggests using SHA1 for this purpose (as it is supposed to be more secure than MD5
+and currently only SHA1 and MD5 are directly available in python; MD5 also doesn't
+have any benefit in terms of compatibility).
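+
+The two constraints above (SHA1 always present, at most three hash functions)
+can be sketched as a small entry generator. This is only an illustration under
+stated assumptions: the function name ``manifest2_line``, the upper-case
+checksum keywords and the choice of SHA256 as a second hash are not taken from
+this GLEP, and a Python version whose ``hashlib`` module provides ``sha256``
+is assumed.
+

```python
import hashlib

REQUIRED_HASH = "SHA1"   # the hash this proposal suggests always recording
MAX_HASHES = 3           # upper limit on hash functions per entry

# Assumed mapping from checksum keyword to hashlib algorithm name.
HASHLIB_NAMES = {"MD5": "md5", "SHA1": "sha1", "SHA256": "sha256"}


def manifest2_line(filetype, filename, data, hashes=("SHA1", "SHA256")):
    """Build one entry line for the given file contents (bytes)."""
    if REQUIRED_HASH not in hashes:
        raise ValueError("%s must always be present" % REQUIRED_HASH)
    if len(hashes) > MAX_HASHES:
        raise ValueError("at most %d hash functions are allowed" % MAX_HASHES)
    # Filetype, filename and size appear once, followed by type/value pairs.
    fields = [filetype, filename, str(len(data))]
    for name in hashes:
        digest = hashlib.new(HASHLIB_NAMES[name], data).hexdigest()
        fields += [name, digest]
    return " ".join(fields)


line = manifest2_line("EBUILD", "foo-1.0.ebuild", b"")
print(line)
```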
+
+Rationale
+=========
+
+The main goals of the proposal have been listed in the `Motivation`_. This section
+explains why they are improvements and how the proposed format will
+accomplish them.
+
+Removal of digest files
+-----------------------
+
+Normal users that don't use a "tuned" filesystem for the portage tree are
+wasting several dozen to a few hundred megabytes of disk space with the current
+system, largely caused by the digest files.
+This is due to the filesystem overhead present in most filesystems, which
+have a standard blocksize of four kilobytes, while most digest files are under
+one kilobyte in size. This results in a waste of approximately three kilobytes
+per digest file (likely even more). At the time of this writing the tree contains
+roughly 22,000 digest files, so the overall waste caused by digest files is
+estimated at about 70-100 megabytes.
+Furthermore it is assumed that this will also reduce the disk space wasted by
+the Manifest files as they now contain more content, but this hasn't been
+verified yet.
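+
+The estimate above can be reproduced with a short calculation. The sketch
+below assumes every digest file is exactly one kilobyte (the upper end of
+"under one kilobyte") and counts only the unused remainder of each
+four-kilobyte block, so it yields a lower bound; smaller files and
+per-file filesystem metadata push the real figure higher. The helper name
+``slack_bytes`` is hypothetical.
+

```python
BLOCK_SIZE = 4096            # four-kilobyte filesystem blocks
DIGEST_FILES = 22_000        # rough number of digest files in the tree
ASSUMED_DIGEST_SIZE = 1024   # assumption: each digest file is one kilobyte


def slack_bytes(file_size, block_size=BLOCK_SIZE):
    """Bytes allocated but unused when a file occupies whole blocks."""
    blocks = -(-file_size // block_size)   # ceiling division
    return blocks * block_size - file_size


total = DIGEST_FILES * slack_bytes(ASSUMED_DIGEST_SIZE)
print("wasted: %d bytes (about %.1f MiB)" % (total, total / 2**20))
```

+
+With these assumptions each file wastes 3072 bytes, for a total of
+67,584,000 bytes, which is in line with the lower end of the range quoted above.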
+
+By unifying the digest files with the Manifest these tiny files are eliminated
+(in the long run), reducing the apparent tree size by about 20%, benefiting
+both users and the Gentoo infrastructure.
+
+Reducing redundancy
+-------------------
+
+When multiple hashes are used with the current system
+both the filename and filesize are repeated for every checksum type used as each
+checksum is standalone. However this doesn't add any functionality and is
+therefore useless, so the new format removes this redundancy.
+This is a theoretical improvement at the moment as only one hash function is in
+use, but this is expected to change soon (see [#multi-hash-thread]_).
+
+Removal of checksum collisions
+------------------------------
+
+The current system theoretically allows for a ``DIST`` type file to be recorded
+in multiple digest files with different sizes and/or checksums. In such a case
+one version of a package would report a checksum violation while another one
+would not. This could create confusion and uncertainty among users.
+So far this case hasn't been observed, but it can't be ruled out with the
+existing system.
+As the new format lists each file exactly once, this would no longer be possible.
+
+Flexible verification system
+----------------------------
+
+Right now portage verifies the checksum of every file listed in the Manifest
+before using any file of the package and all ``DIST`` files of an ebuild
+before using that ebuild. This is unnecessary in many cases:
+
+- During the "depend" phase (when the ebuild metadata is generated) only
+ files of type ``EBUILD`` are used, so verifying the other types isn't
+ necessary. Theoretically it is possible for an ebuild to include other
+ files like those of type ``AUX`` at this phase, but that would be a
+ major QA violation and should never occur, so it can be ignored here.
+ It is also not a security concern as the ebuild is verified before parsing
 it, so any manipulation would show up.
+
+- Generally files of type ``MISC`` don't need to be verified as they are
+ only used in very specific situations, aren't executed (just parsed at most)
+ and don't affect the package build process.
+
+- Files of type ``DIST`` only need to be verified directly after fetching and
+ before unpacking them (which often will be one step), not every time their
+ associated ebuild is used.
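+
+The cases above amount to selecting entries by filetype depending on what
+portage is about to do. The following is a minimal sketch of that idea, not
+portage code: the phase names and the function ``entries_to_verify`` are
+hypothetical, and the set chosen for the ``build`` phase is an assumption
+that goes beyond what this section states.
+

```python
# Filetypes that need verification per (hypothetical) phase name.
# "depend" and "fetch" follow the cases listed above; "build" is an assumption.
TYPES_TO_VERIFY = {
    "depend": {"EBUILD"},         # metadata generation only reads ebuilds
    "fetch": {"DIST"},            # verify tarballs after fetching / before unpacking
    "build": {"EBUILD", "AUX"},   # assumption: ebuild plus its files/ content
}


def entries_to_verify(entries, phase):
    """Return the (filetype, filename) pairs that the phase must verify."""
    wanted = TYPES_TO_VERIFY[phase]
    return [entry for entry in entries if entry[0] in wanted]


manifest = [
    ("AUX", "fix-build.patch"),
    ("EBUILD", "foo-1.0.ebuild"),
    ("MISC", "ChangeLog"),
    ("DIST", "foo-1.0.tar.gz"),
]
print(entries_to_verify(manifest, "depend"))
```

+
+Entries of type ``MISC`` appear in none of the sets, matching the observation
+that they generally do not need to be verified.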
+
+
+Backwards Compatibility
+=======================
+
+Switching the Manifest system is a task that will need a long transition period
+like most changes affecting both portage and the tree. In this case the
+implementation will be rolled out in several phases:
+
+1. Add support for verification of Manifest2 entries in portage
+
+2. Enable generation of Manifest2 entries in addition to the current system
+
+3. Ignore digests during ``emerge --sync`` to get the size-benefit clientside.
+ This step may be omitted if the following steps are expected to follow soon.
+
+4. Disable generation of entries for the current system
+
+5. Remove all traces of the current system from the tree (serverside)
+
+Each step has its own issues. While 1) and 2) can be implemented without any
+compatibility problems all later steps have a major impact:
+
+- Step 3) can only be implemented when the whole tree is Manifest2 ready.
 That is the ideal; in practice the requirement will be closer to 95% coverage,
 with the expectation that for the remaining 5% either bugs will be filed after
 step 3) is completed or they will be updated at step 5).
+
+- Steps 4) and 5) will render all portage versions without Manifest2 support
+ basically useless (users would have to regenerate the digest and Manifest
 for each package before being able to merge it), so this requires almost
 100% coverage of the userbase with Manifest2 capable portage versions
 (with step 1) completely implemented).
+
+Another problem is that some steps affect different targets:
+
+- Steps 1) and 3) target portage versions used by users
+
+- Steps 2) and 4) target portage versions used by devs
+
+- Step 5) targets the portage tree on the cvs server
+
+While it is relatively easy to get all devs to use a new portage version this is
+practically impossible with users as some don't update their systems regularly.
+While six months are probably sufficient to reach 95% coverage, one year is
+estimated to be needed to reach almost-complete coverage. All times are relative to the
+stable-marking of a compatible portage version.
+
+No timeframe for implementation is presented here as it is highly dependent
+on the completion of each step.
+
+In summary it can be said that while a full conversion will take over a year
+to be completed due to compatibility issues mentioned above some benefits of the
+system can selectively be used as soon as step 2) is completed.
+
+
+Other problems
+==============
+
+Impacts on infrastructure
+-------------------------
+
+While one long term goal of this proposal is to reduce the size of the tree
+and therefore make life for the Gentoo Infrastructure easier this will only
+take effect once the implementation is rolled out completely. In the meantime
+however it will increase the tree size due to keeping checksums in both formats.
+It's not possible to give a usable estimate on the degree of the increase as
+it depends on many variables such as the exact implementation timeframe,
+propagation of Manifest2 capable portage versions among devs or the update
+rate of the tree. It has been suggested that Manifest files that are not gpg
+signed could be mass converted in one step. This could certainly help, but only
+to some degree (according to recent research [#gpg-numbers]_ about 40% of
+all Manifests in the tree are signed, but this number hasn't been verified).
+
+
+Reference Implementation
+========================
+
+A patch for a prototype implementation of Manifest2 verification and partial
+generation has been posted at [#manifest2-patch]_. It will be reworked before
+being considered for inclusion in portage. However, it shows that adding support
+for verification is quite simple, while generation is a bit tricky and will
+therefore be implemented later.
+
+
+Options
+=======
+
+Some things have been considered for this GLEP but aren't part of the proposal
+yet for various reasons:
+
+- timestamp field: the author has considered adding a timestamp field for
+ each entry to list the time the entry was created. However so far no practical
+ use for such a feature has been found.
+
+- convert size field into checksum: Another idea was to treat the size field
 like any other checksum. But so far no real benefit (other than a slightly
 more modular implementation) for this has been seen, while it has several
 drawbacks: for one, unlike checksums, the size field is definitely required
 for all ``DIST`` files; it would also slightly increase the length of
 each entry by adding a ``SIZE`` keyword.
+
+- removal of the ``MISC`` type: It has been suggested to completely drop
 entries of type ``MISC``. This would result in a minor space reduction
 (it's rather unlikely to free any blocks) but completely remove the ability
 to check these files for integrity. While they don't influence portage
 or packages directly they can contain valuable information for users, so
 the author is of the opinion that at least the option for integrity checks
 should be kept.
+
+Credits
+=======
+
+Thanks to the following persons for their input on or related to this GLEP
+(even though they might not have known it):
+Ned Ludd (solar), Brian Harring (ferringb), Jason Stubbs (jstubbs),
+Robin H. Johnson (robbat2), Aron Griffis (agriffis)
+
+Also thanks to Nicholas Jones (carpaski) for making the current Manifest system
+resilient enough to be able to handle this change without too many transition
+problems.
+
+References
+==========
+
+.. [#reorg-thread] http://thread.gmane.org/gmane.linux.gentoo.devel/21920
+
+.. [#multi-hash-thread] http://thread.gmane.org/gmane.linux.gentoo.devel/33434
+
+.. [#gpg-numbers] gentoo-core mailing list, topic "Gentoo key signing practices
+ and official Gentoo keyring", Message-ID <20051117075838.GB15734@curie-int.vc.shawcable.net>
+
+.. [#manifest2-patch] http://thread.gmane.org/gmane.linux.gentoo.portage.devel/1374
+
+.. [#manifest2-example] http://www.gentoo.org/proj/en/glep/glep-0044-extras/manifest2-example
+
+Copyright
+=========
+
+This work is licensed under the Creative Commons Attribution-ShareAlike 3.0
+Unported License. To view a copy of this license, visit
+http://creativecommons.org/licenses/by-sa/3.0/.