[MERGE] Review feedback, making things more clear, adding documentation on what is used where.

Tue Jun 17 07:20:20 BST 2008

On 17 Jun 2008, Robert Collins <robertc at robertcollins.net> wrote:
> This improves the VersionedFiles patch by doing various identified fixes
> in reviews-to-date, adding some useful comments, and documenting more of
> the ideas behind and uses for the tuple namespace.

Thanks.

>  class KnitPackRepository(KnitRepository):
> -    """Repository with knit objects stored inside pack containers."""
> +    """Repository with knit objects stored inside pack containers.
> +    
> +    The layering for a KnitPackRepository is:
> +
> +    Graph        |  HPSS    | Repository public layer |
> +    ===================================================
> +    Tuple based apis below, string based, and key based apis above
> +    ---------------------------------------------------
> +    KnitVersionedFiles
> +      Provides .texts, .revisions etc
> +      This adapts the N-tuple keys to physical knit records which only have a
> +      single string identifier (for historical reasons), which in older formats
> +      was always the revision_id, and in the mapped code for packs is always
> +      the last element of key tuples.
> +    ---------------------------------------------------
> +    GraphIndex
> +      A separate GraphIndex is used for each of the
> +      texts/inventories/revisions/signatures contained within each individual
> +      pack file. The GraphIndex layer works in N-tuples and is unaware of any
> +      semantic value.
> +    ===================================================
> +    
> +    """

I can see how this came from the block diagram you sketched out but I
don't think the ascii art sufficiently conveys it.  

It seems like you are trying to convey:

 * broadly what the interfaces to this code are
 * what naming conventions each of them uses -- I suggest for each one
   just say what it uses e.g.. "always uses tuple keys of (file_id,
   revision_id)"
 * which ones are intended for general use and which are private to the
   Repository implementation

I don't think the 'line' with things above and below is clear.

>  
>      def __init__(self, _format, a_bzrdir, control_files, _commit_builder_class,
>          _serializer):
> 
> === modified file 'bzrlib/repository.py'
> --- bzrlib/repository.py	2008-06-11 07:22:00 +0000
> +++ bzrlib/repository.py	2008-06-17 05:02:34 +0000
> @@ -446,16 +446,41 @@
>      revisions and file history.  It's normally accessed only by the Branch,
>      which views a particular line of development through that history.
>  
> -    The Repository builds on top of Stores and a Transport, which respectively 
> -    describe the disk data format and the way of accessing the (possibly 
> +    The Repository builds on top of some byte storage facilies (the revisions,
> +    signatures, inventories and texts attributes) and a Transport, which
> +    respectively provide byte storage and a means to access the (possibly
>      remote) disk.

So ultimately it is all based on the Transport.  But I think what you're
saying is that each of those VersionedFiles accesses the transport, and
the Repository does itself too?  Also why not say VersionedFiles rather
than 'byte storage facilities'?

>  
> +    The byte storage facilities are addressed via tuples, which we refer to
> +    as 'keys' throughout the code base. Revision_keys, inventory_keys and
> +    signature_keys are all 1-tuples: (revision_id,). text_keys are two-tuples:
> +    (file_id, revision_id). We use this interface because it allows low
> +    friction with the underlying code that implements disk indices, network
> +    encoding and other parts of bzrlib.
> +
>      :ivar revisions: A bzrlib.versionedfile.VersionedFiles instance containing
>          the serialised revisions for the repository. This can be used to obtain
>          revision graph information or to access raw serialised revisions.
>          The result of trying to insert data into the repository via this store
>          is undefined: it should be considered read-only except for implementors
>          of repositories.
> +    :ivar signatures: A bzrlib.versionedfile.VersionedFiles instance containing
> +        the serialised signatures for the repository. This can be used to
> +        obtain access to raw serialised signatures.  The result of trying to
> +        insert data into the repository via this store is undefined: it should
> +        be considered read-only except for implementors of repositories.
> +    :ivar inventories: A bzrlib.versionedfile.VersionedFiles instance containing
> +        the serialised inventories for the repository. This can be used to
> +        obtain unserialised inventories.  The result of trying to insert data
> +        into the repository via this store is undefined: it should be
> +        considered read-only except for implementors of repositories.
> +    :ivar texts: A bzrlib.versionedfile.VersionedFiles instance containing the
> +        texts of files and directories for the repository. This can be used to
> +        obtain file texts or file graphs. Note that Repository.iter_file_bytes
> +        is usually a better interface for accessing file texts.
> +        The result of trying to insert data into the repository via this store
> +        is undefined: it should be considered read-only except for implementors
> +        of repositories.
>      :ivar _transport: Transport for file access to repository, typically
>          pointing to .bzr/repository.
>      """

OK this is pretty good, except there is some (accidental?) inconsistency
about whether they give you serialised or unserialised objects.  Also I
think the copy and paste should be removed, and add something like this
at the end:

  All four VersionedFiles contain the contain the serialized forms of
  the relevant data, and this format is specific to the particular
  repository format.  To get a usable object, call e.g.
  get_inventory().

You say they should be read-only except for repository implementation
code, but should anyone else even be reading from them?  What kind of
access is allowed?

> 
> === modified file 'bzrlib/versionedfile.py'
> --- bzrlib/versionedfile.py	2008-06-12 03:25:25 +0000
> +++ bzrlib/versionedfile.py	2008-06-17 05:02:34 +0000
> @@ -678,6 +678,15 @@
>  
>      Currently no implementation allows the graph of different key prefixes to
>      intersect, but the API does allow such implementations in the future.
> +
> +    The keyspace is expressed via simple tuples. Any instance of VersionedFiles
> +    may have a different length key-size, but that size will be constant for
> +    all texts added to or retrieved from it. For instance, bzrlib uses
> +    instances with a key-size of 2 for storing user files in a repository, with
> +    the first element the fileid, and the second the version of that file.

add

  For signatures, inventories and revision texts there is just a
  1-tuple, (revision_id,)

I realize this is not defined by this level but it's a reasonable place
to put it.

> +
> +    The use of tuples allows a single code base to support several different
> +    uses with only the mapping logic changing from instance to instance.
>      """
>  
>      def add_lines(self, key, parents, lines, parent_texts=None,
> 

-- 
Martin