[MERGE] Review feedback, making things more clear, adding documentation on what is used where.
Martin Pool
mbp at canonical.com
Tue Jun 17 07:20:20 BST 2008
On 17 Jun 2008, Robert Collins <robertc at robertcollins.net> wrote:
> This improves the VersionedFiles patch by doing various identified fixes
> in reviews-to-date, adding some useful comments, and documenting more of
> the ideas behind and uses for the tuple namespace.
Thanks.
> class KnitPackRepository(KnitRepository):
> - """Repository with knit objects stored inside pack containers."""
> + """Repository with knit objects stored inside pack containers.
> +
> + The layering for a KnitPackRepository is:
> +
> + Graph | HPSS | Repository public layer |
> + ===================================================
> + Tuple based apis below, string based, and key based apis above
> + ---------------------------------------------------
> + KnitVersionedFiles
> + Provides .texts, .revisions etc
> + This adapts the N-tuple keys to physical knit records which only have a
> + single string identifier (for historical reasons), which in older formats
> + was always the revision_id, and in the mapped code for packs is always
> + the last element of key tuples.
> + ---------------------------------------------------
> + GraphIndex
> + A separate GraphIndex is used for each of the
> + texts/inventories/revisions/signatures contained within each individual
> + pack file. The GraphIndex layer works in N-tuples and is unaware of any
> + semantic value.
> + ===================================================
> +
> + """
I can see how this came from the block diagram you sketched out but I
don't think the ascii art sufficiently conveys it.
It seems like you are trying to convey:
* broadly what the interfaces to this code are
* what naming conventions each of them uses -- I suggest for each one
just say what it uses e.g.. "always uses tuple keys of (file_id,
revision_id)"
* which ones are intended for general use and which are private to the
Repository implementation
I don't think the 'line' with things above and below is clear.
>
> def __init__(self, _format, a_bzrdir, control_files, _commit_builder_class,
> _serializer):
>
> === modified file 'bzrlib/repository.py'
> --- bzrlib/repository.py 2008-06-11 07:22:00 +0000
> +++ bzrlib/repository.py 2008-06-17 05:02:34 +0000
> @@ -446,16 +446,41 @@
> revisions and file history. It's normally accessed only by the Branch,
> which views a particular line of development through that history.
>
> - The Repository builds on top of Stores and a Transport, which respectively
> - describe the disk data format and the way of accessing the (possibly
> + The Repository builds on top of some byte storage facilies (the revisions,
> + signatures, inventories and texts attributes) and a Transport, which
> + respectively provide byte storage and a means to access the (possibly
> remote) disk.
So ultimately it is all based on the Transport. But I think what you're
saying is that each of those VersionedFiles accesses the transport, and
the Repository does itself too? Also why not say VersionedFiles rather
than 'byte storage facilities'?
>
> + The byte storage facilities are addressed via tuples, which we refer to
> + as 'keys' throughout the code base. Revision_keys, inventory_keys and
> + signature_keys are all 1-tuples: (revision_id,). text_keys are two-tuples:
> + (file_id, revision_id). We use this interface because it allows low
> + friction with the underlying code that implements disk indices, network
> + encoding and other parts of bzrlib.
> +
> :ivar revisions: A bzrlib.versionedfile.VersionedFiles instance containing
> the serialised revisions for the repository. This can be used to obtain
> revision graph information or to access raw serialised revisions.
> The result of trying to insert data into the repository via this store
> is undefined: it should be considered read-only except for implementors
> of repositories.
> + :ivar signatures: A bzrlib.versionedfile.VersionedFiles instance containing
> + the serialised signatures for the repository. This can be used to
> + obtain access to raw serialised signatures. The result of trying to
> + insert data into the repository via this store is undefined: it should
> + be considered read-only except for implementors of repositories.
> + :ivar inventories: A bzrlib.versionedfile.VersionedFiles instance containing
> + the serialised inventories for the repository. This can be used to
> + obtain unserialised inventories. The result of trying to insert data
> + into the repository via this store is undefined: it should be
> + considered read-only except for implementors of repositories.
> + :ivar texts: A bzrlib.versionedfile.VersionedFiles instance containing the
> + texts of files and directories for the repository. This can be used to
> + obtain file texts or file graphs. Note that Repository.iter_file_bytes
> + is usually a better interface for accessing file texts.
> + The result of trying to insert data into the repository via this store
> + is undefined: it should be considered read-only except for implementors
> + of repositories.
> :ivar _transport: Transport for file access to repository, typically
> pointing to .bzr/repository.
> """
OK this is pretty good, except there is some (accidental?) inconsistency
about whether they give you serialised or unserialised objects. Also I
think the copy and paste should be removed, and add something like this
at the end:
All four VersionedFiles contain the contain the serialized forms of
the relevant data, and this format is specific to the particular
repository format. To get a usable object, call e.g.
get_inventory().
You say they should be read-only except for repository implementation
code, but should anyone else even be reading from them? What kind of
access is allowed?
>
> === modified file 'bzrlib/versionedfile.py'
> --- bzrlib/versionedfile.py 2008-06-12 03:25:25 +0000
> +++ bzrlib/versionedfile.py 2008-06-17 05:02:34 +0000
> @@ -678,6 +678,15 @@
>
> Currently no implementation allows the graph of different key prefixes to
> intersect, but the API does allow such implementations in the future.
> +
> + The keyspace is expressed via simple tuples. Any instance of VersionedFiles
> + may have a different length key-size, but that size will be constant for
> + all texts added to or retrieved from it. For instance, bzrlib uses
> + instances with a key-size of 2 for storing user files in a repository, with
> + the first element the fileid, and the second the version of that file.
add
For signatures, inventories and revision texts there is just a
1-tuple, (revision_id,)
I realize this is not defined by this level but it's a reasonable place
to put it.
> +
> + The use of tuples allows a single code base to support several different
> + uses with only the mapping logic changing from instance to instance.
> """
>
> def add_lines(self, key, parents, lines, parent_texts=None,
>
--
Martin
More information about the bazaar
mailing list