[merge] document the pack format

Wed Oct 24 10:05:55 BST 2007

Martin Pool wrote:
> One of the review suggestions for the pack branch is that there should
> be some overall developer documentation for how it works, to help
> people who need to maintain or extend it later.
> 
> This doesn't cover every detail but does duplicate some things covered
> in docstrings.  That's ok; the point is to tell people where to start
> looking.
> 
> I did remember to link this in to the developer/index.txt but that's
> in a separate change.

Please provide either a bundle or a diff with relative filenames.

> The new name in the heading is because it was suggested in review (and
> accepted by Robert iirc) that GraphKnit was not a good name.
> 
> I'm interested to hear from both people who are familiar with the
> packs code, and those who are not:
> 
>  * is anything incorrect
>  * is there anything not explained but should be
>  * does it give a reasonable overview?

I'm not familiar with the packs code, so my opinion comes from a `tabula rasa`
point of view. Please bear with me :-)

The table of indices is not very clear; it probably requires a deep
understanding of the current Knit implementation, which I don't have.
Do I understand correctly that:

.tix -> similar to an all-in-one file-id.kndx? Won't it be too big?
.six -> equivalent of signatures.kndx?
.rix -> equivalent of revisions.kndx?
.iix -> equivalent of inventory.kndx?

And some nitpicks:

+==== ========== ======================== ==========================
+extn Purpose    Key                      References
+==== ========== ======================== ==========================
+.tix File texts ``file_id, revision_id`` compression base,
+                                         per-file parents

+.six Signatures ``revision_id,``         -
                               ^-- IMO comma is unnecessary?

+.rix Revisions  ``revision_id,``         revision parents
                               ^-- here too?

+.iix Inventory  ``revision_id,``         compression base,
                               ^-- here too?

+                                         revision parents
+==== ========== ======================== ==========================
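
A note on my own comma nitpick: if the keys in this table are meant to be read as Python tuples (that is an assumption on my side, the text does not say so), then the trailing comma may be deliberate, because it is what makes a one-element tuple. A small sketch of the difference, with made-up key values:

```python
# Assumption: index keys are Python tuples; the ids below are placeholders.
text_key = ("file-id", "rev-id")   # two-element key, as in the .tix row
rev_key = ("rev-id",)              # one-element tuple: the comma is required
not_a_key = ("rev-id")             # without the comma this is a plain string

key_kinds = [type(k).__name__ for k in (text_key, rev_key, not_a_key)]
print(key_kinds)
```

If that is the intent, it may be worth saying so explicitly in the document, since the bare comma reads like a typo to a newcomer.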

Next:

+There can also be index entries with a value of 'a' for absent.  These
+records exist just to be pointed to in a graph.  This is used, for
+example, to give the revision-parent pointer when the parent revision is
+in a previous pack.

^-- the concept of a 'previous pack' is not explained. If pack filenames
are created from an md5 hash, how can you say which pack is previous and which is next?
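
To check my understanding, here is a toy sketch of how I read the 'absent' entries. Everything in it is hypothetical (the dict layout, the `find_present` helper, the revision ids) and it is not the real bzr API; it only models "an entry with value 'a' exists so that a parent pointer has something to point to, while the data lives in another pack".

```python
# Toy model only: names and layout are hypothetical, not bzr's real API.
ABSENT = "a"

# Each pack has its own revision index: key -> (value, parent keys).
older_pack_rix = {
    ("rev-1",): ("present", ()),
}
newer_pack_rix = {
    ("rev-1",): (ABSENT, ()),                 # placeholder, data is elsewhere
    ("rev-2",): ("present", (("rev-1",),)),   # parent pointer to rev-1
}

def find_present(key, indices):
    """Return the position of the first index holding real data for key."""
    for position, index in enumerate(indices):
        entry = index.get(key)
        if entry is not None and entry[0] != ABSENT:
            return position
    return None

indices = [newer_pack_rix, older_pack_rix]
parent = newer_pack_rix[("rev-2",)][1][0]
print(parent, find_present(parent, indices))
```

If this reading is right, 'previous' would only need to mean 'some other pack', and no ordering of pack names would be required. But that is my guess, and the document should state it either way.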

Next:

+It is not possible to regenerate an index from the body file

^-- are there plans to lift this limitation in the future?

Next: I don't completely understand this sentence:

+Read locks control caching but do not affect writers.

What 'caching' do you mean? Sorry if this is a stupid question.

Next:

+As well as the list of names, it also contains the size in bytes of th`d

^-- incomplete sentence? What is 'th`d'?

And finally, I want to ask a question:

What if older packs are absent from the repo directory?
How does that affect the way bzr works? Can we start a history horizon
with the current pack format, or would that need additional
work? Can packs support a 'multiple repo' strategy?
I.e. I am talking about the situation where a developer has
very old history on CD and fresh history on their HDD.
IIRC, this example was given by Linus in last year's(?) flame war
between bzr and git.