[merge] document the pack format

Robert Collins robertc at robertcollins.net
Wed Oct 24 21:39:18 BST 2007


On Wed, 2007-10-24 at 12:05 +0300, Alexander Belchenko wrote:

> > The new name in the heading is because it was suggested in review (and
> > accepted by Robert iirc) that GraphKnit was not a good name.
> > 
> > I'm interested to hear from both people who are familiar with the
> > packs code, and those who are not:
> > 
> >  * is anything incorrect
> >  * is there anything not explained but should be
> >  * does it give a reasonable overview?
> 
> I'm not familiar with the packs code, so my opinion is from a `tabula rasa`
> point of view. Please bear with me :-)
> 
> The table of indices is not very clear; it probably requires a deep
> understanding of the current Knit implementation, which I don't have.
> Do I understand correctly that:
> 
> .tix -> is similar to an all-in-one file-id.kndx? Won't it be too big?

Yes, it's all the file-id.kndx files combined into a single index file
per pack.

> .six -> equivalent of signatures.kndx?
> .rix -> equivalent of revisions.kndx?
> .iix -> equivalent of inventory.kndx?

Yes.

> And some nitpicks:
> 
> +==== ========== ======================== ==========================
> +extn Purpose    Key                      References
> +==== ========== ======================== ==========================
> +.tix File texts ``file_id, revision_id`` compression base,
> +                                         per-file parents
> 
> +.six Signatures ``revision_id,``         -
>                                ^-- IMO comma is unnecessary?
> 
> +.rix Revisions  ``revision_id,``         revision parents
>                                ^-- here too?
> 
> +.iix Inventory  ``revision_id,``         compression base,
>                                ^-- here too?
> 
> +                                         revision parents
> +==== ========== ======================== ==========================

The keys are tuples (revision_id,) in the code, and I think that bled
across; for prose I would either show it as a tuple, or just
'``revision_id``'.
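
To show what I mean, here is a rough Python sketch. It is not code from
bzrlib: the ids are made up and format_key is a hypothetical helper, only
there to show how a 1-tuple key would be rendered for prose.

```python
# Hypothetical example keys; the ids are invented for illustration.
text_key = ("file-id-1", "rev-id-1")  # .tix style key: (file_id, revision_id)
revision_key = ("rev-id-1",)          # .rix/.iix/.six style key: a 1-tuple


def format_key(key):
    """Render a key tuple for prose, dropping the tuple punctuation."""
    return ", ".join(key)


print(format_key(text_key))
print(format_key(revision_key))
```

The trailing comma only matters in the Python tuple literal; once the key
is rendered for documentation it disappears.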

> Next:
> 
> +There can also be index entries with a value of 'a' for absent.  These
> +records exist just to be pointed to in a graph.  This is used, for
> +example, to give the revision-parent pointer when the parent revision is
> +in a previous pack.
> 
> ^-- the concept of a 'previous pack' is not explained. If pack filenames
> are based on an md5 hash, how can you say which is previous and which is next?

Rather than 'previous', how about 'different'? You are right that there
is no ordering required or enforced between different pack files.
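
As a rough illustration (a sketch only, with plain dicts standing in for
the real indices; the layout, the ids and find_present are all assumptions,
not the on-disk format or the bzrlib API), an absent entry in one pack
simply gives the parent pointer something to land on, and the real record
can live in any other pack:

```python
# Hypothetical in-memory stand-ins for two packs' revision indices.
# Each maps key -> (value, parent_keys); 'a' marks an absent placeholder.
ABSENT = "a"

pack_one = {
    ("rev-1",): ("present", ()),
}
pack_two = {
    ("rev-2",): ("present", (("rev-1",),)),
    # Placeholder so that rev-2's parent pointer has a target in this index.
    ("rev-1",): (ABSENT, ()),
}


def find_present(key, packs):
    """Return the position of the first pack with real data for key, or None."""
    for number, index in enumerate(packs):
        entry = index.get(key)
        if entry is not None and entry[0] != ABSENT:
            return number
    return None
```

The lookup gives the same answer whichever order the packs are listed in,
which is why 'different' is a better word than 'previous'.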

> Next:
> 
> +It is not possible to regenerate an index from the body file
> 
> ^-- is there a plan to lift this limitation in the future?

Yes, and I thought Martin had some text about that in the patch.

> Next: I don't completely understand this sentence:
> 
> +Read locks control caching but do not affect writers.
> 
> What 'caching' do you mean? Sorry, if this is a stupid question.

Some data is cached in memory for the duration of a read lock or write
lock. E.g. when we parse an index, we hold the parsed form in memory
until the object is unlocked.
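
Roughly the following pattern, as a minimal sketch. CachingIndex and its
parse callable are hypothetical names for illustration, not the actual
bzrlib classes:

```python
class CachingIndex:
    """Hypothetical index wrapper that caches parsed data while locked."""

    def __init__(self, parse):
        self._parse = parse      # callable that (expensively) parses the index
        self._lock_count = 0
        self._cache = None

    def lock_read(self):
        self._lock_count += 1

    def unlock(self):
        if self._lock_count == 0:
            raise RuntimeError("not locked")
        self._lock_count -= 1
        if self._lock_count == 0:
            # Last lock released: drop the cached parse.
            self._cache = None

    def entries(self):
        if self._lock_count == 0:
            # No lock held, so nothing is cached; parse every time.
            return self._parse()
        if self._cache is None:
            self._cache = self._parse()
        return self._cache
```

While a lock is held, repeated calls to entries() reuse one parse; once the
last lock is released the next call parses again. Nothing here blocks a
writer, which is the point of the sentence in the document.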

> Next:
> 
> +As well as the list of names, it also contains the size in bytes of th`d
> 
> ^-- incomplete sentence? What is 'th`d'?

Good catch, it should be 'the indices for each pack.'

> 
> And at the end I want to ask a question:
> 
> What if older packs are absent from the repo directory?
> How does that affect how bzr works? Can we start a history horizon
> with the current packs format, or is additional work needed?
> Can packs support a 'multiple repo' strategy?
> I.e. I am talking about the situation where a developer has
> very old history on CD and fresh history on their HDD.
> IIRC, this example was given by Linus in last year's(?) flame war
> between bzr and git.

We've started on bits of history horizon already with knits. Packs don't
make it easier or harder, though they do make it more efficient to do
some sorts of queries, like 'what data is in revision X', that history
horizons and stacked or combined repositories need.

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.