help wanted - inventory splitting work

Robert Collins robertc at robertcollins.net
Tue Sep 30 08:01:31 BST 2008


So, I think the paper design stuff has raised good questions, but there
is little more we can do without getting our hands dirty.

I'm working on this in my repository branch -
http://people.ubuntu.com/~robertc/baz2.0/repository.

I'd love people to dive into this branch - just hacking on the same sort
of stuff, or picking one of the components that are needed and making it
work.

Until we have reasonably good answers to the open questions, I don't
intend to put this forward for trunk. (OTOH, as good answers are ready
I'll pull them out and send them for review/merge immediately.)

There isn't really a ROADMAP for this work other than 'finish answering
the questions, make it faster than what we have, profit'.

The decided high points (based on consensus from the current discussion;
subject to change if other devs object or raise issues, or if issues are
found during development and testing) are:
 - move inventory splitting up from being below-byte-storage (currently:
knit deltas, where many deltas are read to get one inventory object) to
being how-we-use-byte-storage (so that we can store an explicit data
structure which is amenable to partial reading)
 - use CHK (content hash keys) to address the individual split-out-items
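To make the CHK idea concrete, here is a minimal Python sketch, assuming
a toy in-memory store (the CHKStore class, its add/get methods and the
per-directory item layout are all hypothetical, not bzrlib APIs). Each
split-out item is keyed by the SHA-1 of its own bytes, and the root item
just lists its children's keys, so looking at one directory touches two
items rather than the whole inventory.

```python
import hashlib


class CHKStore:
    """Hypothetical in-memory store where each item is keyed by the hash of its bytes."""

    def __init__(self):
        self._items = {}
        self.reads = 0  # counts fetches, to show how little a partial read touches

    def add(self, content: bytes) -> str:
        # The key is derived purely from the content, so identical items
        # get the same key and are stored only once.
        key = "sha1:" + hashlib.sha1(content).hexdigest()
        self._items[key] = content
        return key

    def get(self, key: str) -> bytes:
        self.reads += 1
        return self._items[key]


store = CHKStore()
# One inventory split into per-directory items (hypothetical layout).
docs_key = store.add("README\x00file-id-1\n".encode("utf-8"))
src_key = store.add("main.py\x00file-id-2\nutil.py\x00file-id-3\n".encode("utf-8"))
# The root item only lists its children's keys.
root_key = store.add(("docs\x00%s\nsrc\x00%s\n" % (docs_key, src_key)).encode("utf-8"))

# Partial read: looking inside src/ fetches the root and one child, not the whole inventory.
root = store.get(root_key).decode("utf-8")
children = dict(line.split("\x00") for line in root.splitlines())
src_entries = store.get(children["src"])
```

Because keys come from content, re-adding an unchanged item is a no-op
in this toy store, which is what should let unchanged items be shared
between revisions.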

The undecided aspects, on which experimentation/modeling is needed:
 - should we have a filename->fileid map (leaning to yes, need to test)
 - should we have a fileid->filename map (leaning to yes, need to test)
 - should the inventory entry data be stored in the name->id map, or the
id->name map, or both (leaning to id->name map only)
 - should we canonicalise nodes by size rules, or (only applies to a
name map that doesn't hash keys) by directory membership
 - should maps hash keys or not, or should that be decided per map?
(leaning to no-for-names, yes-for-fileids)
 - should the root node of the tree be indexed purely by chk, or by
revision-id (I am leaning to revision-id)
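One way to experiment with the "hash keys" and "size rules" questions
is a hash trie: hash each key, and whenever a node's serialised form
exceeds a size limit, split its entries by the next hex digit of the key
hash. The sketch below assumes that approach; build_node, MAX_LEAF_BYTES
(64 bytes here) and the L/I node prefixes are hypothetical choices for
illustration, not a decided design. Because the layout depends only on
the set of entries, building from the same entries in any order should
give the same root CHK.

```python
import hashlib

MAX_LEAF_BYTES = 64  # hypothetical size limit for one leaf node


def _hash(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def _serialise_leaf(items: dict) -> bytes:
    # Sorted, NUL-delimited key/value lines: the same items always give the same bytes.
    return b"".join(k + b"\x00" + v + b"\n" for k, v in sorted(items.items()))


def build_node(items: dict, store: dict, depth: int = 0) -> bytes:
    """Store a canonical node for `items` in `store` and return its CHK."""
    leaf = _serialise_leaf(items)
    if len(leaf) <= MAX_LEAF_BYTES or len(items) <= 1 or depth == 40:
        content = b"L" + leaf  # small enough (or cannot split further): one leaf
    else:
        # Too big: bucket entries by the next hex digit of the hashed key.
        buckets = {}
        for key, value in items.items():
            buckets.setdefault(_hash(key)[depth], {})[key] = value
        content = b"I" + b"".join(
            prefix.encode("ascii") + b"\x00" + build_node(sub, store, depth + 1) + b"\n"
            for prefix, sub in sorted(buckets.items())
        )
    chk = _hash(content).encode("ascii")
    store[chk] = content
    return chk


entries = {b"file-id-%d" % i: b"name-%d" % i for i in range(20)}
store_a, store_b = {}, {}
root_a = build_node(entries, store_a)
root_b = build_node(dict(reversed(list(entries.items()))), store_b)
```

Hashing spreads entries evenly whatever the keys look like, but it also
scatters one directory's entries across many nodes, which is presumably
part of why an unhashed name map is attractive.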

The routine things:
 - serialiser/parser (I plan on UTF-8, NULL-delimited; it seems to work
well).
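A minimal sketch of the kind of UTF-8, NULL-delimited serialiser/parser
pair described above, for a single inventory entry. The Entry field set
and the serialise/parse names are assumptions for illustration. NUL
works as a separator because UTF-8 only ever produces a 0x00 byte for
U+0000 itself, so the serialiser just has to reject fields containing it.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    # Hypothetical field set; the real entry layout is still an open question.
    file_id: str
    parent_id: str
    name: str
    kind: str
    revision: str


def serialise(entry: Entry) -> bytes:
    for field in entry:
        # A NUL inside a field would make the record impossible to split again.
        if "\x00" in field:
            raise ValueError("field contains NUL: %r" % field)
    return "\x00".join(entry).encode("utf-8")


def parse(data: bytes) -> Entry:
    # Decode, then split on NUL; a wrong field count raises TypeError.
    return Entry(*data.decode("utf-8").split("\x00"))
```

For example, parse(serialise(Entry("file-id-1", "root-id", "café.txt",
"file", "rev-1"))) round-trips to the same entry, non-ASCII name included.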

I'd just love to be getting patches/test results/profiling/analysis from
others on any or all of these things.

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
More information about the bazaar mailing list