RFC: split-inventories design: children-of-node/path-traversal
Robert Collins
robertc at robertcollins.net
Wed Oct 29 02:47:35 GMT 2008
Opening this up for discussion; I don't have a solid answer yet, so this
is a bit of a brain-dump at this pont.
Doing path traversal - a/b/c could be done a few ways.
basename index w/value list:
here we would have a index on the basename, which isn't unique, so we'd
need to be able to treat the value as a list, or allow repeated keys, or
some such. Path traversal would then be a matter of doing set reduction
- to get b, we could just lookup b, grab all the fileids with a basename
of b, and walk up from them to the root - one will resolve to a/b/c.
fullpath index:
here we have an index on the full path. This is unique and allows direct
lookup at the cost of longer keys with lots of duplicate data. It gets
high locality of reference for similar paths.
fullpath index w/hashed keys:
here we index the full path but hash the key at the index layer, so the
keys are fixed length, but we lose locality of reference for similar
paths.
parentid->childid index w/value list:
here we would have an index on fileid that lists the child file ids.
Traversal would be done by starting with the root fileid, looking up all
its child ids, until one matches 'a', and repeating
parentid,child_basename->childid index:
here we would have an index on (fileid, child_basename). This requires a
index which supports things like 'foo bar' as keys, or encoding of
basenames before using it.
Now, directory objects in memory have a children attribute listing all
the paths within the directory; call this path discovery or 'ls'.
So doing ls: ls a/*:
basename index :- doesn't help
fullpath index :- if we can iterate on keys starting with 'a/' we can
find a/b and a/b/c, but this would need some logic to avoid finding out
about c when we are only interested in children of a.
fullpath hash-keys :- doesn't help
parentid-> childid :- works fine, just read everything.
parentid,child_basename :- works well, if we can get everything with the
parent's value.
I think the parentid,child_basename -> childid approach has legs, and I
intend to make sure the CHKMap will support it.
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20081029/34159e75/attachment.pgp
More information about the bazaar
mailing list