[RFC] Removing the inventory concept from Bazaar.
Martin Pool
mbp at sourcefrog.net
Thu May 10 08:56:04 BST 2007
On 5/8/07, Aaron Bentley <aaron.bentley at utoronto.ca> wrote:
> Inspired by Robert's post, I started thinking about improving our
> scaling in large trees. I decided to take the extreme approach of
> removing inventories as a concept, to see what broke.
>
> It seems to work, so I've started a draft spec here:
> http://bazaar-vcs.org/DraftSpecs/NoInventory

That seems interesting.

I thought you were talking about hiding the concept of the in-memory
Inventory, which I would like to do: I think it could be much more of
an implementation detail of Tree than it is at present. That would
make this kind of change easier, remove some redundancy, and fit
better with dirstate.

So would this be a bit like CVS, where we'd build a tree by looking at
each of the knit/RCS files to see whether that file is active in the
current tree?

I don't understand what you would store for directories. Nothing? A
knit with only metadata and no text? A list of children?

In brief, I think we should split the inventory up by directory.
Within the current storage, we could store each directory's part of
the inventory in that directory's knit file. When a file is committed,
we would need to cascade the change up through all of its containing
directories.

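A minimal sketch of that cascade follows. It assumes a hypothetical in-memory model rather than Bazaar's actual knit storage. In this model, each directory owns a record that maps child names to entries, and every entry carries the revision in which it last changed. `SplitInventory`, `DirRecord`, and `commit_path` are illustrative names, not bzrlib APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    file_id: str
    kind: str      # "file" or "directory"
    revision: str  # revision in which this entry last changed


@dataclass
class DirRecord:
    # One directory's slice of the inventory: child name -> Entry.
    children: dict = field(default_factory=dict)


class SplitInventory:
    """Hypothetical split inventory: one record per directory, keyed by its file id."""

    def __init__(self, root_id="ROOT"):
        self.root_id = root_id
        self.dirs = {root_id: DirRecord()}

    def add(self, parent_id, name, file_id, kind, revision):
        self.dirs[parent_id].children[name] = Entry(file_id, kind, revision)
        if kind == "directory":
            self.dirs[file_id] = DirRecord()

    def commit_path(self, path, revision):
        """Record a new revision of the file at `path` and cascade it to the root.

        Returns the ids of the directory records rewritten, innermost first.
        """
        names = path.split("/")
        # Walk down from the root, collecting the containing directories.
        chain = [self.root_id]
        for name in names[:-1]:
            chain.append(self.dirs[chain[-1]].children[name].file_id)
        # Walk back up: the file's entry changes in its parent, which changes
        # that directory's entry in *its* parent, and so on up to the root.
        for dir_id, name in zip(reversed(chain), reversed(names)):
            self.dirs[dir_id].children[name].revision = revision
        return list(reversed(chain))
```

Committing `a/b/c.txt` rewrites one record per containing directory (three here), however many other entries the tree holds. Sibling entries elsewhere keep their old revision.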
This gets closer to scaling with the size of the data actually
affected, and avoids diffing a single enormous inventory file. Getting
the inventory entry for a file would be fast given its containing
directory and the file name or id. Finding a file given only its id,
with no idea which directory it is in, would still require scanning
all the directories, which would be somewhat slower if they are spread
out. But we would only need to do that if the file has moved to a
different directory, which is perhaps a rare case.

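The three lookup costs can be sketched over the same kind of hypothetical per-directory records, here a plain dict `{directory id: {child name: Entry}}`.

- A lookup by name reads one record.
- A lookup by id within a known directory scans only that directory's children.
- A lookup by id alone may have to scan every record.

The `hint_dir` parameter is an illustrative assumption: it stands for the directory the file was last seen in, so the full scan only happens when the file has moved. None of these functions are bzrlib APIs.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    file_id: str
    kind: str  # "file" or "directory"


# Hypothetical layout: {directory file id: {child name: Entry}}.

def lookup_by_name(dirs, dir_id, name):
    """Fast path: read one directory record and do one dict lookup."""
    return dirs[dir_id].get(name)


def lookup_in_dir_by_id(dirs, dir_id, file_id):
    """Still cheap: scans only the children of one directory."""
    for name, entry in dirs[dir_id].items():
        if entry.file_id == file_id:
            return name, entry
    return None


def find_by_id(dirs, file_id, hint_dir=None):
    """Locate file_id anywhere in the tree; returns (dir_id, name) or None.

    Tries hint_dir first (e.g. the directory the file was in last time) and
    only falls back to scanning every directory record if that misses.
    """
    if hint_dir in dirs:
        found = lookup_in_dir_by_id(dirs, hint_dir, file_id)
        if found is not None:
            return hint_dir, found[0]
    for dir_id, children in dirs.items():
        for name, entry in children.items():
            if entry.file_id == file_id:
                return dir_id, name
    return None
```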
> # Provide a paths2ids method with O(n) scaling -- i.e., when asked to
> determine ids for all paths in a tree, it will traverse each directory
> exactly once.

That would be O(total tree entries). Still sounds worthwhile.

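A sketch of such a traversal follows. It assumes hypothetical per-directory records shaped as `{dir_id: {child_name: (file_id, kind)}}`. The function name follows the spec, but the signature and the `(ids, reads)` return value are illustrative, not Bazaar's actual API. The traversal is breadth-first from the root and descends only into directories that are ancestors of a requested path. Each directory record is therefore read at most once.

```python
from collections import deque


def paths2ids(dirs, root_id, paths):
    """Map each requested path to its file id, reading each directory at most once.

    dirs: hypothetical per-directory records,
          {dir_id: {child_name: (file_id, kind)}}.
    Returns (ids, reads): the path -> file id mapping for the paths found,
    and the number of directory records that were read.
    """
    wanted = set(paths)
    # Every proper ancestor directory of a wanted path must be read; "" is the root.
    needed = {""}
    for p in wanted:
        parts = p.split("/")
        for i in range(1, len(parts)):
            needed.add("/".join(parts[:i]))

    ids = {}
    reads = 0
    # Breadth-first walk from the root, descending only into needed directories.
    queue = deque([("", root_id)])
    while queue:
        dir_path, dir_id = queue.popleft()
        reads += 1
        for name, (file_id, kind) in dirs[dir_id].items():
            child = f"{dir_path}/{name}" if dir_path else name
            if child in wanted:
                ids[child] = file_id
            if kind == "directory" and child in needed:
                queue.append((child, file_id))
    return ids, reads
```

When every path in the tree is requested, each directory is read once and each entry is examined once. That is the O(total tree entries) bound. When only a few paths are requested, directories off those paths are never read.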
--
Martin