[RFC] Removing the inventory concept from Bazaar.

Thu May 10 15:27:03 BST 2007

I'm fully behind that and eager to help.  Making inventories into a
primary object is partly my fault-- when I implemented revert, I needed
to know about dangling entries/missing files, and Tree didn't provide
that information.

> So would this be a bit like CVS, where we'd build a tree by looking at
> each of the knit/rcs files to see if that file is active in the
> current tree?

No, this would be more like git.  To build up the tree shape, you'd
start at the root directory, list the children, and recurse into the
children.

However, if you wanted to find out whether a given file was present, you
would be able to do that as an O(1) operation.

> I don't think I understand what you would store for directories?

A list of children, as metadata.

> Briefly, I would say that we should split the inventory up by
> directory.  Within the current storage, we could store each directory
> within that directory's knit file.  When a file is committed, we would
> need to cascade these changes up through all the containing
> directories.

My idea is similar, but scales a bit differently.

Your idea to break up inventories up by directory means that in order to
get the metadata about a file, you need to retrieve a record that
contains metadata about the file's siblings.

So your split-inventories scales O(number of siblings), where my
no-inventories scales O(1).  In theory, O(1) is better, but in practice,
number of siblings may be small enough that performance is acceptable.

However, I should note that I have a tree versioned in Bazaar which
contains 110716 files.  It contains a single directory with 46551
entries.  So for this tree, split-inventories would give only a 2.3x
speedup in the worst case.  But no-inventories would give a 110716x speedup.

I am not sure whether my use case is worth supporting.  It is clearly
not source code (in fact, it is confidential data), so in a sense I am
abusing Bazaar.  But it seems likely that others will abuse Bazaar the
same way.

> # Provide a paths2ids method with O(n) scaling-- i.e, when asked to
> determine ids for all paths in a tree, it will traverse each directory
> exactly once.
> 
> That would be O(total tree entries).  Still sounds worthwhile.

I'm not sure I communicated this clearly.

I was proposing a method called Tree.paths2ids which would take a list
of paths, and return a list of ids (or possibly a map of path to id).
It would scale O(number of unique path elements).  In the worst case,
you ask it to translate all the paths in the tree, which is O(total tree
entries).

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGQyu30F+nu1YWqI0RAosgAJ4nqvw/85wnbiOiyifRfLyj0TZxVwCghFZ5
Z7Ikq+6UcU/oU5g3EcyJxAA=
=MPt4
-----END PGP SIGNATURE-----