[MERGE, 165306] Use the per-file graph during commits of pack's

Tue Nov 27 22:30:43 GMT 2007

On Tue, 2007-11-27 at 17:15 -0500, John Arbash Meinel wrote:
> John Arbash Meinel has voted tweak.
> Status is now: Conditionally approved
> Comment:
> === modified file 'bzrlib/index.py'
> ...
> 
> +    def get_parents(self, revision_ids):
> +        """See StackedParentsProvider.get_parents.
> +
> +        This implementation thunks the graph.Graph.get_parents api 
> across to
> +        GraphIndex.
> +
> +        :param revision_ids: An iterable of graph keys for this graph.
> 
> ^- This shouldn't be called revision_ids if it is actually graph keys.
> 
> I found it hard to sort out how it was working when you were later on
> using (file_id, revision_id) tuples.

Theres an API problem though; the 'interface' uses revision_ids. I don't
know of any callers doing get_parents(revision_ids=xxx), but for
compatibility I wanted to have that work.

> Maybe call it:
> 
> def get_parents(self, node_keys):
> ...
>      :param node_keys: An iterable of graph keys for this graph.
>          (eg. for a simple revision graph, this will be revision ids, for 
> a
>          CombinedGraph this will be (file_id, revision_id) tuples.)
> 
> 
> This also means that your later code:
>          search_keys = set(revision_ids)
>          search_keys.discard(NULL_REVISION)
>          found_parents = {NULL_REVISION:[]}
> 
> Is probably bogus. Because (file_id, NULL_REVISION) will not be found.

Actually, NULL_REVISION is used as a constant, - (NULL_REVISION,) in a
parents list, which leads to NULL_REVISION as a key, so no - it's not
bogus :). And we can't do ((NULL_REVISION, NULL_REVISION),) because the
graph code wants NULL_REVISION as the constant for 'end of graph.

> I also wonder about the logic of returning (NULL_REVISION,) as the 
> parents
> of a (file_id, revision_id) pair.

It's part of the contract of graph.

> I also notice that "Combined_Graph.iter_keys says:"
>          :return: An iterable of (index, key, reference_lists, value). 
> There is no
>              defined order for the result iteration - it will be in the 
> most
>              efficient order for the index.
> but you are looping on:
>          for index, key, value, refs in self.iter_entries(search_keys):
> 
> I'm guessing the comment for Combined_Graph.iter_entries is wrong.

I'll correct that. references_lists is indeed after value.

> Is there a way to reduce the duplication in __init__ for 
> PackCommitBuilder
> and PackRootCommitBuilder? They both have the same __init__ and the same
> implementation of heads().

Yah, I didn't find a good one when I wrote it; and as you haven't
proposed one I think we can leave it for now.

> I don't see any direct tests for CommitBuilder.heads(). Which seems like
> they should be in
> bzrlib/tests/repository_implementations/test_commit_builder.py
> I only see heads tests for test_graph.py and test_inv.py

So, heads() is very heavily, indirectly, tested by the tests for
last-modified generation. I'm in two minds here about what to do.

One way would be to migrate those tests to tests for heads(), and a test
that heads() is called and used correctly. Another is to consider
heads() an implementation internal of the builder (and perhaps make it
_heads, though as subclasses need to implement it thats in our grey area
for how things are defined as public/private).

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20071128/bc431ee7/attachment.pgp