[MERGE] Fix non-rich to rich-root fetch
Ian Clatworthy
ian.clatworthy at internode.on.net
Tue Apr 29 08:15:56 BST 2008
Aaron Bentley wrote:
> Hi all,
>
> This patch solves all the bugs I know about when fetching from any
> non-rich-root repository into any rich-root repository. I have used an
> earlier version to successfully convert the Bazaar source tree, which
> has ghosts and unique root ids, into rich-root-pack format.
To the extent of my knowledge in this area, this looks ok technically. I
did find it much harder to understand exactly what was going on though
than I anticipated, so I'm requesting some refactoring before this gets
merged. I appreciate that this is the first of a few patches in this
area, so let me know if you think the refactoring requests below are
premature. My thinking is this: if we encounter any bugs upgrading most
bzr repositories in the world to rick-root-pack, then the code needs to
be as clear as it can be, recognising that there is a base level of
complexity here given the numerous corner cases that can arise.
bb:resubmit
> inventory_weave = self.source.get_inventory_weave()
> - parent_texts = {}
> versionedfile = {}
inventory_weave isn't used anywhere in this routine so can go?
versionedfile (as a map) isn't either now.
> to_store = self.target.weave_store
> - parent_map = self.source.get_graph().get_parent_map(revs)
> + graph = self.source.get_graph()
> + parent_map = graph.get_parent_map(revs)
OK. to_store is set well before it's being used though.
> + planned_versions = {}
> + revision_root = {}
> for tree in self.iter_rev_trees(revs):
> revision_id = tree.inventory.root.revision
> root_id = tree.get_root_id()
> - parents = parent_map[revision_id]
> - if parents[0] == NULL_REVISION:
> - parents = ()
> - if root_id not in versionedfile:
> - versionedfile[root_id] = to_store.get_weave_or_empty(root_id,
> - self.target.get_transaction())
> - _, _, parent_texts[root_id] = versionedfile[root_id].add_lines(
> - revision_id, parents, [], parent_texts)
> + planned_versions.setdefault(root_id, []).append(revision_id)
> + revision_root[revision_id] = root_id
> + # Find out which parents we don't already know root ids for
> + parents = set()
> + for revision_parents in parent_map.itervalues():
> + parents.update(revision_parents)
> + parents.difference_update(revision_root.keys() + [NULL_REVISION])
> + # Limit to revision present in the versionedfile
> + parents = graph.get_parent_map(parents).keys()
> + for tree in self.iter_rev_trees(parents):
> + root_id = tree.get_root_id()
> + revision_root[tree.get_revision_id()] = root_id
I think this block of code needs to be a method in its own right,
returning planned_revisions and revision_root. Given your overall
approach of "make it work, make it right, make it fast" and recognising
this is step 1, I didn't spent much time thinking though the performance
implications of the new code. The multiple calls to iter_rev_trees()
stands out though. In any case, we can optimise it later and making the
code a separate method ought to help both in the profiling itself and in
reviewing the faster code if it is indeed necessary to tune it.
> + for root_id, versions in planned_versions.iteritems():
> + versionedfile = to_store.get_weave_or_empty(root_id,
> + self.target.get_transaction())
> + parent_texts = {}
> + for revision_id in versions:
> + if revision_id in versionedfile:
> + continue
> + parents = parent_map[revision_id]
> + # When a parent revision is a ghost, we guess that its root id
> + # was unchanged.
> + parents = tuple(p for p in parents if p != NULL_REVISION
> + and revision_root.get(p, root_id) == root_id)
> + result = versionedfile.add_lines_with_ghosts(
> + revision_id, parents, [], parent_texts)
> + parent_texts[revision_id] = result[2]
>
This looks ok. I'd appreciate it though if the comment also explained
that parents get dropped if their revision_root doesn't match, and why
you're doing that.
> +class Test1To2Fetch(TestCaseWithTransport):
> + """Tests for Model1To2 failure modes"""
> +
> + def do_test(self, first, second):
> + """Test that fetch works no matter what the set order of revision is.
> +
> + This test depends on the order of items in a set, which is
> + implementation-dependant, so we test A, B and then B, A.
> + """
> + self.make_tree_and_repo()
> + self.tree.commit('Commit 1', rev_id=first)
> + self.tree.commit('Commit 2', rev_id=second)
> + self.repo.fetch(self.tree.branch.repository, second)
> +
> + def get_parents(self, file_id, revision_id):
> + transaction = self.repo.get_transaction()
> + vf = self.repo.weave_store.get_weave(file_id, transaction)
> + return vf.get_parents_with_ghosts(revision_id)
> +
> + def test_fetch_order_AB(self):
> + """See do_test"""
> + self.do_test('A', 'B')
> +
> + def test_fetch_order_BA(self):
> + """See do_test"""
> + self.do_test('B', 'A')
> +
> + def test_fetch_ghosts(self):
> + self.make_tree_and_repo()
> + self.tree.commit('first commit', rev_id='left-parent')
> + self.tree.add_parent_tree_id('ghost-parent')
> + fork = self.tree.bzrdir.sprout('fork', 'null:').open_workingtree()
> + fork.commit('not a ghost', rev_id='not-ghost-parent')
> + self.tree.branch.repository.fetch(fork.branch.repository,
> + 'not-ghost-parent')
> + self.tree.add_parent_tree_id('not-ghost-parent')
> + self.tree.commit('second commit', rev_id='second-id')
> + self.repo.fetch(self.tree.branch.repository, 'second-id')
> + root_id = self.tree.get_root_id()
> + self.assertEqual(['left-parent', 'ghost-parent', 'not-ghost-parent'],
> + self.get_parents(root_id, 'second-id'))
> +
> + def make_tree_and_repo(self):
> + self.tree = self.make_branch_and_tree('tree', format='pack-0.92')
> + self.repo = self.make_repository('rich-repo', format='rich-root-pack')
> + self.repo.lock_write()
> + self.addCleanup(self.repo.unlock)
> +
My initial reaction on seeing this class was that it ought to be two:
one covering the fetch_order stuff and another covering the ghost and
changed root issues. The make_tree_and_repo() helper method does tie
them together though, and abstracting that into an abstract class seems
more effort/complexity than it's worth. To "partition" things slightly
better though, I'd like to see:
* make_tree_and_repo() made the first method in the class
* do_test and the two routines using it put right next to each other
(without get_parents in the middle)
do_test() is probably just a little *too* generic a name as well given
the numerous other tests covered by the class. do_fetch_order_test() or
do_order_test() would be better I feel.
Ian C.
More information about the bazaar
mailing list