[MERGE] faster topo_sort

Fri Feb 24 16:48:10 GMT 2006

On 25 Feb 2006, Robert Collins <robertc at robertcollins.net> wrote:
> On Sat, 2006-02-25 at 00:13 +1100, Robert Collins wrote:
> > So - really - I'd like to replace the current topo_sort routine with
> > this one. 
> 
> In further news here is another edition of this patch, with the
> topological_sort in a function and with the tsort tests running against
> it... they pass.

Is there any reason not to just actually replace it in tsort.py?  The
current one is as you say a bit simpleminded.

> +"""Plugin to fix some potential data errors in a branch.

Not a plugin anymore.

I wonder if we should save a nice word like "reconcile" for a more
common or more user-oriented operation, and call this "fix-graph" or
something?

> +
> +This makes sure that the inventory weave's DAG of ancestry is correct so that
> +attempts to fetch the branch over http, or certain merge operations cope
> +correctly.
> +
> +This is most likely needed if you have used fetch-ghosts from bzrlib to
> +resolve ghosts after a baz (or otherwise) import and you get bizarre behaviour
> +when either exporting over http or when merging from other translated branches.
> +"""
> +
> +
> +import bzrlib.branch
> +import bzrlib.errors as errors
> +import bzrlib.progress
> +from bzrlib.trace import mutter
> +import bzrlib.ui as ui

> +from bzrlib.weavefile import write_weave_v5 as w5

Really a good idea?  It doesn't seem to be used anyhow.

> +class TopoSorter(object):
> +
> +    def sort(self, graph):

If this is intended to be used only internally by topological_sort(),
then it should have a private name.  Otherwise (or perhaps anyhow) it
needs docstrings explaining what is supposed to be passed in by graph
and how the result is returned.

Maybe it would be better expressed as nested functions within
topological_sort acting on local variables?  (Maybe this is just
personal taste but this thing does not seem to feel like a proper
object -- e.g. in essentially returning a value by setting private
fields.)  I think in Python if something takes some inputs and returns a
value it is clearer to just make it a function, all other things being
equal.

Is this code actually specific to operating on revisions or not?  If not
it's fine to still use revision terms to make the explanation concrete
but it should be stated.

> +        # the total set of revisions to process
> +        self._rev_graph = dict(graph)
> +        ### if debugging:
> +        # self._original_graph = dict(graph)
> +        
> +        # the current line of history being processed.
> +        # once we add in per inventory root ids this will involve a single
> +        # pass within the revision knit at that point.
> +        # these contain respectively the revision id in the call stack
> +        self._rev_stack = []
> +        # a set of the revisions on the stack, for cycle detection
> +        self._rev_set = set()
> +        # and the not_ghost_parents.
> +        self._pending_stack = []
> +        # actual queue to reinsert. This is a single pass queue we can use to 
> +        # regenerate the inventory versioned file, and holds
> +        # (rev_id, non_ghost_parents) entries
> +        self._work_queue = []
> +        # this is a set for fast lookups of queued revisions
> +        self._work_set = set()
> +        # total steps = 1 read per revision + one insert into the inventory
> +
> +        # now we do a depth first search of the revision graph until its empty.
> +        # this gives us a topological order across all revisions in 
> +        # self._all_revisions.
> +        # This is flattened to avoid deep recursion:
> +        # At each step in the call graph we need to know which parent we are on.
> +        # we do this by having three variables for each stack frame:
> +        # revision_id being descended into (self._rev_stack)
> +        # current queue of parents to descend into 
> +        #   (self._pending_stack)
> +        # putting all the parents into a pending list, left to 
> +        # right and processing the right most one each time. The same parent
> +        # may occur at multiple places in the stack - thats fine. We check its
> +        # not in the output before processing. However revisions cannot
> +        # appear twice.
> +        # the algorithm is roughly:
> +        # for revision, parents in _rev_graph.items():
> +        #   if revision in self._all_revisions:
> +        #     continue
> +        #   add_revision_parents(revision, parents)
> +        #  def add_revision_parents(revision, parents)
> +        #    fpr parent in parents:
> +        #       add_revision_parents(parent, parent.parents)
> +        #   self._all_revisions.append(revision)
> +        #   self._all_revision_parents.appent(parents)
> +        # however we tune this to visit fragmens of the graph
> +        # and not double-visit entries.
> +        # nevertheless the output is a 
> +        # self._work_queue of revisions, nonghost parents in 
> +        # topological order and
> +        # a self._work_set which is a complete set of revisions.
> +        while self._rev_graph:
> +            rev_id, parents = self._rev_graph.iteritems().next()

This seems to construct an iterator, pull one value, and then discard
it, which seems a bit ugly/wasteful.  Would it work to just iterate
through it in place of the while loop?

> +            # rev_id
> +            self._stack_revision(rev_id, parents)
> +            while self._rev_stack:
> +                # loop until this call completes.
> +                parents_to_visit = self._pending_stack[-1]
> +                # if all parents are done, the revision is done
> +                if not parents_to_visit:
> +                    # append the revision to the topo sorted list
> +                    self._unstack_revision()
> +                else:
> +                    while self._pending_stack[-1]:
> +                        # recurse depth first into a single parent 
> +                        next_rev_id = self._pending_stack[-1].pop()
> +                        if next_rev_id in self._work_set:
> +                            # this parent was completed by a child on the
> +                            # call stack. skip it.
> +                            continue
> +                        # push it into the call stack
> +                        self._stack_revision(next_rev_id, self._rev_graph[next_rev_id])
> +                        # and do not continue processing parents until this 'call' 
> +                        # has recursed.
> +                        break
> +            
> +                # we are now recursing down a call stack.
> +                # its a width first search which 
> +
> +###        Useful if fiddling with this code.
> +###        # cross check
> +###        for index in range(len(self._work_queue)):
> +###            rev = self._work_queue[index]
> +###            for left_index in range(index):
> +###                if rev in self.original_graph[self._work_queue[left_index]]:
> +###                    print "revision in parent list of earlier revision"
> +###                    import pdb;pdb.set_trace()
> +
> +    def _stack_revision(self, rev_id, parents):
> +        """Add rev_id to the pending revision stack."""
> +        # detect cycles:
> +        if rev_id in self._rev_set:
> +            # we only supply the revisions that led to the cycle. This isn't
> +            # minimal though... but it is usually a subset of the entire graph
> +            # and easier to debug.
> +            raise errors.GraphCycleError(self._rev_set)
> +
> +        self._rev_stack.append(rev_id)
> +        self._rev_set.add(rev_id)
> +        self._pending_stack.append(list(parents))
> +
> +    def _unstack_revision(self):
> +        """A revision has been completed.
> +
> +        The revision is added to the work queue, and the data for
> +        it popped from the call stack.
> +        """
> +        rev_id = self._rev_stack.pop()
> +        self._rev_set.remove(rev_id)
> +        self._pending_stack.pop()
> +        self._work_queue.append(rev_id)
> +        self._work_set.add(rev_id)
> +        # and remove it from the rev graph as its now complete
> +        self._rev_graph.pop(rev_id)
> +
> +
> +class Reconciler(object):
> +    """Reconcilers are used to reconcile existing data.
> +
> +    Currently this is limited to a single repository, and consists
> +    of an inventory reweave with revision cross-checks.
> +    """
> +
> +    def __init__(self, dir):
> +        self.bzrdir = dir
> +
> +    def reconcile(self):
> +        """Actually perform the reconciliation."""
> +        self.pb = ui.ui_factory.progress_bar()
> +        self.repo = self.bzrdir.open_repository()
> +        self.repo.lock_write()
> +        try:
> +            self.pb.note('Reconciling repository %s',
> +                         self.repo.bzrdir.root_transport.base)
> +            self._reweave_inventory()
> +        finally:
> +            self.repo.unlock()
> +        self.pb.note('Reconciliation complete.')
> +
> +    def _reweave_inventory(self):
> +        """Regenerate the inventory weave for the repository from scratch."""
> +        self.pb.update('Reading inventory data.')
> +        self.inventory = self.repo.get_inventory_weave()
> +        self.repo.control_weaves.put_weave('inventory.backup',
> +                                           self.inventory,
> +                                           self.repo.get_transaction())
> +        self.pb.note('Backup Inventory created.')
> +        # asking for '' should never return a non-empty weave
> +        new_inventory = self.repo.control_weaves.get_weave_or_empty('',
> +            self.repo.get_transaction())
> +
> +        # the total set of revisions to process
> +        self.pending = set([file_id for file_id in self.repo.revision_store])
> +
> +        # total steps = 1 read per revision + one insert into the inventory
> +        self.total = len(self.pending) * 2
> +        self.count = 0
> +
> +        # mapping from revision_id to parents
> +        self._rev_graph = {}
> +        # we need the revision id of each revision and its available parents list
> +        for rev_id in self.pending:
> +            # put a revision into the graph.
> +            self._graph_revision(rev_id)
> +
> +        ordered_rev_ids = topological_sort(self._rev_graph.items())
> +        self._work_queue = [(rev_id, self._rev_graph[rev_id]) for 
> +                            rev_id in ordered_rev_ids]
> +
> +        # we have topological order of revisions and non ghost parents ready.
> +        while self._work_queue:
> +            rev_id, parents = self._work_queue.pop(0)
> +            # double check this really is in topological order.
> +            unavailable = [p for p in parents if p not in new_inventory]
> +            assert len(unavailable) == 0
> +            # this entry has all the non ghost parents in the inventory
> +            # file already.
> +            self._reweave_step('adding inventories')
> +            new_inventory.add(rev_id, parents, self.inventory.get(rev_id))
> +
> +        # if this worked, the set of new_inventory.names should equal
> +        # self.pending
> +        assert set(new_inventory.names()) == self.pending
> +        self.pb.update('Writing weave')
> +        self.repo.control_weaves.put_weave('inventory',
> +                                           new_inventory,
> +                                           self.repo.get_transaction())
> +        self.inventory = None
> +        self.pb.note('Inventory regenerated.')
> +
> +    def _graph_revision(self, rev_id):
> +        """Load a revision into the revision graph."""
> +        # pick a random revision
> +        # analyse revision id rev_id and put it in the stack.
> +        self._reweave_step('loading revisions')
> +        rev = self.repo.get_revision(rev_id)
> +        assert rev.revision_id == rev_id
> +        parents = []
> +        for parent in rev.parent_ids:
> +            if parent in self.inventory:
> +                parents.append(parent)
> +            else:
> +                mutter('found ghost %s', parent)
> +        self._rev_graph[rev_id] = parents   
> +
> +    def _reweave_step(self, message):
> +        """Mark a single step of regeneration complete."""
> +        self.pb.update(message, self.count, self.total)
> +        self.count += 1
> 
> === added file 'bzrlib/tests/blackbox/test_reconcile.py'
> --- /dev/null	
> +++ bzrlib/tests/blackbox/test_reconcile.py	
> @@ -0,0 +1,51 @@
> +# Copyright (C) 2006 by Canonical Ltd
> +# Authors: Robert Collins <robert.collins at canonical.com>

We don't normally put these in but rather rely on bzr annotations.  Do
you strongly want it?

> +# -*- coding: utf-8 -*-
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write to the Free Software
> +# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
> +
> +"""Black box tests for the reconcile command."""
> +
> +
> +import bzrlib.bzrdir as bzrdir
> +import bzrlib.repository as repository
> +from bzrlib.tests import TestCaseWithTransport
> +from bzrlib.tests.blackbox import TestUIFactory
> +from bzrlib.transport import get_transport
> +import bzrlib.ui as ui
> +
> +
> +class TrivialTest(TestCaseWithTransport):
> +
> +    def setUp(self):
> +        super(TrivialTest, self).setUp()
> +        self.old_format = bzrdir.BzrDirFormat.get_default_format()
> +        self.old_ui_factory = ui.ui_factory
> +        self.addCleanup(self.restoreDefaults)
> +        ui.ui_factory = TestUIFactory()
> +
> +    def restoreDefaults(self):
> +        ui.ui_factory = self.old_ui_factory
> +
> +    def test_trivial_reconcile(self):
> +        t = bzrdir.BzrDir.create_standalone_workingtree('.')
> +        (out, err) = self.run_bzr_captured(['reconcile'])
> +        self.assertEqualDiff(out, "Reconciling repository %s\n"
> +                                  "Backup Inventory created.\n"
> +                                  "Inventory regenerated.\n"
> +                                  "Reconciliation complete.\n" %
> +                                  t.bzrdir.root_transport.base)
> +        self.assertEqualDiff(err, "")
> +
> 
> === added file 'bzrlib/tests/repository_implementations/test_reconcile.py'
> --- /dev/null	
> +++ bzrlib/tests/repository_implementations/test_reconcile.py	
> @@ -0,0 +1,106 @@
> +# Copyright (C) 2006 by Canonical Ltd
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write to the Free Software
> +# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
> +
> +"""Tests for reconiliation of repositories."""
> +
> +
> +import bzrlib
> +import bzrlib.errors as errors
> +from bzrlib.reconcile import reconcile
> +from bzrlib.revision import Revision
> +from bzrlib.tests.repository_implementations.test_repository import TestCaseWithRepository
> +from bzrlib.transport import get_transport
> +from bzrlib.tree import EmptyTree
> +from bzrlib.workingtree import WorkingTree
> +
> +
> +class TestNeedingReweave(TestCaseWithRepository):
> +
> +    def setUp(self):
> +        super(TestNeedingReweave, self).setUp()
> +        
> +        t = get_transport(self.get_url())
> +        # an empty inventory with no revision for testing with.
> +        repo = self.make_repository('inventory_no_revision')
> +        inv = EmptyTree().inventory
> +        repo.add_inventory('missing', inv, [])
> +
> +        # a inventory with no parents and the revision has parents..
> +        # i.e. a ghost.
> +        t.copy_tree('inventory_no_revision', 'inventory_one_ghost')
> +        repo = bzrlib.repository.Repository.open('inventory_one_ghost')
> +        sha1 = repo.add_inventory('ghost', inv, [])
> +        rev = Revision(timestamp=0,
> +                       timezone=None,
> +                       committer="Foo Bar <foo at example.com>",
> +                       message="Message",
> +                       inventory_sha1=sha1,
> +                       revision_id='ghost')
> +        rev.parent_ids = ['the_ghost']
> +        repo.add_revision('ghost', rev)
> +         
> +        # a inventory with a ghost that can be corrected now.
> +        t.copy_tree('inventory_one_ghost', 'inventory_ghost_present')
> +        repo = bzrlib.repository.Repository.open('inventory_ghost_present')
> +        sha1 = repo.add_inventory('the_ghost', inv, [])
> +        rev = Revision(timestamp=0,
> +                       timezone=None,
> +                       committer="Foo Bar <foo at example.com>",
> +                       message="Message",
> +                       inventory_sha1=sha1,
> +                       revision_id='the_ghost')
> +        rev.parent_ids = []
> +        repo.add_revision('the_ghost', rev)
> +         
> +
> +    def test_reweave_empty_makes_backup_wave(self):
> +        self.make_repository('empty')
> +        d = bzrlib.bzrdir.BzrDir.open('empty')
> +        reconcile(d)
> +        repo = d.open_repository()
> +        repo.control_weaves.get_weave('inventory.backup',
> +                                      repo.get_transaction())
> +
> +    def test_reweave_inventory_without_revision(self):
> +        d = bzrlib.bzrdir.BzrDir.open('inventory_no_revision')
> +        reconcile(d)
> +        # now the backup should have it but not the current inventory
> +        repo = d.open_repository()
> +        backup = repo.control_weaves.get_weave('inventory.backup',
> +                                               repo.get_transaction())
> +        self.assertTrue('missing' in backup.names())
> +        self.assertRaises(errors.WeaveRevisionNotPresent,
> +                          repo.get_inventory, 'missing')
> +
> +    def test_reweave_inventory_preserves_a_revision_with_ghosts(self):
> +        d = bzrlib.bzrdir.BzrDir.open('inventory_one_ghost')
> +        reconcile(d)
> +        # now the current inventory should still have 'ghost'
> +        repo = d.open_repository()
> +        repo.get_inventory('ghost')
> +        self.assertEqual([None, 'ghost'], repo.get_ancestry('ghost'))
> +        
> +    def test_reweave_inventory_fixes_ancestryfor_a_present_ghost(self):
> +        d = bzrlib.bzrdir.BzrDir.open('inventory_ghost_present')
> +        repo = d.open_repository()
> +        self.assertEqual([None, 'ghost'], repo.get_ancestry('ghost'))
> +        reconcile(d)
> +        # now the current inventory should still have 'ghost'
> +        repo = d.open_repository()
> +        repo.get_inventory('ghost')
> +        repo.get_inventory('the_ghost')
> +        self.assertEqual([None, 'the_ghost', 'ghost'], repo.get_ancestry('ghost'))
> +        self.assertEqual([None, 'the_ghost'], repo.get_ancestry('the_ghost'))
> 
> === modified file 'bzrlib/builtins.py'
> --- bzrlib/builtins.py	
> +++ bzrlib/builtins.py	
> @@ -767,6 +767,32 @@
>              raise BzrError("%r is not a versioned file" % filename)
>          for fip in inv.get_idpath(fid):
>              print fip
> +
> +
> +class cmd_reconcile(Command):
> +    """Reconcile bzr metadata in a branch.
> +
> +    This can correct data mismatches that may have been caused by
> +    previous ghost operations or bzr upgrades. You should only
> +    need to run this command if 'bzr check' or a bzr developer 
> +    advises you to run it.
> +
> +    If a second branch is provided, cross-branch reconciliation is
> +    also attempted, which will check that data like the tree root
> +    id which was not present in very early bzr versions is represented
> +    correctly in both branches.
> +
> +    At the same time it is run it may recompress data resulting in 
> +    a potential saving in disk space or performance gain.
> +
> +    The branch *MUST* be on a listable system such as local disk or sftp.
> +    """
> +    takes_args = ['branch?']
> +
> +    def run(self, branch="."):
> +        from bzrlib.reconcile import reconcile
> +        dir = bzrlib.bzrdir.BzrDir.open(branch)
> +        reconcile(dir)
>  
>  
>  class cmd_revision_history(Command):
> 
> === modified file 'bzrlib/commit.py'
> --- bzrlib/commit.py	
> +++ bzrlib/commit.py	
> @@ -76,7 +76,7 @@
>  from bzrlib.osutils import (local_time_offset,
>                              rand_bytes, compact_date,
>                              kind_marker, is_inside_any, quotefn,
> -                            sha_string, sha_strings, sha_file, isdir, isfile,
> +                            sha_file, isdir, isfile,
>                              split_lines)
>  import bzrlib.config
>  import bzrlib.errors as errors
> @@ -85,7 +85,6 @@
>                             ConflictsInTree,
>                             StrictCommitFailed
>                             )
> -import bzrlib.gpg as gpg
>  from bzrlib.revision import Revision
>  from bzrlib.testament import Testament
>  from bzrlib.trace import mutter, note, warning
> @@ -295,7 +294,11 @@
>              if len(list(self.work_tree.iter_conflicts()))>0:
>                  raise ConflictsInTree
>  
> -            self._record_inventory()
> +            self.inv_sha1 = self.branch.repository.add_inventory(
> +                self.rev_id,
> +                self.new_inv,
> +                self.present_parents
> +                )
>              self._make_revision()
>              self.work_tree.set_pending_merges([])
>              self.branch.append_revision(self.rev_id)
> @@ -316,15 +319,6 @@
>          finally:
>              self.branch.unlock()
>  
> -    def _record_inventory(self):
> -        """Store the inventory for the new revision."""
> -        inv_text = serializer_v5.write_inventory_to_string(self.new_inv)
> -        self.inv_sha1 = sha_string(inv_text)
> -        s = self.branch.repository.control_weaves
> -        s.add_text('inventory', self.rev_id,
> -                   split_lines(inv_text), self.present_parents,
> -                   self.branch.get_transaction())
> -
>      def _escape_commit_message(self):
>          """Replace xml-incompatible control characters."""
>          # Python strings can include characters that can't be
> @@ -366,23 +360,15 @@
>              
>      def _make_revision(self):
>          """Record a new revision object for this commit."""
> -        self.rev = Revision(timestamp=self.timestamp,
> -                            timezone=self.timezone,
> -                            committer=self.committer,
> -                            message=self.message,
> -                            inventory_sha1=self.inv_sha1,
> -                            revision_id=self.rev_id,
> -                            properties=self.revprops)
> -        self.rev.parent_ids = self.parents
> -        rev_tmp = StringIO()
> -        serializer_v5.write_revision(self.rev, rev_tmp)
> -        rev_tmp.seek(0)
> -        if self.config.signature_needed():
> -            plaintext = Testament(self.rev, self.new_inv).as_short_text()
> -            self.branch.repository.store_revision_signature(
> -                gpg.GPGStrategy(self.config), plaintext, self.rev_id)
> -        self.branch.repository.revision_store.add(rev_tmp, self.rev_id)
> -        mutter('new revision_id is {%s}', self.rev_id)
> +        rev = Revision(timestamp=self.timestamp,
> +                       timezone=self.timezone,
> +                       committer=self.committer,
> +                       message=self.message,
> +                       inventory_sha1=self.inv_sha1,
> +                       revision_id=self.rev_id,
> +                       properties=self.revprops)
> +        rev.parent_ids = self.parents
> +        self.branch.repository.add_revision(self.rev_id, rev, self.new_inv, self.config)
>  
>      def _remove_deleted(self):
>          """Remove deleted files from the working inventories.
> 
> === modified file 'bzrlib/repository.py'
> --- bzrlib/repository.py	
> +++ bzrlib/repository.py	
> @@ -23,6 +23,7 @@
>  from bzrlib.decorators import needs_read_lock, needs_write_lock
>  import bzrlib.errors as errors
>  from bzrlib.errors import InvalidRevisionId
> +import bzrlib.gpg as gpg
>  from bzrlib.lockable_files import LockableFiles
>  from bzrlib.osutils import safe_unicode
>  from bzrlib.revision import NULL_REVISION
> @@ -49,6 +50,54 @@
>      describe the disk data format and the way of accessing the (possibly 
>      remote) disk.
>      """
> +
> +    @needs_write_lock
> +    def add_inventory(self, revid, inv, parents):
> +        """Add the inventory inv to the repository as revid.
> +        
> +        :param parents: The revision ids of the parents that revid
> +                        is known to have and are in the repository already.
> +
> +        returns the sha1 of the serialized inventory.
> +        """
> +        inv_text = bzrlib.xml5.serializer_v5.write_inventory_to_string(inv)
> +        inv_sha1 = bzrlib.osutils.sha_string(inv_text)
> +        self.control_weaves.add_text('inventory', revid,
> +                   bzrlib.osutils.split_lines(inv_text), parents,
> +                   self.get_transaction())
> +        return inv_sha1
> +
> +    @needs_write_lock
> +    def add_revision(self, rev_id, rev, inv=None, config=None):
> +        """Add rev to the revision store as rev_id.
> +
> +        :param rev_id: the revision id to use.
> +        :param rev: The revision object.
> +        :param inv: The inventory for the revision. if None, it will be looked
> +                    up in the inventory storer
> +        :param config: If None no digital signature will be created.
> +                       If supplied its signature_needed method will be used
> +                       to determine if a signature should be made.
> +        """
> +        if config is not None and config.signature_needed():
> +            if inv is None:
> +                inv = self.get_inventory(rev_id)
> +            plaintext = Testament(rev, inv).as_short_text()
> +            self.store_revision_signature(
> +                gpg.GPGStrategy(config), plaintext, rev_id)
> +        if not rev_id in self.get_inventory_weave():
> +            if inv is None:
> +                raise errors.WeaveRevisionNotPresent(rev_id,
> +                                                     self.get_inventory_weave())
> +            else:
> +                # yes, this is not suitable for adding with ghosts.
> +                self.add_inventory(rev_id, inv, rev.parent_ids)
> +            
> +        rev_tmp = StringIO()
> +        bzrlib.xml5.serializer_v5.write_revision(rev, rev_tmp)
> +        rev_tmp.seek(0)
> +        self.revision_store.add(rev_tmp, rev_id)
> +        mutter('added revision_id {%s}', rev_id)
>  
>      @needs_read_lock
>      def _all_possible_ids(self):
> 
> === modified file 'bzrlib/store/weave.py'
> --- bzrlib/store/weave.py	
> +++ bzrlib/store/weave.py	
> @@ -1,5 +1,3 @@
> -#! /usr/bin/python
> -
>  # Copyright (C) 2005 Canonical Ltd
>  
>  # This program is free software; you can redistribute it and/or modify
> 
> === modified file 'bzrlib/tests/blackbox/__init__.py'
> --- bzrlib/tests/blackbox/__init__.py	
> +++ bzrlib/tests/blackbox/__init__.py	
> @@ -29,6 +29,8 @@
>                            TestSuite,
>                            TestLoader,
>                            )
> +import bzrlib.ui as ui
> +
>  
>  def test_suite():
>      testmod_names = [
> @@ -47,6 +49,7 @@
>                       'bzrlib.tests.blackbox.test_missing',
>                       'bzrlib.tests.blackbox.test_outside_wt',
>                       'bzrlib.tests.blackbox.test_pull',
> +                     'bzrlib.tests.blackbox.test_reconcile',
>                       'bzrlib.tests.blackbox.test_re_sign',
>                       'bzrlib.tests.blackbox.test_revert',
>                       'bzrlib.tests.blackbox.test_revno',
> @@ -77,3 +80,21 @@
>              return self.run_bzr_captured(args, retcode=retcode)[0]
>          else:
>              return self.run_bzr_captured(args, retcode=retcode)
> +
> +
> +class TestUIFactory(ui.UIFactory):
> +    """A UI Factory which never captures its output.
> +    """
> +
> +    def clear(self):
> +        """See progress.ProgressBar.clear()."""
> +
> +    def note(self, fmt_string, *args, **kwargs):
> +        """See progress.ProgressBar.note()."""
> +        print fmt_string % args
> +
> +    def progress_bar(self):
> +        return self
> +        
> +    def update(self, message, count=None, total=None):
> +        """See progress.ProgressBar.update()."""
> 
> === modified file 'bzrlib/tests/blackbox/test_upgrade.py'
> --- bzrlib/tests/blackbox/test_upgrade.py	
> +++ bzrlib/tests/blackbox/test_upgrade.py	
> @@ -23,26 +23,9 @@
>  import bzrlib.bzrdir as bzrdir
>  import bzrlib.repository as repository
>  from bzrlib.tests import TestCaseWithTransport
> +from bzrlib.tests.blackbox import TestUIFactory
>  from bzrlib.transport import get_transport
>  import bzrlib.ui as ui
> -
> -
> -class TestUIFactory(ui.UIFactory):
> -    """A UI Factory which never captures its output.
> -    """
> -
> -    def clear(self):
> -        """See progress.ProgressBar.clear()."""
> -
> -    def note(self, fmt_string, *args, **kwargs):
> -        """See progress.ProgressBar.note()."""
> -        print fmt_string % args
> -
> -    def progress_bar(self):
> -        return self
> -        
> -    def update(self, message, count=None, total=None):
> -        """See progress.ProgressBar.update()."""
>  
>  
>  class TestWithUpgradableBranches(TestCaseWithTransport):
> 
> === modified file 'bzrlib/tests/repository_implementations/__init__.py'
> --- bzrlib/tests/repository_implementations/__init__.py	
> +++ bzrlib/tests/repository_implementations/__init__.py	
> @@ -41,6 +41,7 @@
>      result = TestSuite()
>      test_repository_implementations = [
>          'bzrlib.tests.repository_implementations.test_fileid_involved',
> +        'bzrlib.tests.repository_implementations.test_reconcile',
>          'bzrlib.tests.repository_implementations.test_repository',
>          ]
>      adapter = RepositoryTestProviderAdapter(
> 
> === modified file 'bzrlib/tests/repository_implementations/test_fileid_involved.py'
> --- bzrlib/tests/repository_implementations/test_fileid_involved.py	
> +++ bzrlib/tests/repository_implementations/test_fileid_involved.py	
> @@ -173,7 +173,6 @@
>          self.assertEquals( l1, l2 )
>  
>      def test_fileid_involved_full_compare(self):
> -        from bzrlib.tsort import topo_sort
>          pp=[]
>          history = self.branch.revision_history( )
>  
> 
> === modified file 'bzrlib/tests/test_tsort.py'
> --- bzrlib/tests/test_tsort.py	
> +++ bzrlib/tests/test_tsort.py	
> @@ -14,15 +14,14 @@
>  # along with this program; if not, write to the Free Software
>  # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
>  
> -"""Tests for topological sort.
> -"""
> +"""Tests for topological sort."""
>  
> -import os
> -import sys
>  
> +from bzrlib.reconcile import topological_sort
>  from bzrlib.tests import TestCase
>  from bzrlib.tsort import topo_sort
>  from bzrlib.errors import GraphCycleError
> +
>  
>  class TopoSortTests(TestCase):
>  
> @@ -75,3 +74,53 @@
>                                      (7, [2, 3]),
>                                      (8, [0, 1, 4, 5, 6])]),
>                            [0, 1, 2, 3, 4, 5, 6, 7, 8, ])
> +
> +    def test_new_tsort_empty(self):
> +        """TopoSort empty list"""
> +        self.assertEquals(topological_sort([]), [])
> +
> +    def test_new_tsort_easy(self):
> +        """TopoSort list with one node"""
> +        self.assertEquals(topological_sort({0: []}.items()),
> +                          [0])
> +
> +    def test_new_tsort_cycle(self):
> +        """TopoSort traps graph with cycles"""
> +        self.assertRaises(GraphCycleError,
> +                          topological_sort,
> +                          {0: [1], 
> +                           1: [0]}.iteritems())
> +
> +    def test_new_tsort_cycle_2(self):
> +        """TopoSort traps graph with longer cycle"""
> +        self.assertRaises(GraphCycleError,
> +                          topological_sort,
> +                          {0: [1], 
> +                           1: [2], 
> +                           2: [0]}.iteritems())
> +                 
> +    def test_new_tsort_1(self):
> +        """TopoSort simple nontrivial graph"""
> +        self.assertEquals(topological_sort({0: [3], 
> +                                     1: [4],
> +                                     2: [1, 4],
> +                                     3: [], 
> +                                     4: [0, 3]}.iteritems()),
> +                          [3, 0, 4, 1, 2])
> +
> +    def test_new_tsort_partial(self):
> +        """Topological sort with partial ordering.
> +
> +        If the graph does not give an order between two nodes, they are 
> +        returned in lexicographical order.
> +        """
> +        self.assertEquals(topological_sort([(0, []),
> +                                    (1, [0]),
> +                                    (2, [0]),
> +                                    (3, [0]),
> +                                    (4, [1, 2, 3]),
> +                                    (5, [1, 2]),
> +                                    (6, [1, 2]),
> +                                    (7, [2, 3]),
> +                                    (8, [0, 1, 4, 5, 6])]),
> +                          [0, 1, 2, 3, 4, 5, 6, 7, 8, ])
> 

-- 
Martin