[MERGE] LCA Merge resolution for tree-shape

Aaron Bentley aaron at aaronbentley.com
Thu Sep 4 04:54:22 BST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I haven't finished going over the test cases, but I've seen enough to
know there's work needed.

bb:resubmit

John Arbash Meinel wrote:
> I know 2k line patches are not fun to review. But if it makes you feel better,
> only about 600 lines are actual content changes, the rest are the tests making
> sure the content is correct. :)

The tests are, if anything, harder to read than the implementation.

> === modified file 'bzrlib/memorytree.py'
> --- bzrlib/memorytree.py	2008-07-22 18:12:25 +0000
> +++ bzrlib/memorytree.py	2008-07-23 02:35:09 +0000
> @@ -101,6 +103,14 @@
>              return None, False, None
>          return entry.kind, entry.executable, None
>  
> +    @needs_tree_write_lock
> +    def rename_one(self, from_rel, to_rel):
> +        file_id = self.path2id(from_rel)
> +        to_dir, to_tail = os.path.split(to_rel)
> +        to_parent_id = self.path2id(to_dir)

It seems a bit strange to implement parent_id calculation here.  Since
you know you're working with an inventory, why not use
Inventory[file_id].parent?

Or you could maybe unify it with the parent calculation in
mutabletree._add_one_and_parent.

I was actually hoping that in the future, rename functionality would be
implemented in terms of apply_inventory_delta.

> === modified file 'bzrlib/merge.py'
> --- bzrlib/merge.py	2008-07-16 16:54:06 +0000
> +++ bzrlib/merge.py	2008-08-01 18:04:17 +0000
> @@ -28,6 +28,7 @@
>      patiencediff,
>      registry,
>      revision as _mod_revision,
> +    tree as _mod_tree,
>      tsort,
>      )
>  from bzrlib.branch import Branch
> @@ -96,6 +97,8 @@
>          self._revision_graph = revision_graph
>          self._base_is_ancestor = None
>          self._base_is_other_ancestor = None
> +        self._is_criss_cross = False

^^^ None is a better default than False when no value has been
determined.  It also matches the surrounding code better.

> +        self._lca_trees = None
>  
>      @property
>      def revision_graph(self):
> @@ -353,15 +356,43 @@
>                       ensure_null(self.other_basis)]
>          if NULL_REVISION in revisions:
>              self.base_rev_id = NULL_REVISION
> +            self.base_tree = self.revision_tree(self.base_rev_id)
>          else:
> -            self.base_rev_id, steps = self.revision_graph.find_unique_lca(
> -                revisions[0], revisions[1], count_steps=True)
> +            lcas = self.revision_graph.find_lca(revisions[0], revisions[1])
> +            if len(lcas) == 0:
> +                self.base_rev_id = NULL_REVISION
> +            elif len(lcas) == 1:
> +                self.base_rev_id = list(lcas)[0]
> +            else: # len(lcas) > 1

^^^ What is this commented-out code for?

> +                if len(lcas) > 2:
> +                    # find_unique_lca can only handle 2 nodes, so we have to
> +                    # start back at the beginning. It is a shame to traverse
> +                    # the graph again, but better than re-implementing
> +                    # find_unique_lca.

I think you've identified an API failure.  We should fix it, and I don't
think it would be hard.

One option is to update the Graph API so that Graph.find_unique_lca
takes an unlimited number of *args.  AFAIK, there's only one real
implementation of Graph.

If you don't want to change the Graph API, another option is to
implement graph.find_unique_lca at the module level.
Graph.find_unique_lca could then be implemented in terms of
graph.find_unique_lca.

> +                    self.base_rev_id = self.revision_graph.find_unique_lca(
> +                                            revisions[0], revisions[1])
> +                else:
> +                    self.base_rev_id = self.revision_graph.find_unique_lca(
> +                                            *lcas)
> +                self._is_criss_cross = True
>              if self.base_rev_id == NULL_REVISION:
>                  raise UnrelatedBranches()
> -            if steps > 1:
> +            if self._is_criss_cross:
>                  warning('Warning: criss-cross merge encountered.  See bzr'
>                          ' help criss-cross.')
> -        self.base_tree = self.revision_tree(self.base_rev_id)
> +                interesting_revision_ids = [self.base_rev_id]
> +                interesting_revision_ids.extend(lcas)
> +                interesting_trees = dict((t.get_revision_id(), t)
> +                    for t in self.this_branch.repository.revision_trees(
> +                        interesting_revision_ids))
> +                self._cached_trees.update(interesting_trees)
> +                self.base_tree = interesting_trees.pop(self.base_rev_id)
> +                sorted_lca_keys = self.revision_graph.find_merge_order(
> +                    revisions[0], lcas)
> +                self._lca_trees = [interesting_trees[key]
> +                                   for key in sorted_lca_keys]

It seems like we need to implement Merger.revision_trees (or at least a
private method):

1. touching _cached_trees in find_base is a layering issue, and
duplicates code
2. failing to use Tree.revision_tree means that we cannot use the
optimizations in DirStateRevisionTree.
3. it means fewer special cases in find_base

> +            else:
> +                self.base_tree = self.revision_tree(self.base_rev_id)
>          self.base_is_ancestor = True
>          self.base_is_other_ancestor = True
>  
> @@ -550,19 +611,24 @@
>          return self.tt
>  
>      def _compute_transform(self):
> -        entries = self._entries3()
> +        if self._lca_trees is None:
> +            entries = self._entries3()
> +            resolver = self._three_way
> +        else:
> +            entries = self._entries_lca()
> +            resolver = self._lca_multi_way
>          child_pb = ui.ui_factory.nested_progress_bar()
>          try:
>              for num, (file_id, changed, parents3, names3,
>                        executable3) in enumerate(entries):

^^^ I think these variable names should be changed.

>                  child_pb.update('Preparing file merge', num, len(entries))
> -                self._merge_names(file_id, parents3, names3)
> +                self._merge_names(file_id, parents3, names3, resolver=resolver)
>                  if changed:
>                      file_status = self.merge_contents(file_id)
>                  else:
>                      file_status = 'unmodified'
>                  self._merge_executable(file_id,
> -                    executable3, file_status)
> +                    executable3, file_status, resolver=resolver)
>          finally:
>              child_pb.finished()
>          self.fix_root()
> @@ -614,6 +680,178 @@
>              result.append((file_id, changed, parents3, names3, executable3))
>          return result
>  
> +    def _entries_lca(self):
> +        """Gather data about files modified between multiple trees.
> +
> +        This compares OTHER versus all LCA trees, and for interesting entries,
> +        it then compares with THIS and BASE.
> +
> +        For the multi-valued entries, the format will be (BASE, [lca1, lca2])

This seems inaccurate.  The value is actually (BASE, [lca1, lca2]),
OTHER, THIS

But I don't think it should be.  BASE, LCAS, OTHER, THIS makes a lot
more sense to me.  I don't see a need to join the BASE with the LCAs in
a tuple.

> +        :return: [(file_id, changed, parents, names, executable)]
> +            file_id     Simple file_id of the entry
> +            changed     Boolean, True if the kind or contents changed
> +                        else False
> +            parents     ((base, [parent_id, in, lcas]), parent_id_other,
> +                         parent_id_this)
> +            names       ((base, [name, in, lcas]), name_in_other, name_in_this)
> +            executable  ((base, [exec, in, lcas]), exec_in_other, exec_in_this)
> +        """
> +        if self.interesting_files is not None:
> +            lookup_trees = [self.this_tree, self.base_tree]
> +            lookup_trees.extend(self._lca_trees)
> +            # I think we should include the lca trees as well
> +            interesting_ids = self.other_tree.paths2ids(self.interesting_files,
> +                                                        lookup_trees)

Filtering for interesting entries seems like it should be done in
MultiWalker.

> +        else:
> +            interesting_ids = self.interesting_ids
> +        result = []
> +        # XXX: Do we want a better sort order than this?
> +        walker = _mod_tree.MultiWalker(self.other_tree, self._lca_trees)

parent-to-child order will help if the merge introduces a lot of new
files inside new directories, because it TT will attempt to place files
inside parents if the parents are already in limbo.

> +        base_inventory = self.base_tree.inventory
> +        this_inventory = self.this_tree.inventory
> +        for path, file_id, other_ie, lca_values in walker.iter_all():
> +            # Is this modified at all from any of the other trees?
> +            if other_ie is None:
> +                other_ie = _none_entry
> +            if interesting_ids is not None and file_id not in interesting_ids:
> +                continue
> +
> +            # If other_revision is found in any of the lcas, that means this
> +            # node is uninteresting. This is because when merging, if there are
> +            # multiple heads(), we have to create a new node. So if we didn't,
> +            # we know that the ancestry is linear, and that OTHER did not
> +            # modify anything
> +            # See doc/developers/lca_merge_resolution.txt for details
> +            other_revision = other_ie.revision
> +            if other_revision is not None:
> +                # We can't use this shortcut when other_revision is None,
> +                # because it may be None because things are WorkingTrees, and
> +                # not because it is *actually* None.

^^^ When would it *actually* be none unless the trees were workingtrees?
 It is required to have a value for RevisionTrees.  _none_entry just
makes this logic more confused.

> +                is_unmodified = False
> +                for lca_path, ie in lca_values:
> +                    if ie is not None and ie.revision == other_revision:
> +                        is_unmodified = True
> +                        break
> +                if is_unmodified:
> +                    continue
> +
> +            lca_entries = []
> +            for lca_path, lca_ie in lca_values:
> +                if lca_ie is None:
> +                    lca_entries.append(_none_entry)
> +                else:
> +                    lca_entries.append(lca_ie)
> +
> +            if file_id in base_inventory:
> +                base_ie = base_inventory[file_id]
> +            else:
> +                base_ie = _none_entry
> +
> +            if file_id in this_inventory:
> +                this_ie = this_inventory[file_id]
> +            else:
> +                this_ie = _none_entry
> +
> +            lca_kinds = []
> +            lca_parent_ids = []
> +            lca_names = []
> +            lca_executable = []
> +            for lca_ie in lca_entries:
> +                lca_kinds.append(lca_ie.kind)
> +                lca_parent_ids.append(lca_ie.parent_id)
> +                lca_names.append(lca_ie.name)
> +                lca_executable.append(lca_ie.executable)
> +
> +            kind_winner = Merge3Merger._lca_multi_way(
> +                (base_ie.kind, lca_kinds),
> +                other_ie.kind, this_ie.kind)

Why are we doing this?  That sort of thing should be done in merge_contents.

When invoking methods on your own class, I think it's best to do it in a
generic way, such that subclasses can override those methods.

So I'd rather see self._lca_multi_way or perhaps
self.__class__._lca_multi_way.  This is the way existing code is written.

> @@ -744,14 +1032,16 @@
>                  parents.append(entry.parent_id)
>          return self._merge_names(file_id, parents, names)
>  
> -    def _merge_names(self, file_id, parents, names):
> +    def _merge_names(self, file_id, parents, names, resolver=None):

I think resolver should become a mandatory parameter.  It's a private
method, after all.

> === modified file 'bzrlib/tests/test_merge.py'
> --- bzrlib/tests/test_merge.py	2008-07-16 16:59:32 +0000
> +++ bzrlib/tests/test_merge.py	2008-08-01 16:40:01 +0000

> @@ -1089,3 +1091,1635 @@
>      def test_merge_move_and_change(self):
>          self.expectFailure("lca merge doesn't conflict for move and change",
>              super(TestLCAMerge, self).test_merge_move_and_change)
> +
> +
> +class LoggingMerger(object):
> +    # These seem to be the required attributes
> +    requires_base = False
> +    supports_reprocess = False
> +    supports_show_base = False
> +    supports_cherrypick = False
> +    # We intentionally do not define supports_lca_trees
> +
> +    def __init__(self, *args, **kwargs):
> +        self.args = args
> +        self.kwargs = kwargs
> +
> +
> +class TestMergerBase(TestCaseWithMemoryTransport):
> +    """Tests for Merger that can be done without hitting disk."""

This description doesn't seem accurate.  The class has no tests.

> +
> +    def get_builder(self):
> +        builder = self.make_branch_builder('path')
> +        builder.start_series()
> +        self.addCleanup(builder.finish_series)
> +        return builder
> +
> +    def setup_simple_graph(self):
> +        """Create a simple 3-node graph.
> +                A
> +                |\\
> +                B C

Do you mean to have two slashes here?  Or did you mean

                A
                |\
                B C

> +
> +        :return: A BranchBuilder
> +        """
> +        builder = self.get_builder()
> +        builder.build_snapshot('A-id', None,
> +            [('add', ('', None, 'directory', None))])
> +        builder.build_snapshot('C-id', ['A-id'], [])
> +        builder.build_snapshot('B-id', ['A-id'], [])
> +        return builder
> +
> +    def setup_criss_cross_graph(self):
> +        """Create a 5-node graph with a criss-cross.
> +                A
> +                |\\
> +                B C
> +                |X|
> +                D E

Double-slash again...


> +        :return: A BranchBuilder
> +        """
> +        builder = self.setup_simple_graph()
> +        builder.build_snapshot('E-id', ['C-id', 'B-id'], [])
> +        builder.build_snapshot('D-id', ['B-id', 'C-id'], [])
> +        return builder
> +
> +    def make_Merger(self, builder, other_revision_id,
> +                    interesting_files=None, interesting_ids=None):
> +        """Make a Merger object from a branch builder"""
> +        mem_tree = memorytree.MemoryTree.create_on_branch(builder.get_branch())
> +        mem_tree.lock_write()
> +        self.addCleanup(mem_tree.unlock)
> +        merger = _mod_merge.Merger.from_revision_ids(progress.DummyProgress(),
> +            mem_tree, other_revision_id)
> +        if interesting_files is not None:
> +            merger.set_interesting_files(interesting_files)

You don't need this conditional; set_interesting_files(None) will have
no effect.

> +        if interesting_ids is not None:
> +            # It seems there is no matching function for set_interesting_ids
> +            merger.interesting_ids = interesting_ids

Again, no conditional.  (Not sure why interesting_ids isn't a function,
though)

> +        merger.merge_type = _mod_merge.Merge3Merger
> +        return merger
> +
> +
> +class TestMergerInMemory(TestMergerBase):
> +
> +    def test_find_base(self):
> +        merger = self.make_Merger(self.setup_simple_graph(), 'C-id')
> +        self.assertEqual('A-id', merger.base_rev_id)
> +        self.assertFalse(merger._is_criss_cross)
> +        self.assertIs(None, merger._lca_trees)
> +
> +    def test_find_base_criss_cross(self):
> +        builder = self.setup_criss_cross_graph()
> +        merger = self.make_Merger(builder, 'E-id')
> +        self.assertEqual('A-id', merger.base_rev_id)
> +        self.assertTrue(merger._is_criss_cross)
> +        self.assertEqual(['B-id', 'C-id'], [t.get_revision_id()
> +                                            for t in merger._lca_trees])
> +        # If we swap the order, we should get a different lca order

Is the LCA order significant?  It seems like it shouldn't be.

> +        builder.build_snapshot('F-id', ['E-id'], [])
> +        merger = self.make_Merger(builder, 'D-id')
> +        self.assertEqual(['C-id', 'B-id'], [t.get_revision_id()
> +                                            for t in merger._lca_trees])
> +
> +    def test_find_base_triple_criss_cross(self):
> +        #       A-.
> +        #      / \ \
> +        #     B   C F # F is merged into both branches
> +        #     |\ /| |
> +        #     | X | |\
> +        #     |/ \| | :
> +        #   : D   E |
> +        #    \|   |/
> +        #     G   H
> +        builder = self.setup_criss_cross_graph()
> +        builder.build_snapshot('F-id', ['A-id'], [])
> +        builder.build_snapshot('H-id', ['E-id', 'F-id'], [])
> +        builder.build_snapshot('G-id', ['D-id', 'F-id'], [])
> +        merger = self.make_Merger(builder, 'H-id')
> +        self.assertEqual(['B-id', 'C-id', 'F-id'],
> +                         [t.get_revision_id() for t in merger._lca_trees])
> +
> +    def test_no_criss_cross_passed_to_merge_type(self):
> +        class LCATreesMerger(LoggingMerger):
> +            supports_lca_trees = True
> +
> +        merger = self.make_Merger(self.setup_simple_graph(), 'C-id')
> +        merger.merge_type = LCATreesMerger
> +        merge_obj = merger.make_merger()
> +        self.assertIsInstance(merge_obj, LCATreesMerger)
> +        self.assertFalse('lca_trees' in merge_obj.kwargs)
> +
> +    def test_criss_cross_passed_to_merge_type(self):
> +        merger = self.make_Merger(self.setup_criss_cross_graph(), 'E-id')
> +        merger.merge_type = _mod_merge.Merge3Merger
> +        merge_obj = merger.make_merger()
> +        self.assertEqual(['B-id', 'C-id'], [t.get_revision_id()
> +                                            for t in merger._lca_trees])
> +
> +    def test_criss_cross_not_supported_merge_type(self):
> +        class NoLCATreesMerger(LoggingMerger):
> +            # We intentionally do not define supports_lca_trees
> +            pass

Can't you just use LoggingMerger here instead of defining a new class?

> +
> +        merger = self.make_Merger(self.setup_criss_cross_graph(), 'E-id')
> +        merger.merge_type = NoLCATreesMerger
> +        merge_obj = merger.make_merger()
> +        self.assertIsInstance(merge_obj, NoLCATreesMerger)
> +        self.assertFalse('lca_trees' in merge_obj.kwargs)
> +
> +    def test_criss_cross_unsupported_merge_type(self):
> +        class UnsupportedLCATreesMerger(LoggingMerger):
> +            supports_lca_trees = False
> +
> +        merger = self.make_Merger(self.setup_criss_cross_graph(), 'E-id')
> +        merger.merge_type = UnsupportedLCATreesMerger
> +        merge_obj = merger.make_merger()
> +        self.assertIsInstance(merge_obj, UnsupportedLCATreesMerger)
> +        self.assertFalse('lca_trees' in merge_obj.kwargs)
> +
> +
> +class TestMergerEntriesLCA(TestMergerBase):
> +
> +    def make_merge_obj(self, builder, other_revision_id,
> +                       interesting_files=None, interesting_ids=None):
> +        merger = self.make_Merger(builder, other_revision_id,
> +            interesting_files=interesting_files,
> +            interesting_ids=interesting_ids)
> +        return merger.make_merger()
> +
> +    def test_simple(self):
> +        builder = self.get_builder()
> +        builder.build_snapshot('A-id', None,
> +            [('add', (u'', 'a-root-id', 'directory', None)),
> +             ('add', (u'a', 'a-id', 'file', 'a\nb\nc\n'))])
> +        builder.build_snapshot('C-id', ['A-id'],
> +            [('modify', ('a-id', 'a\nb\nC\nc\n'))])
> +        builder.build_snapshot('B-id', ['A-id'],
> +            [('modify', ('a-id', 'a\nB\nb\nc\n'))])
> +        builder.build_snapshot('E-id', ['C-id', 'B-id'],
> +            [('modify', ('a-id', 'a\nB\nb\nC\nc\nE\n'))])
> +        builder.build_snapshot('D-id', ['B-id', 'C-id'],
> +            [('modify', ('a-id', 'a\nB\nb\nC\nc\n'))])
> +        merge_obj = self.make_merge_obj(builder, 'E-id')

It seems like you could profitably make some helper functions to
construct these snapshots.  Do you nee any more data than

  new_snapshot('modify', 'B-id', ['A-id'])

?

Also, it's weird to build D-id after E-id.  I realize you're doing it
because this causes D-id to be selected by make_merge_object, but could
you swap the names?

> +        self.assertEqual(['B-id', 'C-id'], [t.get_revision_id()
> +                                            for t in merge_obj._lca_trees])
> +        self.assertEqual('A-id', merge_obj.base_tree.get_revision_id())
> +        entries = list(merge_obj._entries_lca())
> +
> +        # (file_id, changed, parents, names, executable)
> +        # BASE, lca1, lca2, OTHER, THIS
> +        root_id = 'a-root-id'
> +        self.assertEqual([('a-id', True,
> +                           ((root_id, [root_id, root_id]), root_id, root_id),
> +                           ((u'a', [u'a', u'a']), u'a', u'a'),
> +                           ((False, [False, False]), False, False)),
> +                         ], entries)
> +
> +    def test_not_in_base(self):
> +        # LCA's all have the same last-modified revision for the file, as do

Plural should be "LCAs"

> +        # the tips, but the base has something different
> +        #       A    base, doesn't have the file
> +        #       |\
> +        #       B C  B introduces 'foo', C introduces 'bar'
> +        #       |X|
> +        #       D E  D and E now both have 'foo' and 'bar'
> +        #       |X|
> +        #       F G  the files are now in F, G, D and E, but not in A
> +        #            G modifies 'bar'
> +
> +        builder = self.get_builder()
> +        builder.build_snapshot('A-id', None,
> +            [('add', (u'', 'a-root-id', 'directory', None))])
> +        builder.build_snapshot('B-id', ['A-id'],
> +            [('add', (u'foo', 'foo-id', 'file', 'a\nb\nc\n'))])
> +        builder.build_snapshot('C-id', ['A-id'],
> +            [('add', (u'bar', 'bar-id', 'file', 'd\ne\nf\n'))])
> +        builder.build_snapshot('D-id', ['B-id', 'C-id'],
> +            [('add', (u'bar', 'bar-id', 'file', 'd\ne\nf\n'))])

Why is this an add, not modify?  The file exists in C-id, and D-id is
derived from it.

> +    def test_file_not_in_one_lca(self):
> +        builder = self.get_builder()
> +        builder.build_snapshot('A-id', None,
> +            [('add', (u'', 'a-root-id', 'directory', None))])
> +        builder.build_snapshot('B-id', ['A-id'], [])
> +        builder.build_snapshot('C-id', ['A-id'],
> +            [('add', (u'a', 'a-id', 'file', 'a\nb\nc\n'))])
> +        builder.build_snapshot('E-id', ['C-id', 'B-id'], []) # Inherited from C
> +        builder.build_snapshot('D-id', ['B-id', 'C-id'], # Merged from C
> +            [('add', (u'a', 'a-id', 'file', 'a\nb\nc\n'))])
> +        merge_obj = self.make_merge_obj(builder, 'E-id')
> +
> +        self.assertEqual(['B-id', 'C-id'], [t.get_revision_id()
> +                                            for t in merge_obj._lca_trees])
> +        self.assertEqual('A-id', merge_obj.base_tree.get_revision_id())
> +
> +        entries = list(merge_obj._entries_lca())
> +        root_id = 'a-root-id'
> +        self.assertEqual([], entries)

I'm pretty sure this counts as a changed entry.  Why are we not emitting
something?

> +    def test_one_lca_supersedes(self):
> +        # One LCA supersede's the other LCA's last modified value, but the

No apostrophe in "supersedes".

> +        # value is not the same as BASE.
> +        #       A    base, introduces 'foo', last mod A
> +        #       |\
> +        #       B C  B modifies 'foo' (mod B), C does nothing (mod A)
> +        #       |X|
> +        #       D E  D does nothing (mod B), E updates 'foo' (mod E)
> +        #       |X|
> +        #       F G  F updates 'foo' (mod F). G does nothing (mod E)
> +        #
> +        #   At this point, G should not be considered to modify 'foo', even
> +        #   though its LCAs disagree. This is because the modification in E
> +        #   completely supersedes the value in D.
> +        builder = self.get_builder()
> +        builder.build_snapshot('A-id', None,
> +            [('add', (u'', 'a-root-id', 'directory', None)),
> +             ('add', (u'foo', 'foo-id', 'file', 'A content\n'))])
> +        builder.build_snapshot('C-id', ['A-id'], [])
> +        builder.build_snapshot('B-id', ['A-id'],
> +            [('modify', ('foo-id', 'B content\n'))])
> +        builder.build_snapshot('D-id', ['B-id', 'C-id'], [])
> +        builder.build_snapshot('E-id', ['C-id', 'B-id'],
> +            [('modify', ('foo-id', 'E content\n'))])
> +        builder.build_snapshot('G-id', ['E-id', 'D-id'], [])
> +        builder.build_snapshot('F-id', ['D-id', 'E-id'],
> +            [('modify', ('foo-id', 'F content\n'))])
> +        merge_obj = self.make_merge_obj(builder, 'G-id')
> +
> +        self.assertEqual([], list(merge_obj._entries_lca()))

This is mind-melting.  Can we please simplify _entries_lca so that it
emits *everything* except entries that are the same in THIS and OTHER?
Trying to keep track of when it defers merge resolution to merge_names
vs resolving the merge itself is not easy.

> === added file 'doc/developers/lca_merge_resolution.txt'
> --- doc/developers/lca_merge_resolution.txt	1970-01-01 00:00:00 +0000
> +++ doc/developers/lca_merge_resolution.txt	2008-07-31 19:12:59 +0000
> @@ -0,0 +1,127 @@
> +====================
> +LCA Merge Resolution
> +====================

Probably better to call it "LCA Tree merging" or "LCA scalar merging",
just to reduce confusion further

> +
> +There are 2 ways that you get LCA merge resolution in bzr. First, if you use
> +``bzr merge --lca``, the content of files will be resolved using a Least Common
> +Ancestors algorithm. That is not being described here.

Perhaps a link to the spec for LCA text merging?

> +This is describing how we handle merging tree-shape when there is not a single

Perhaps "This describes"?

> +unique ancestor (criss-cross merge). With a single LCA, we use simple
> +3-way-merge logic.
> +
> +When there are multiple possible LCAs, we use a different algorithm for
> +handling tree-shape merging. Described here.
> +
> +As a simple example, here is a revision graph which we will refer to often::
> +
> +  .    BASE
> +  .  /      \
> +  . LCA1   LCA2
> +  . |   \ /   |
> +  . |    X    |
> +  . |   / \   |
> +  . THIS  OTHER
> +
> +In this graph, ``THIS`` and ``OTHER`` both have ``LCA1`` and ``LCA2`` in their
> +ancestry but neither is an ancestor of the other, so we have 2 least common
> +ancestors. The unique common ancestor is ``BASE``. (It should be noted that in
> +this text we will talk directly about ``LCA1`` and ``LCA2``, but the algorithms
> +are designed to cope with more than 2 LCAs.)
> +
> +
> +Scalars
> +=======
> +
> +Definition
> +----------
> +
> +I'm defining scalar values as ones that cannot be 'merged' on their own. For
> +example, the name of a file is "scalar". If one person changes "foo.txt" to
> +"foo.c" and someone else changes "foo.txt" to "bar.txt" we don't merge the
> +changes to be "bar.c", we simply conflict and expect the user to sort it out.
> +
> +We use a slightly different algorithm for scalars.
> +
> +
> +Resolution Algorithm
> +--------------------
> +
> +(This can be seen as ``bzrlib.merge.Merge3Merger._lca_multi_way```
> +
> +1. If ``THIS`` and ``OTHER`` have the same value, use it. There is no need to
> +   inspect any other values in this case. Either nothing was changed (all
> +   interesting nodes would have the same value), or we have "accidental
> +   convergence" (both sides made the same change.).
> +
> +2. Find the values from ``LCA1`` and ``LCA2`` which are not the same as
> +   ``BASE``. The idea here is to provide a rudimentary "heads" comparison.
> +   Often, the whole tree graph will have a criss-cross, but the per-file
> +   (per-scalar) graph would be linear. And the value in one LCA strictly
> +   dominates the other. It is possible to construct a scenario where one side
> +   dominates the other, but the dominated value is not ``BASE``, but a second
> +   intermediate value. Most scalars are rarely changed, so this is unlikely to
> +   be an issue. And the trade-off is having to generate and inspect the
> +   per-scalar graph.

Try to avoid starting sentences with "And".

> +   If there are no LCA values that are different from ``BASE``, we use a simple
> +   3-way merge with ``BASE`` as the base value.
> +
> +3. Find the unique set of LCA values that do not include the ``BASE`` value.
> +   If there is only one unique LCA value, we again use three-way merge logic
> +   using that unique value as the base.
> +
> +4. At this point we have determined that we have at least 2 unique values in

Comma after point

> +   our LCAs which means that ``THIS`` and ``OTHER`` would both have to resolve
> +   the conflict. If they resolved it in the same way, we would have caught that
> +   in step 1. So they either both picked a different LCA value, or one (or
> +   both) chose a new value to use.
> +
> +   So at this point, if ``OTHER`` and ``THIS`` both picked a different LCA
> +   value, we conflict.
> +
> +   If ``OTHER`` and ``THIS`` both have values that are not LCA values, we also
> +   conflict. (Same as 3-way, both sides modified a value in different ways.)
> +
> +5. (optional) The only tricky part is this, if ``OTHER`` has a LCA value, but

Comma after "this" should be a colon.

> +   ``THIS`` does not, then we go with ``THIS``, and conversely if ``THIS`` has
> +   an LCA value, but ``OTHER`` does not, then we go with ``OTHER``. The idea is
> +   that ``THIS`` and ``OTHER`` may have resolved things in the same way, and
> +   then later changed the value to something newer. (They could have also
> +   resolved it differently, and then one side updated again.)

Anyhow, I think this describes a reasonable technique.  I had always
assumed that we would need to change our inventory representation in
order to do any sort of history-aware inventory merging.  My hat is off
to you for figuring out how to squeeze it out of the information we
already have.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIv1vt0F+nu1YWqI0RAtvAAJ9Yp1aNNG1scAhSm6nFMakdtPiXFgCffTFh
wDL4qsfEvTzqFwJmA5EbE+g=
=PW6x
-----END PGP SIGNATURE-----



More information about the bazaar mailing list