[MERGE] LCA Merge resolution for tree-shape
Aaron Bentley
aaron at aaronbentley.com
Thu Sep 4 04:54:22 BST 2008
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
I haven't finished going over the test cases, but I've seen enough to
know there's work needed.
bb:resubmit
John Arbash Meinel wrote:
> I know 2k line patches are not fun to review. But if it makes you feel better,
> only about 600 lines are actual content changes, the rest are the tests making
> sure the content is correct. :)
The tests are, if anything, harder to read than the implementation.
> === modified file 'bzrlib/memorytree.py'
> --- bzrlib/memorytree.py 2008-07-22 18:12:25 +0000
> +++ bzrlib/memorytree.py 2008-07-23 02:35:09 +0000
> @@ -101,6 +103,14 @@
> return None, False, None
> return entry.kind, entry.executable, None
>
> + @needs_tree_write_lock
> + def rename_one(self, from_rel, to_rel):
> + file_id = self.path2id(from_rel)
> + to_dir, to_tail = os.path.split(to_rel)
> + to_parent_id = self.path2id(to_dir)
It seems a bit strange to implement parent_id calculation here. Since
you know you're working with an inventory, why not use
Inventory[file_id].parent?
Or you could maybe unify it with the parent calculation in
mutabletree._add_one_and_parent.
I was actually hoping that in the future, rename functionality would be
implemented in terms of apply_inventory_delta.
> === modified file 'bzrlib/merge.py'
> --- bzrlib/merge.py 2008-07-16 16:54:06 +0000
> +++ bzrlib/merge.py 2008-08-01 18:04:17 +0000
> @@ -28,6 +28,7 @@
> patiencediff,
> registry,
> revision as _mod_revision,
> + tree as _mod_tree,
> tsort,
> )
> from bzrlib.branch import Branch
> @@ -96,6 +97,8 @@
> self._revision_graph = revision_graph
> self._base_is_ancestor = None
> self._base_is_other_ancestor = None
> + self._is_criss_cross = False
^^^ None is a better default than False when no value has been
determined. It also matches the surrounding code better.
> + self._lca_trees = None
>
> @property
> def revision_graph(self):
> @@ -353,15 +356,43 @@
> ensure_null(self.other_basis)]
> if NULL_REVISION in revisions:
> self.base_rev_id = NULL_REVISION
> + self.base_tree = self.revision_tree(self.base_rev_id)
> else:
> - self.base_rev_id, steps = self.revision_graph.find_unique_lca(
> - revisions[0], revisions[1], count_steps=True)
> + lcas = self.revision_graph.find_lca(revisions[0], revisions[1])
> + if len(lcas) == 0:
> + self.base_rev_id = NULL_REVISION
> + elif len(lcas) == 1:
> + self.base_rev_id = list(lcas)[0]
> + else: # len(lcas) > 1
^^^ What is this commented-out code for?
> + if len(lcas) > 2:
> + # find_unique_lca can only handle 2 nodes, so we have to
> + # start back at the beginning. It is a shame to traverse
> + # the graph again, but better than re-implementing
> + # find_unique_lca.
I think you've identified an API failure. We should fix it, and I don't
think it would be hard.
One option is to update the Graph API so that Graph.find_unique_lca
takes an unlimited number of *args. AFAIK, there's only one real
implementation of Graph.
If you don't want to change the Graph API, another option is to
implement graph.find_unique_lca at the module level.
Graph.find_unique_lca could then be implemented in terms of
graph.find_unique_lca.
> + self.base_rev_id = self.revision_graph.find_unique_lca(
> + revisions[0], revisions[1])
> + else:
> + self.base_rev_id = self.revision_graph.find_unique_lca(
> + *lcas)
> + self._is_criss_cross = True
> if self.base_rev_id == NULL_REVISION:
> raise UnrelatedBranches()
> - if steps > 1:
> + if self._is_criss_cross:
> warning('Warning: criss-cross merge encountered. See bzr'
> ' help criss-cross.')
> - self.base_tree = self.revision_tree(self.base_rev_id)
> + interesting_revision_ids = [self.base_rev_id]
> + interesting_revision_ids.extend(lcas)
> + interesting_trees = dict((t.get_revision_id(), t)
> + for t in self.this_branch.repository.revision_trees(
> + interesting_revision_ids))
> + self._cached_trees.update(interesting_trees)
> + self.base_tree = interesting_trees.pop(self.base_rev_id)
> + sorted_lca_keys = self.revision_graph.find_merge_order(
> + revisions[0], lcas)
> + self._lca_trees = [interesting_trees[key]
> + for key in sorted_lca_keys]
It seems like we need to implement Merger.revision_trees (or at least a
private method):
1. touching _cached_trees in find_base is a layering issue, and
duplicates code
2. failing to use Tree.revision_tree means that we cannot use the
optimizations in DirStateRevisionTree.
3. it means fewer special cases in find_base
> + else:
> + self.base_tree = self.revision_tree(self.base_rev_id)
> self.base_is_ancestor = True
> self.base_is_other_ancestor = True
>
> @@ -550,19 +611,24 @@
> return self.tt
>
> def _compute_transform(self):
> - entries = self._entries3()
> + if self._lca_trees is None:
> + entries = self._entries3()
> + resolver = self._three_way
> + else:
> + entries = self._entries_lca()
> + resolver = self._lca_multi_way
> child_pb = ui.ui_factory.nested_progress_bar()
> try:
> for num, (file_id, changed, parents3, names3,
> executable3) in enumerate(entries):
^^^ I think these variable names should be changed.
> child_pb.update('Preparing file merge', num, len(entries))
> - self._merge_names(file_id, parents3, names3)
> + self._merge_names(file_id, parents3, names3, resolver=resolver)
> if changed:
> file_status = self.merge_contents(file_id)
> else:
> file_status = 'unmodified'
> self._merge_executable(file_id,
> - executable3, file_status)
> + executable3, file_status, resolver=resolver)
> finally:
> child_pb.finished()
> self.fix_root()
> @@ -614,6 +680,178 @@
> result.append((file_id, changed, parents3, names3, executable3))
> return result
>
> + def _entries_lca(self):
> + """Gather data about files modified between multiple trees.
> +
> + This compares OTHER versus all LCA trees, and for interesting entries,
> + it then compares with THIS and BASE.
> +
> + For the multi-valued entries, the format will be (BASE, [lca1, lca2])
This seems inaccurate. The value is actually (BASE, [lca1, lca2]),
OTHER, THIS
But I don't think it should be. BASE, LCAS, OTHER, THIS makes a lot
more sense to me. I don't see a need to join the BASE with the LCAs in
a tuple.
> + :return: [(file_id, changed, parents, names, executable)]
> + file_id Simple file_id of the entry
> + changed Boolean, True if the kind or contents changed
> + else False
> + parents ((base, [parent_id, in, lcas]), parent_id_other,
> + parent_id_this)
> + names ((base, [name, in, lcas]), name_in_other, name_in_this)
> + executable ((base, [exec, in, lcas]), exec_in_other, exec_in_this)
> + """
> + if self.interesting_files is not None:
> + lookup_trees = [self.this_tree, self.base_tree]
> + lookup_trees.extend(self._lca_trees)
> + # I think we should include the lca trees as well
> + interesting_ids = self.other_tree.paths2ids(self.interesting_files,
> + lookup_trees)
Filtering for interesting entries seems like it should be done in
MultiWalker.
> + else:
> + interesting_ids = self.interesting_ids
> + result = []
> + # XXX: Do we want a better sort order than this?
> + walker = _mod_tree.MultiWalker(self.other_tree, self._lca_trees)
parent-to-child order will help if the merge introduces a lot of new
files inside new directories, because it TT will attempt to place files
inside parents if the parents are already in limbo.
> + base_inventory = self.base_tree.inventory
> + this_inventory = self.this_tree.inventory
> + for path, file_id, other_ie, lca_values in walker.iter_all():
> + # Is this modified at all from any of the other trees?
> + if other_ie is None:
> + other_ie = _none_entry
> + if interesting_ids is not None and file_id not in interesting_ids:
> + continue
> +
> + # If other_revision is found in any of the lcas, that means this
> + # node is uninteresting. This is because when merging, if there are
> + # multiple heads(), we have to create a new node. So if we didn't,
> + # we know that the ancestry is linear, and that OTHER did not
> + # modify anything
> + # See doc/developers/lca_merge_resolution.txt for details
> + other_revision = other_ie.revision
> + if other_revision is not None:
> + # We can't use this shortcut when other_revision is None,
> + # because it may be None because things are WorkingTrees, and
> + # not because it is *actually* None.
^^^ When would it *actually* be none unless the trees were workingtrees?
It is required to have a value for RevisionTrees. _none_entry just
makes this logic more confused.
> + is_unmodified = False
> + for lca_path, ie in lca_values:
> + if ie is not None and ie.revision == other_revision:
> + is_unmodified = True
> + break
> + if is_unmodified:
> + continue
> +
> + lca_entries = []
> + for lca_path, lca_ie in lca_values:
> + if lca_ie is None:
> + lca_entries.append(_none_entry)
> + else:
> + lca_entries.append(lca_ie)
> +
> + if file_id in base_inventory:
> + base_ie = base_inventory[file_id]
> + else:
> + base_ie = _none_entry
> +
> + if file_id in this_inventory:
> + this_ie = this_inventory[file_id]
> + else:
> + this_ie = _none_entry
> +
> + lca_kinds = []
> + lca_parent_ids = []
> + lca_names = []
> + lca_executable = []
> + for lca_ie in lca_entries:
> + lca_kinds.append(lca_ie.kind)
> + lca_parent_ids.append(lca_ie.parent_id)
> + lca_names.append(lca_ie.name)
> + lca_executable.append(lca_ie.executable)
> +
> + kind_winner = Merge3Merger._lca_multi_way(
> + (base_ie.kind, lca_kinds),
> + other_ie.kind, this_ie.kind)
Why are we doing this? That sort of thing should be done in merge_contents.
When invoking methods on your own class, I think it's best to do it in a
generic way, such that subclasses can override those methods.
So I'd rather see self._lca_multi_way or perhaps
self.__class__._lca_multi_way. This is the way existing code is written.
> @@ -744,14 +1032,16 @@
> parents.append(entry.parent_id)
> return self._merge_names(file_id, parents, names)
>
> - def _merge_names(self, file_id, parents, names):
> + def _merge_names(self, file_id, parents, names, resolver=None):
I think resolver should become a mandatory parameter. It's a private
method, after all.
> === modified file 'bzrlib/tests/test_merge.py'
> --- bzrlib/tests/test_merge.py 2008-07-16 16:59:32 +0000
> +++ bzrlib/tests/test_merge.py 2008-08-01 16:40:01 +0000
> @@ -1089,3 +1091,1635 @@
> def test_merge_move_and_change(self):
> self.expectFailure("lca merge doesn't conflict for move and change",
> super(TestLCAMerge, self).test_merge_move_and_change)
> +
> +
> +class LoggingMerger(object):
> + # These seem to be the required attributes
> + requires_base = False
> + supports_reprocess = False
> + supports_show_base = False
> + supports_cherrypick = False
> + # We intentionally do not define supports_lca_trees
> +
> + def __init__(self, *args, **kwargs):
> + self.args = args
> + self.kwargs = kwargs
> +
> +
> +class TestMergerBase(TestCaseWithMemoryTransport):
> + """Tests for Merger that can be done without hitting disk."""
This description doesn't seem accurate. The class has no tests.
> +
> + def get_builder(self):
> + builder = self.make_branch_builder('path')
> + builder.start_series()
> + self.addCleanup(builder.finish_series)
> + return builder
> +
> + def setup_simple_graph(self):
> + """Create a simple 3-node graph.
> + A
> + |\\
> + B C
Do you mean to have two slashes here? Or did you mean
A
|\
B C
> +
> + :return: A BranchBuilder
> + """
> + builder = self.get_builder()
> + builder.build_snapshot('A-id', None,
> + [('add', ('', None, 'directory', None))])
> + builder.build_snapshot('C-id', ['A-id'], [])
> + builder.build_snapshot('B-id', ['A-id'], [])
> + return builder
> +
> + def setup_criss_cross_graph(self):
> + """Create a 5-node graph with a criss-cross.
> + A
> + |\\
> + B C
> + |X|
> + D E
Double-slash again...
> + :return: A BranchBuilder
> + """
> + builder = self.setup_simple_graph()
> + builder.build_snapshot('E-id', ['C-id', 'B-id'], [])
> + builder.build_snapshot('D-id', ['B-id', 'C-id'], [])
> + return builder
> +
> + def make_Merger(self, builder, other_revision_id,
> + interesting_files=None, interesting_ids=None):
> + """Make a Merger object from a branch builder"""
> + mem_tree = memorytree.MemoryTree.create_on_branch(builder.get_branch())
> + mem_tree.lock_write()
> + self.addCleanup(mem_tree.unlock)
> + merger = _mod_merge.Merger.from_revision_ids(progress.DummyProgress(),
> + mem_tree, other_revision_id)
> + if interesting_files is not None:
> + merger.set_interesting_files(interesting_files)
You don't need this conditional; set_interesting_files(None) will have
no effect.
> + if interesting_ids is not None:
> + # It seems there is no matching function for set_interesting_ids
> + merger.interesting_ids = interesting_ids
Again, no conditional. (Not sure why interesting_ids isn't a function,
though)
> + merger.merge_type = _mod_merge.Merge3Merger
> + return merger
> +
> +
> +class TestMergerInMemory(TestMergerBase):
> +
> + def test_find_base(self):
> + merger = self.make_Merger(self.setup_simple_graph(), 'C-id')
> + self.assertEqual('A-id', merger.base_rev_id)
> + self.assertFalse(merger._is_criss_cross)
> + self.assertIs(None, merger._lca_trees)
> +
> + def test_find_base_criss_cross(self):
> + builder = self.setup_criss_cross_graph()
> + merger = self.make_Merger(builder, 'E-id')
> + self.assertEqual('A-id', merger.base_rev_id)
> + self.assertTrue(merger._is_criss_cross)
> + self.assertEqual(['B-id', 'C-id'], [t.get_revision_id()
> + for t in merger._lca_trees])
> + # If we swap the order, we should get a different lca order
Is the LCA order significant? It seems like it shouldn't be.
> + builder.build_snapshot('F-id', ['E-id'], [])
> + merger = self.make_Merger(builder, 'D-id')
> + self.assertEqual(['C-id', 'B-id'], [t.get_revision_id()
> + for t in merger._lca_trees])
> +
> + def test_find_base_triple_criss_cross(self):
> + # A-.
> + # / \ \
> + # B C F # F is merged into both branches
> + # |\ /| |
> + # | X | |\
> + # |/ \| | :
> + # : D E |
> + # \| |/
> + # G H
> + builder = self.setup_criss_cross_graph()
> + builder.build_snapshot('F-id', ['A-id'], [])
> + builder.build_snapshot('H-id', ['E-id', 'F-id'], [])
> + builder.build_snapshot('G-id', ['D-id', 'F-id'], [])
> + merger = self.make_Merger(builder, 'H-id')
> + self.assertEqual(['B-id', 'C-id', 'F-id'],
> + [t.get_revision_id() for t in merger._lca_trees])
> +
> + def test_no_criss_cross_passed_to_merge_type(self):
> + class LCATreesMerger(LoggingMerger):
> + supports_lca_trees = True
> +
> + merger = self.make_Merger(self.setup_simple_graph(), 'C-id')
> + merger.merge_type = LCATreesMerger
> + merge_obj = merger.make_merger()
> + self.assertIsInstance(merge_obj, LCATreesMerger)
> + self.assertFalse('lca_trees' in merge_obj.kwargs)
> +
> + def test_criss_cross_passed_to_merge_type(self):
> + merger = self.make_Merger(self.setup_criss_cross_graph(), 'E-id')
> + merger.merge_type = _mod_merge.Merge3Merger
> + merge_obj = merger.make_merger()
> + self.assertEqual(['B-id', 'C-id'], [t.get_revision_id()
> + for t in merger._lca_trees])
> +
> + def test_criss_cross_not_supported_merge_type(self):
> + class NoLCATreesMerger(LoggingMerger):
> + # We intentionally do not define supports_lca_trees
> + pass
Can't you just use LoggingMerger here instead of defining a new class?
> +
> + merger = self.make_Merger(self.setup_criss_cross_graph(), 'E-id')
> + merger.merge_type = NoLCATreesMerger
> + merge_obj = merger.make_merger()
> + self.assertIsInstance(merge_obj, NoLCATreesMerger)
> + self.assertFalse('lca_trees' in merge_obj.kwargs)
> +
> + def test_criss_cross_unsupported_merge_type(self):
> + class UnsupportedLCATreesMerger(LoggingMerger):
> + supports_lca_trees = False
> +
> + merger = self.make_Merger(self.setup_criss_cross_graph(), 'E-id')
> + merger.merge_type = UnsupportedLCATreesMerger
> + merge_obj = merger.make_merger()
> + self.assertIsInstance(merge_obj, UnsupportedLCATreesMerger)
> + self.assertFalse('lca_trees' in merge_obj.kwargs)
> +
> +
> +class TestMergerEntriesLCA(TestMergerBase):
> +
> + def make_merge_obj(self, builder, other_revision_id,
> + interesting_files=None, interesting_ids=None):
> + merger = self.make_Merger(builder, other_revision_id,
> + interesting_files=interesting_files,
> + interesting_ids=interesting_ids)
> + return merger.make_merger()
> +
> + def test_simple(self):
> + builder = self.get_builder()
> + builder.build_snapshot('A-id', None,
> + [('add', (u'', 'a-root-id', 'directory', None)),
> + ('add', (u'a', 'a-id', 'file', 'a\nb\nc\n'))])
> + builder.build_snapshot('C-id', ['A-id'],
> + [('modify', ('a-id', 'a\nb\nC\nc\n'))])
> + builder.build_snapshot('B-id', ['A-id'],
> + [('modify', ('a-id', 'a\nB\nb\nc\n'))])
> + builder.build_snapshot('E-id', ['C-id', 'B-id'],
> + [('modify', ('a-id', 'a\nB\nb\nC\nc\nE\n'))])
> + builder.build_snapshot('D-id', ['B-id', 'C-id'],
> + [('modify', ('a-id', 'a\nB\nb\nC\nc\n'))])
> + merge_obj = self.make_merge_obj(builder, 'E-id')
It seems like you could profitably make some helper functions to
construct these snapshots. Do you nee any more data than
new_snapshot('modify', 'B-id', ['A-id'])
?
Also, it's weird to build D-id after E-id. I realize you're doing it
because this causes D-id to be selected by make_merge_object, but could
you swap the names?
> + self.assertEqual(['B-id', 'C-id'], [t.get_revision_id()
> + for t in merge_obj._lca_trees])
> + self.assertEqual('A-id', merge_obj.base_tree.get_revision_id())
> + entries = list(merge_obj._entries_lca())
> +
> + # (file_id, changed, parents, names, executable)
> + # BASE, lca1, lca2, OTHER, THIS
> + root_id = 'a-root-id'
> + self.assertEqual([('a-id', True,
> + ((root_id, [root_id, root_id]), root_id, root_id),
> + ((u'a', [u'a', u'a']), u'a', u'a'),
> + ((False, [False, False]), False, False)),
> + ], entries)
> +
> + def test_not_in_base(self):
> + # LCA's all have the same last-modified revision for the file, as do
Plural should be "LCAs"
> + # the tips, but the base has something different
> + # A base, doesn't have the file
> + # |\
> + # B C B introduces 'foo', C introduces 'bar'
> + # |X|
> + # D E D and E now both have 'foo' and 'bar'
> + # |X|
> + # F G the files are now in F, G, D and E, but not in A
> + # G modifies 'bar'
> +
> + builder = self.get_builder()
> + builder.build_snapshot('A-id', None,
> + [('add', (u'', 'a-root-id', 'directory', None))])
> + builder.build_snapshot('B-id', ['A-id'],
> + [('add', (u'foo', 'foo-id', 'file', 'a\nb\nc\n'))])
> + builder.build_snapshot('C-id', ['A-id'],
> + [('add', (u'bar', 'bar-id', 'file', 'd\ne\nf\n'))])
> + builder.build_snapshot('D-id', ['B-id', 'C-id'],
> + [('add', (u'bar', 'bar-id', 'file', 'd\ne\nf\n'))])
Why is this an add, not modify? The file exists in C-id, and D-id is
derived from it.
> + def test_file_not_in_one_lca(self):
> + builder = self.get_builder()
> + builder.build_snapshot('A-id', None,
> + [('add', (u'', 'a-root-id', 'directory', None))])
> + builder.build_snapshot('B-id', ['A-id'], [])
> + builder.build_snapshot('C-id', ['A-id'],
> + [('add', (u'a', 'a-id', 'file', 'a\nb\nc\n'))])
> + builder.build_snapshot('E-id', ['C-id', 'B-id'], []) # Inherited from C
> + builder.build_snapshot('D-id', ['B-id', 'C-id'], # Merged from C
> + [('add', (u'a', 'a-id', 'file', 'a\nb\nc\n'))])
> + merge_obj = self.make_merge_obj(builder, 'E-id')
> +
> + self.assertEqual(['B-id', 'C-id'], [t.get_revision_id()
> + for t in merge_obj._lca_trees])
> + self.assertEqual('A-id', merge_obj.base_tree.get_revision_id())
> +
> + entries = list(merge_obj._entries_lca())
> + root_id = 'a-root-id'
> + self.assertEqual([], entries)
I'm pretty sure this counts as a changed entry. Why are we not emitting
something?
> + def test_one_lca_supersedes(self):
> + # One LCA supersede's the other LCA's last modified value, but the
No apostrophe in "supersedes".
> + # value is not the same as BASE.
> + # A base, introduces 'foo', last mod A
> + # |\
> + # B C B modifies 'foo' (mod B), C does nothing (mod A)
> + # |X|
> + # D E D does nothing (mod B), E updates 'foo' (mod E)
> + # |X|
> + # F G F updates 'foo' (mod F). G does nothing (mod E)
> + #
> + # At this point, G should not be considered to modify 'foo', even
> + # though its LCAs disagree. This is because the modification in E
> + # completely supersedes the value in D.
> + builder = self.get_builder()
> + builder.build_snapshot('A-id', None,
> + [('add', (u'', 'a-root-id', 'directory', None)),
> + ('add', (u'foo', 'foo-id', 'file', 'A content\n'))])
> + builder.build_snapshot('C-id', ['A-id'], [])
> + builder.build_snapshot('B-id', ['A-id'],
> + [('modify', ('foo-id', 'B content\n'))])
> + builder.build_snapshot('D-id', ['B-id', 'C-id'], [])
> + builder.build_snapshot('E-id', ['C-id', 'B-id'],
> + [('modify', ('foo-id', 'E content\n'))])
> + builder.build_snapshot('G-id', ['E-id', 'D-id'], [])
> + builder.build_snapshot('F-id', ['D-id', 'E-id'],
> + [('modify', ('foo-id', 'F content\n'))])
> + merge_obj = self.make_merge_obj(builder, 'G-id')
> +
> + self.assertEqual([], list(merge_obj._entries_lca()))
This is mind-melting. Can we please simplify _entries_lca so that it
emits *everything* except entries that are the same in THIS and OTHER?
Trying to keep track of when it defers merge resolution to merge_names
vs resolving the merge itself is not easy.
> === added file 'doc/developers/lca_merge_resolution.txt'
> --- doc/developers/lca_merge_resolution.txt 1970-01-01 00:00:00 +0000
> +++ doc/developers/lca_merge_resolution.txt 2008-07-31 19:12:59 +0000
> @@ -0,0 +1,127 @@
> +====================
> +LCA Merge Resolution
> +====================
Probably better to call it "LCA Tree merging" or "LCA scalar merging",
just to reduce confusion further
> +
> +There are 2 ways that you get LCA merge resolution in bzr. First, if you use
> +``bzr merge --lca``, the content of files will be resolved using a Least Common
> +Ancestors algorithm. That is not being described here.
Perhaps a link to the spec for LCA text merging?
> +This is describing how we handle merging tree-shape when there is not a single
Perhaps "This describes"?
> +unique ancestor (criss-cross merge). With a single LCA, we use simple
> +3-way-merge logic.
> +
> +When there are multiple possible LCAs, we use a different algorithm for
> +handling tree-shape merging. Described here.
> +
> +As a simple example, here is a revision graph which we will refer to often::
> +
> + . BASE
> + . / \
> + . LCA1 LCA2
> + . | \ / |
> + . | X |
> + . | / \ |
> + . THIS OTHER
> +
> +In this graph, ``THIS`` and ``OTHER`` both have ``LCA1`` and ``LCA2`` in their
> +ancestry but neither is an ancestor of the other, so we have 2 least common
> +ancestors. The unique common ancestor is ``BASE``. (It should be noted that in
> +this text we will talk directly about ``LCA1`` and ``LCA2``, but the algorithms
> +are designed to cope with more than 2 LCAs.)
> +
> +
> +Scalars
> +=======
> +
> +Definition
> +----------
> +
> +I'm defining scalar values as ones that cannot be 'merged' on their own. For
> +example, the name of a file is "scalar". If one person changes "foo.txt" to
> +"foo.c" and someone else changes "foo.txt" to "bar.txt" we don't merge the
> +changes to be "bar.c", we simply conflict and expect the user to sort it out.
> +
> +We use a slightly different algorithm for scalars.
> +
> +
> +Resolution Algorithm
> +--------------------
> +
> +(This can be seen as ``bzrlib.merge.Merge3Merger._lca_multi_way```
> +
> +1. If ``THIS`` and ``OTHER`` have the same value, use it. There is no need to
> + inspect any other values in this case. Either nothing was changed (all
> + interesting nodes would have the same value), or we have "accidental
> + convergence" (both sides made the same change.).
> +
> +2. Find the values from ``LCA1`` and ``LCA2`` which are not the same as
> + ``BASE``. The idea here is to provide a rudimentary "heads" comparison.
> + Often, the whole tree graph will have a criss-cross, but the per-file
> + (per-scalar) graph would be linear. And the value in one LCA strictly
> + dominates the other. It is possible to construct a scenario where one side
> + dominates the other, but the dominated value is not ``BASE``, but a second
> + intermediate value. Most scalars are rarely changed, so this is unlikely to
> + be an issue. And the trade-off is having to generate and inspect the
> + per-scalar graph.
Try to avoid starting sentences with "And".
> + If there are no LCA values that are different from ``BASE``, we use a simple
> + 3-way merge with ``BASE`` as the base value.
> +
> +3. Find the unique set of LCA values that do not include the ``BASE`` value.
> + If there is only one unique LCA value, we again use three-way merge logic
> + using that unique value as the base.
> +
> +4. At this point we have determined that we have at least 2 unique values in
Comma after point
> + our LCAs which means that ``THIS`` and ``OTHER`` would both have to resolve
> + the conflict. If they resolved it in the same way, we would have caught that
> + in step 1. So they either both picked a different LCA value, or one (or
> + both) chose a new value to use.
> +
> + So at this point, if ``OTHER`` and ``THIS`` both picked a different LCA
> + value, we conflict.
> +
> + If ``OTHER`` and ``THIS`` both have values that are not LCA values, we also
> + conflict. (Same as 3-way, both sides modified a value in different ways.)
> +
> +5. (optional) The only tricky part is this, if ``OTHER`` has a LCA value, but
Comma after "this" should be a colon.
> + ``THIS`` does not, then we go with ``THIS``, and conversely if ``THIS`` has
> + an LCA value, but ``OTHER`` does not, then we go with ``OTHER``. The idea is
> + that ``THIS`` and ``OTHER`` may have resolved things in the same way, and
> + then later changed the value to something newer. (They could have also
> + resolved it differently, and then one side updated again.)
Anyhow, I think this describes a reasonable technique. I had always
assumed that we would need to change our inventory representation in
order to do any sort of history-aware inventory merging. My hat is off
to you for figuring out how to squeeze it out of the information we
already have.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFIv1vt0F+nu1YWqI0RAtvAAJ9Yp1aNNG1scAhSm6nFMakdtPiXFgCffTFh
wDL4qsfEvTzqFwJmA5EbE+g=
=PW6x
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list