Handling repeated text annotations

Thu Feb 14 20:10:38 GMT 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:
> John Arbash Meinel wrote:
> | Robert Collins wrote:
> | | On Wed, 2008-02-13 at 17:07 -0600, John Arbash Meinel wrote:
> | |
> | |> I'll try using heads() though I'm not sure it is exposed easily down
> | |> at this
> | |> level :(. I'm a little concerned that all the heads calls are going to
> | |> kill
> | |> performance. But I guess we only have to do it for lines that seem to
> | |> claim 2
> | |> parents.
> | |
> | | When you merge a line from another text, it should not get a new
> | | version, so when you have two branches join, no changes on either side
> | | will not be a merge.
> | |
> | | BUT - if one side changes, then you have two versions - the unchanged
> | | and the changed.
> | |
> | | So yes, heads() is indeed the key to not showing 'merges' where its
> | | really just one side doing edits.
> | |
> | | -Rob
> |
> | I also want to check the old knit logic. Because we might have used
> | "match all
> | lines to left, for things that don't match, check right". Which would
> | treat this
> | case as "left always wins" which is certainly easier to implement.
> |
> | John
> | =:->
> |
> 
> So the knit code is actually "right-most always wins". Specifically it has:
> 
> ~            for parent_id in parents:
>         ...
> ~                for i, j, n in seq.get_matching_blocks():
> ~                    if n == 0:
> ~                        continue
> ~                    # this appears to copy (origin, text) pairs across
> to the
> ~                    # new content for any line that matches the
> last-checked
> ~                    # parent.
> ~                    content._lines[j:j+n] = merge_content._lines[i:i+n]
> 
> So it does a diff against all parents, and then it replaces the lines in
> the
> output with any matches from that parent.
> 
> Now, there are some pretty strong performance characteristics for the
> various
> possibilities.
> 
> Current "always create a new node", aka tip-wins gives:
> $ time ~/dev/bzr/annotate_stuff/bzr annotate --show-ids sql/mysqld.cc
>>/dev/null
> time_in_reannotate_annotated: 51.760s
> Clean: 8363598
> empty:   77801
> new:      2667
> left:   243009
> right:  294698
> tip:   1385106
> 
> real    1m4.447s
> 
> 
> Left wins:
> time_in_reannotate_annotated: 11.070s
> Clean: 9727552
> empty:  144766
> new:        28
> left:   176544
> right:  294488
> tip:     23501
> 
> real    0m23.915s
> 
> Right wins:
> time_in_reannotate_annotated: 11.300s
> Clean: 9731252
> empty:  146006
> new:        42
> left:   176323
> right:  293926
> tip:     19330
> 
> real    0m23.723s
> 
> Resolve using heads(), the heads: line is entries that resolved to a
> single node
> after checking ancestry:
> time_in_reannotate_annotated: 18.780s
> Clean: 9673376
> empty:  133082
> new:       563
> left:   188215
> right:  294626
> head:    75830
> tip:      1187
> 
> real    0m32.211s
> 
> So it seems that left-wins and right-wins are approximately the same for
> performance. heads() adds about 50% overhead. This is using a HeadsCache
> and a
> DictParentsProvider because the annotate code was already building up
> the list
> of parents it needed to start annotating.
> 
> 
> 
> Is it worth the performance hit for correctness? I'm not sure, considering
> nobody seemed to be bothered by how we annotated with knits, and they
> used a
> "right-wins" policy.
> They are all better than "tip-wins", though some of that may be the test
> case.
> Though for repository.py the timing is:
> tip-wins:   3.5s
> right-wins: 3.2s
> heads:      4.4s
> 
> So I'm tempted to do "heads()" for correctness, but I'm not sure if we
> win much
> versus left-wins or right-wins. Isn't it more useful to see the log
> message of
> one of the nodes that created that line, rather than the node that
> merges the
> identical texts together?
> 
> I suppose we also need to consider the effects on annotate-merge. As
> right/left-wins might produce more conflicts than a heads() based solution.

We don't currently perform true annotate merges.  --knit just uses
newness information: "Was line x present in any ancestor, or was it
introduced by this revision?"

So if you want, I can think about it, but I don't think it's strictly
relevant.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHtKA+0F+nu1YWqI0RAnOJAJ0f0ITMKNOTe4ow9qhdWQKpx1i/AACfSvti
6fdqNWPzGVZwhRUry+9F2EI=
=veZz
-----END PGP SIGNATURE-----