Handling repeated text annotations
Aaron Bentley
aaron at aaronbentley.com
Thu Feb 14 20:10:38 GMT 2008
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
John Arbash Meinel wrote:
> John Arbash Meinel wrote:
> | Robert Collins wrote:
> | | On Wed, 2008-02-13 at 17:07 -0600, John Arbash Meinel wrote:
> | |
> | |> I'll try using heads() though I'm not sure it is exposed easily down
> | |> at this
> | |> level :(. I'm a little concerned that all the heads calls are going to
> | |> kill
> | |> performance. But I guess we only have to do it for lines that seem to
> | |> claim 2
> | |> parents.
> | |
> | | When you merge a line from another text, it should not get a new
> | | version, so when you have two branches join, no changes on either side
> | | will not be a merge.
> | |
> | | BUT - if one side changes, then you have two versions - the unchanged
> | | and the changed.
> | |
> | | So yes, heads() is indeed the key to not showing 'merges' where its
> | | really just one side doing edits.
> | |
> | | -Rob
> |
> | I also want to check the old knit logic. Because we might have used
> | "match all
> | lines to left, for things that don't match, check right". Which would
> | treat this
> | case as "left always wins" which is certainly easier to implement.
> |
> | John
> | =:->
> |
>
> So the knit code is actually "right-most always wins". Specifically it has:
>
> ~ for parent_id in parents:
> ...
> ~ for i, j, n in seq.get_matching_blocks():
> ~ if n == 0:
> ~ continue
> ~ # this appears to copy (origin, text) pairs across
> to the
> ~ # new content for any line that matches the
> last-checked
> ~ # parent.
> ~ content._lines[j:j+n] = merge_content._lines[i:i+n]
>
> So it does a diff against all parents, and then it replaces the lines in
> the
> output with any matches from that parent.
>
> Now, there are some pretty strong performance characteristics for the
> various
> possibilities.
>
> Current "always create a new node", aka tip-wins gives:
> $ time ~/dev/bzr/annotate_stuff/bzr annotate --show-ids sql/mysqld.cc
>>/dev/null
> time_in_reannotate_annotated: 51.760s
> Clean: 8363598
> empty: 77801
> new: 2667
> left: 243009
> right: 294698
> tip: 1385106
>
> real 1m4.447s
>
>
> Left wins:
> time_in_reannotate_annotated: 11.070s
> Clean: 9727552
> empty: 144766
> new: 28
> left: 176544
> right: 294488
> tip: 23501
>
> real 0m23.915s
>
> Right wins:
> time_in_reannotate_annotated: 11.300s
> Clean: 9731252
> empty: 146006
> new: 42
> left: 176323
> right: 293926
> tip: 19330
>
> real 0m23.723s
>
> Resolve using heads(), the heads: line is entries that resolved to a
> single node
> after checking ancestry:
> time_in_reannotate_annotated: 18.780s
> Clean: 9673376
> empty: 133082
> new: 563
> left: 188215
> right: 294626
> head: 75830
> tip: 1187
>
> real 0m32.211s
>
> So it seems that left-wins and right-wins are approximately the same for
> performance. heads() adds about 50% overhead. This is using a HeadsCache
> and a
> DictParentsProvider because the annotate code was already building up
> the list
> of parents it needed to start annotating.
>
>
>
> Is it worth the performance hit for correctness? I'm not sure, considering
> nobody seemed to be bothered by how we annotated with knits, and they
> used a
> "right-wins" policy.
> They are all better than "tip-wins", though some of that may be the test
> case.
> Though for repository.py the timing is:
> tip-wins: 3.5s
> right-wins: 3.2s
> heads: 4.4s
>
> So I'm tempted to do "heads()" for correctness, but I'm not sure if we
> win much
> versus left-wins or right-wins. Isn't it more useful to see the log
> message of
> one of the nodes that created that line, rather than the node that
> merges the
> identical texts together?
>
> I suppose we also need to consider the effects on annotate-merge. As
> right/left-wins might produce more conflicts than a heads() based solution.
We don't currently perform true annotate merges. --knit just uses
newness information: "Was line x present in any ancestor, or was it
introduced by this revision?"
So if you want, I can think about it, but I don't think it's strictly
relevant.
Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFHtKA+0F+nu1YWqI0RAnOJAJ0f0ITMKNOTe4ow9qhdWQKpx1i/AACfSvti
6fdqNWPzGVZwhRUry+9F2EI=
=veZz
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list