[MERGE] annotation policies

Wed Jun 18 09:22:25 BST 2008

John Arbash Meinel wrote:
> Ian Clatworthy wrote:

> | Hmm. Is this docstring comment re None still correct now?
> 
> It now depends on the policy. Ultimately, I think I want to deprecate this
> function, as the callers should be using a policy.

We can but I wouldn't hurry. Having a high level wrapper like this
can be useful for developers just getting into the API.

> Further, I've considered having a flag to change the line-matching
> algorithm.
> Specifically, you could do stuff where you ignore whitespace changes, or
> you
> could use a (-whitespace, patiencediff, +whitespace), which does
> interesting
> things when you indent code.

Interesting.

> | I wonder if it's worth be more explicit about what a value of None means?
> |
> 
> I'm tempted to remove the "can be None". The idea was that not all policies
> provide use a heads() so why should callers be required to pass one. And
> some
> locations may not have it easily available.

I think your current approach is sound. I agree that needing to supply a
heads_provider when using a policy that doesn't need it is wasteful (and
annoying for API users).

> Policy dominates in the long-term. The more times you change a lines
> annotation,
> the more times you have to resolve that (with a heads check, etc.) I've
> tried
> hard to structure it so that *unchanged* lines get processed quickly, as
> that is
> always the place that will be the bottleneck. (In a 1000 line file,
> maybe you
> change 10 lines each time, or 1% of the lines are interesting.) The
> original
> code used to do a flat iteration over all lines, which seems fast
> because it can
> go linearly. Except it has to process the 99% of lines that didn't change.
> 
> I'll go ahead with the clearer api which just passes revision ids.

Cool.

> | I still need to review test_annotation_policy.py and read through your
> | emails on the trade-offs of the various algorithms before commenting on
> | my preferred default. I'm hoping to do that tomorrow.
> |
> | Ian C.
> |
> 
> The problem I haven't really worked out is that of the permutations.
> Specifically, you have an algorithm for resolving ambiguous lines, you
> have an
> algorithm for resolving the bulk of the lines. You have different storage
> architectures that have different abilities. Some may prefer extracting
> texts in
> a given order, others already have some matching_blocks that they can
> give you,
> etc. You have the possibility of different line-matching algorithms,
> which may
> or may not match what he store can give you, etc.
> At some point, I think we also would like to add caching if it is
> possible to do
> in a reasonable way. How does caching work with different parameters you
> can use
> to generate the values?

I think what you've done is a big step in the right direction. I agree that
"solving" annotate elegantly has its challenges both at the UI level and
at the implementation level. As Stefan and others are pointing out though,
the value of the annotate *command* is pretty limited at the end of the day
regardless, so there's not much point over-engineering this. I don't think
you're doing that BTW and cleaning exposing the functionality possible at
the API layer does have value well beyond the annotate command.

> If we have a cache, would we get rid of policies, and just try to find the
> "best" algorithm? (In my mind, a special whitespace patience diff, which
> can
> mark lines with multiple possible sources.)

Excellent question. I *like* the policies, particularly the one asked for
at PyCon. I think we ought to expose them. Looking at the figures below,
my first comment on caching is that caching the revision-id-to-revno map
is an order of magnitude more important that worrying about a policy-aware
annotation cache! I'm thinking OOo with 500K revisions and I'm not looking
forward to benchmarking *any* policy right now. :-(

> Some more interesting data points...
> 
> policy        annotate    annotate --show-ids
> merge-node    10.278s        6.797s
> right-head    10.131s        5.972s
> simple-right    10.120s        3.966s
> no-merge    10.016s        2.570s
> no-merge2     9.941s        0.974s

Wow.

I'll send my review of test_annotation_policy in a follow-up email soon.

Ian C.