brisbane: initial cut at a mergeline cache

Thu Apr 2 13:42:24 BST 2009

Alexander Belchenko wrote:
> Andrew Bennetts wrote:
>> Alexander Belchenko wrote:
>> [...]
>>> There are many cases where caches may help: log, annotations.
>>> I just don't understand why core devs don't want to implement caches.
>>> Is it too hard?
>>
>> The problem with caches is they aren't necessarily the best answer.
>
> What is the best answer then? A format zoo, maybe?

Well, I don't know what the answer is in this case.  (Although I suspect
that rethinking our dotted revno scheme may be part of the answer.)  I doubt
having a format zoo is, and I don't think anyone has suggested it would be.

I'm not unconditionally against adding caches, nor is any other core dev
AFAIK.  But I think we shouldn't just think “oh that's slow, let's just add
a cache” without first considering if that's really the best answer.  I'm
fairly sure that's all Robert was saying too.  If we can't think of a better
answer then of course we should do it.  But we shouldn't just assume that
caches are better than re-examining parts of our design.

[...]
> I'm talking about additional optional data that helps to improve known
> performance problems. Because such caches are optional, they can be used
> only for local operations.

In fact Robert's bzr-search plugin is an optional cache of the kind you're
in favour of, so I don't think there's really all that much disagreement
here!

>> So we're not against caches.  But they are just one possible solution,
>> and there may be others with a more desirable set of tradeoffs.  That's
>> all.
>
> Reading Robert's mails I have the strong impression that core devs are
> against caches as optional data.

As I say, Robert's own bzr-search implements an optional cache.  So he's
clearly not against it at all.  And I hope I've made it clear that I'm not
against them either.

> The problem with annotations after the knit->pack watershed has been
> known for 1.5 years.
> And so?

Annotation is a bit slower, but still good enough for my day-to-day use so I
haven't minded too much.  But if someone proposed a patch/design to add an
annotation cache *I'd* certainly be interested.

I think we're actually in agreement on this point, in fact: IIRC with the
knit format, cached annotations were mandatory, even if you didn't use them.
So there was a significant performance hit from updating annotations on every
commit when most of the time that data wasn't going to be used.

I'm pretty sure the intent was that we wanted to allow annotations to be
optionally cached as they are needed.  Although, no-one seems to have found
it slow enough to go to the effort to do so.  I certainly haven't seen any
patches or discussion about that as advanced as e.g. this mergeline cache
from Ian.

> Sorry, but I don't believe that optional data caches won't help for
> *local* operations.

Well, first I'll point to the annotation cache in the knit format as a
reminder that caches *can* hurt local performance.  Perhaps the key issue
there isn't mandatory vs. optional, but mandatory-in-advance vs.
just-in-time.  An optional annotation cache that, when enabled, slows down
every commit would be a hindrance to local performance, for example.
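
To make the mandatory-in-advance vs. just-in-time distinction concrete,
here is a minimal Python sketch.  It is a toy model and not bzrlib code:
the class names, the commit()/annotate() methods and the annotate_work
counter are all hypothetical, and "annotation" is faked by attributing
each line to the first revision in which it appeared.

```python
class _ToyStore:
    """Toy history: one list of lines per revision (hypothetical model)."""

    def __init__(self):
        self.texts = []
        self.cached = None
        self.annotate_work = 0  # how many times the expensive step ran

    def _compute_annotations(self):
        self.annotate_work += 1
        if not self.texts:
            return []
        # Attribute each line to the first revision that contained it.
        origin = {}
        for revno, lines in enumerate(self.texts, 1):
            for line in lines:
                origin.setdefault(line, revno)
        return [(origin[line], line) for line in self.texts[-1]]


class EagerAnnotationStore(_ToyStore):
    """In-advance: every commit pays for the annotation update."""

    def commit(self, lines):
        self.texts.append(list(lines))
        self.cached = self._compute_annotations()

    def annotate(self):
        return self.cached if self.cached is not None else []


class LazyAnnotationStore(_ToyStore):
    """Just-in-time: commit only invalidates; annotate fills the cache."""

    def commit(self, lines):
        self.texts.append(list(lines))
        self.cached = None

    def annotate(self):
        if self.cached is None:
            self.cached = self._compute_annotations()
        return self.cached


eager, lazy = EagerAnnotationStore(), LazyAnnotationStore()
for text in (["a"], ["a", "b"], ["a", "b", "c"]):
    eager.commit(text)
    lazy.commit(text)
assert eager.annotate() == lazy.annotate()
print(eager.annotate_work, lazy.annotate_work)  # prints: 3 1
```

Both stores give the same answer, but in this toy the eager one did the
expensive step on each of the three commits while the lazy one did it
once, when annotate() was actually called.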

(Sorry for talking about the annotation cache rather than Ian's proposal,
but I haven't actually had the time to examine Ian's proposal in enough
detail to be able to talk about it usefully.)

Another example where local caching might be suboptimal is if every branch
must regenerate its own cache, even though most related branches will have
largely similar contents.  So to solve that you might try to put the cache
in the (shared) repo instead, but then you introduce potential problems with
lock contention on updating the cache just-in-time when doing read
operations on two branches concurrently... thus harming performance.
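
The per-branch vs. shared-repo tradeoff can be sketched the same way.
Again this is only an illustration under stated assumptions:
expensive_lookup(), PerBranchCache and SharedRepoCache are hypothetical
names, and an in-process threading.Lock stands in for whatever lock a
real shared repository would need on disk.

```python
import threading


def expensive_lookup(revision_id):
    """Stand-in for the cost of computing one cache entry (hypothetical)."""
    return "data-for-" + revision_id


class PerBranchCache:
    """Each branch keeps its own cache: no locking, but duplicated work."""

    def __init__(self):
        self.entries = {}
        self.misses = 0

    def get(self, revision_id):
        if revision_id not in self.entries:
            self.misses += 1
            self.entries[revision_id] = expensive_lookup(revision_id)
        return self.entries[revision_id]


class SharedRepoCache:
    """One cache for all branches: less work, but readers take a lock."""

    def __init__(self):
        self.entries = {}
        self.misses = 0
        self.lock = threading.Lock()

    def get(self, revision_id):
        # Even a pure read operation must wait here if another reader
        # is busy filling the cache just-in-time.
        with self.lock:
            if revision_id not in self.entries:
                self.misses += 1
                self.entries[revision_id] = expensive_lookup(revision_id)
            return self.entries[revision_id]


trunk = ["r1", "r2", "r3"]
feature = ["r1", "r2", "r3", "r4"]  # shares most history with trunk

trunk_cache, feature_cache = PerBranchCache(), PerBranchCache()
for rev in trunk:
    trunk_cache.get(rev)
for rev in feature:
    feature_cache.get(rev)

shared = SharedRepoCache()
readers = [
    threading.Thread(target=lambda revs=revs: [shared.get(r) for r in revs])
    for revs in (trunk, feature)
]
for t in readers:
    t.start()
for t in readers:
    t.join()

print(trunk_cache.misses + feature_cache.misses, shared.misses)  # prints: 7 4
```

The per-branch caches compute the three shared revisions twice; the
shared cache computes each revision once, at the price of serialising
the two readers on the lock.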

Also, many operations aren't local, and I believe Ian's original patch was
proposing to put caches on remote branches, where the cost-benefit analysis
is more complex.

So I think it is reasonable to question the assumption that just because it
is a cache it must help local performance, because that isn't always true.

None of this is meant as a criticism of Ian's patch, which I haven't yet
read in any detail.  (I suspect it's not going to get much time from him or
other core devs until after 1.14rc1.)  I'm just trying to explain why it's
reasonable to have the same healthy scepticism of
optimisation-by-adding-a-cache as for any other class of optimisation :)

-Andrew.