brisbane: initial cut at a mergeline cache

Wed Apr 1 23:40:11 BST 2009

Alexander Belchenko wrote:
[...]
>
> There are many cases where caches may help: log, annotations.
> I just don't understand why core devs don't want to implement caches.
> It's too hard?

The problem with caches is they aren't necessarily the best answer.

They are explicitly a performance tradeoff.  At some point you have to
spend time and space (and quite likely bandwidth) to build and maintain
the cache.  In exchange for that you get improved performance elsewhere.
But whether the tradeoff is worthwhile depends on how large the
performance improvement is vs. how large the penalty is, and how much
you care about the improved operations vs. the penalised operations.
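
To make that arithmetic concrete, here is a rough sketch in Python.  The
function name, its parameters and all of the numbers are invented for
illustration; they are not measurements from bzr.

```python
def net_saving(build_cost, maintain_cost_per_write, writes,
               saving_per_read, reads):
    """Return time saved by a cache minus time spent on it.

    A negative result means the cache costs more than it gives back.
    All arguments are hypothetical costs in the same time unit.
    """
    cost = build_cost + maintain_cost_per_write * writes
    benefit = saving_per_read * reads
    return benefit - cost


# Read-heavy workload: the cache pays for itself.
read_heavy = net_saving(build_cost=10.0, maintain_cost_per_write=0.5,
                        writes=20, saving_per_read=2.0, reads=100)

# Write-heavy workload: same cache, but maintenance dominates.
write_heavy = net_saving(build_cost=10.0, maintain_cost_per_write=0.5,
                         writes=100, saving_per_read=2.0, reads=5)

print(read_heavy, write_heavy)
```

The same cache comes out ahead in the first workload and behind in the
second, which is why the answer depends on which operations you care
about.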

A fairly well-known example of caches not being an automatic improvement
is that in relational databases adding more indexes on a table can
sometimes *reduce* performance.  In bzr we've already found that
replacing the revision-history file in branch format 5 with the
last-revision file in branch format 6 (effectively removing a revno
cache) was a dramatic improvement in almost all cases.
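
As a toy model of why dropping that cache can help (this is not bzrlib
code; the class names are made up, and it assumes the full-history file
is rewritten in whole on every commit):

```python
class FullHistoryBranch:
    """Toy branch that stores every mainline revision id (hypothetical)."""

    def __init__(self):
        self.history = []
        self.ids_written = 0  # total revision ids written to "disk"

    def commit(self, rev_id):
        self.history.append(rev_id)
        # Assumption: the whole list is rewritten, so the cost grows
        # with the length of the history.
        self.ids_written += len(self.history)

    def revno(self):
        return len(self.history)


class LastRevisionBranch:
    """Toy branch that stores only (revno, last revision id)."""

    def __init__(self):
        self.last = (0, None)
        self.ids_written = 0

    def commit(self, rev_id):
        self.last = (self.last[0] + 1, rev_id)
        self.ids_written += 1  # constant cost per commit

    def revno(self):
        return self.last[0]


def simulate(branch, n):
    """Commit n revisions and return the total ids written."""
    for i in range(n):
        branch.commit("rev-%d" % i)
    return branch.ids_written


full = simulate(FullHistoryBranch(), 1000)
last = simulate(LastRevisionBranch(), 1000)
print(full, last)
```

Both toy branches can still answer revno() cheaply, but under that
assumption the full-history one does quadratic work over its lifetime
to keep the list up to date.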

In the case of caching stat info in the dirstate, we clearly found that
to be a massive win, so we certainly aren't against implementing caches!
It's interesting to note though that in the case of dirstate we needed
to maintain some state about the working tree anyway, so there was
already a fairly cheap point in the design to include the stat cache.
It's also interesting to note that dirstate isn't 100% great; it's
caused some headaches for Windows users.

So we're not against caches.  But they are just one possible solution,
and there may be others with a more desirable set of tradeoffs.  That's
all.

-Andrew.