history-db, caching dotted revnos, incremental merge_sort

Mon Apr 12 09:26:36 BST 2010

On 10/04/10 08:02, John Arbash Meinel wrote:

> Anyway, this is too long anyway, I hope people found some of the
> insights as interesting as I did.

Thanks for doing the detailed measurements, analysis and write-up.

> I'm pretty sure it tells me that while we could change how revnos are
> created, I don't think it can have a large impact on performance. And
> while gdfo is a decent filtering tool, it still costs a lot.

So the $64 million question remains: how do we get great performance on 
projects with deep, complex history without dropping dotted revision 
numbers altogether?

My expectation is that different UIs will have very different demands on 
revno <-> revision-id lookup. For example, at the command line, commands 
like ...

   log -v -p -r x.y.z

require fast conversion from x.y.z to a revision-id, something that my 
historycache plugin could provide thanks to its "development-line" cache of

   x.y => (length, last-revision-id).
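
To make the lookup concrete, here is a minimal sketch of how such a cache
could resolve x.y.z, assuming the cached revision-id is the tip of the
development line (i.e. revno x.y.length) and that earlier revisions on the
line are reached by following left-hand parents. The function name, the
dict-based cache and the left_parent map are all hypothetical; none of them
are taken from bzrlib or from the historycache plugin.

```python
# Hypothetical sketch: names and data layout are illustrative assumptions,
# not the historycache plugin's actual API.

def revno_to_revision_id(revno, dev_line_cache, left_parent):
    """Resolve a three-part dotted revno 'x.y.z' to a revision-id.

    dev_line_cache maps (x, y) -> (length, last_revision_id).
    left_parent maps a revision-id to its left-hand parent's revision-id.
    """
    x, y, z = (int(part) for part in revno.split("."))
    length, revision_id = dev_line_cache[(x, y)]
    if not 1 <= z <= length:
        raise KeyError(revno)
    # Assumption: the cached tip is x.y.length, so x.y.z is
    # (length - z) left-hand-parent steps behind it.
    for _ in range(length - z):
        revision_id = left_parent[revision_id]
    return revision_id


# Toy data: development line 1.2 has three revisions, tip "rev-c".
dev_line_cache = {(1, 2): (3, "rev-c")}
left_parent = {"rev-c": "rev-b", "rev-b": "rev-a", "rev-a": "rev-main-1"}
```

Under these assumptions the cost of a lookup is one dictionary access plus
at most (length - 1) parent steps, independent of the size of the rest of
the history.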

OTOH, I suspect Loggerhead and qlog need fast conversion the other way, 
from revision-id to revno. IIUIC, your plugin will use a combination of 
smarter logic and more caching to speed up the *general* case.

If performance is still a problem (and I suspect it will be), perhaps we 
ought to selectively constrain the problem further. For example, if the 
challenge were "how fast can you assign revnos to the next level in qlog" 
for a revision whose revno is already known, which numbering scheme and 
caching scheme do you think would be best?

Would your answer change if the critical thing to make fast was to 
assign revnos to parent revisions?

Ian C.