history-db, caching dotted revnos, incremental merge_sort

John Arbash Meinel john at arbash-meinel.com
Mon Apr 12 17:45:57 BST 2010



Ian Clatworthy wrote:
> On 10/04/10 08:02, John Arbash Meinel wrote:
> 
>> Anyway, this has gone on too long; I hope people found some of the
>> insights as interesting as I did.
> 
> Thanks for doing the detailed measurements, analysis and write-up.
> 
>> I'm pretty sure it tells me that while we could change how revnos are
>> created, I don't think it can have a large impact on performance. And
>> while gdfo is a decent filtering tool, it still costs a lot.
> 
> So the $64 million question remains: how do we get great performance on
> projects with deep, complex history without dropping dotted revision
> numbers altogether?
> 
> My expectation is that different UIs will have very different demands on
> revno <-> revision-id lookup. For example, at the command line, commands
> like ...
> 
>   log -v -p -r x.y.z
> 
> require fast conversion from x.y.z to a revision-id, something that my
> historycache plugin could provide thanks to its "development-line" cache of
> 
>   x.y => (length, last-revision-id).
> 

If you go back to my original 'get_dotted_revno' numbers, mapping
revision_id => dotted_revno was taking:
 6ms	get_dotted_revno -r -110
 7ms	get_dotted_revno -r 2497.329.15 (in the area of -r -100)
13ms	get_dotted_revno -r 1700.338.1 (in the range of -r -1000)

The design of the cache is such that "dotted_revno => revision_id"
should take a comparable time. I can certainly look at it more closely.
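To make that concrete, here is a rough sketch of how a development-line
cache entry like Ian's x.y => (length, last-revision-id) could answer a
dotted-revno lookup by walking left-hand parents back from the line's
tip. The revision ids, parent_map contents, and helper name are all
invented for illustration, not the plugin's actual API:

```python
# Left-hand (first) parent of each revision -- invented example data.
parent_map = {
    "rev-e": "rev-d",
    "rev-d": "rev-c",
    "rev-c": "rev-b",
    "rev-b": "rev-a",
}

# Development-line cache: (x, y) => (length of line, tip revision-id).
devline_cache = {
    (2497, 329): (5, "rev-e"),
}

def dotted_to_revision_id(x, y, z):
    """Resolve x.y.z by stepping (length - z) left-hand parents from the tip."""
    length, rev_id = devline_cache[(x, y)]
    for _ in range(length - z):
        rev_id = parent_map[rev_id]
    return rev_id
```

With this shape the lookup cost is proportional to how far x.y.z sits
from the tip of its line, which matches the timings above getting worse
for older revisions.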

Also, when I say 'slow', that is 10 minutes to cache all of the
mainlines of all branches in mysql, which is 17k branches, or an average
of ~35ms each. I would like to get some numbers on the median and max
here, as I would guess they play a big part.

I had been hoping for it to be even faster than that; however, adding
35ms to 'commit' (of a merge) is reasonable IMO, if it means that 'bzr
log' is always full-speed.
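(For the record, the ~35ms figure is just the straight average of the
two numbers quoted above:)

```python
# 10 minutes to cache the mainlines of 17k branches.
total_seconds = 10 * 60
branches = 17_000
avg_ms = total_seconds / branches * 1000  # ~35.3 ms per branch
```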


> OTOH, I suspect Loggerhead and qlog need fast conversion the other way,
> from revision-id to revno. IIUIC, your plugin will use a combination of
> smarter logic and more caching to speed up the *general* case.

Hopefully what I've done can be bidirectional *and* handle the
'merge_sort' case. Your version definitely helped the X.Y.Z => rev_id
case. But if the range you were logging had merges (and -n0), it still
fell back to loading the whole graph.

This should let us load only as much of the graph as -n0 will display.

> 
> If performance is still a problem (and I suspect it will be), perhaps we
> ought to selectively constrain the problem further. For example, if the
> challenge was "how fast can you assign revnos to the next level in qlog"
> for a revision where the revno is already known, what numbering scheme
> and caching scheme would be best do you think?
> 
> Would your answer change if the critical thing to make fast was to
> assign revnos to parent revisions?
> 
> Ian C.
> 

The main constraint that makes everything time consuming is "include
this only if it has not been merged before".

Essentially you are doing a graph difference of the ancestry in this
revision with the ancestry in the previous revision.
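As a sketch, that difference is two ancestry walks and a set
subtraction. The graph below is invented example data (D merges a side
line M1..M2 into mainline C), not real mysql history:

```python
# Invented revision graph: each revision maps to its list of parents.
parent_map = {
    "D": ["C", "M2"],   # D is the merge commit
    "C": ["B"],
    "M2": ["M1"],
    "M1": ["B"],
    "B": ["A"],
    "A": [],
}

def ancestry(rev_id):
    """All revisions reachable from rev_id, including itself."""
    seen = set()
    pending = [rev_id]
    while pending:
        rev = pending.pop()
        if rev not in seen:
            seen.add(rev)
            pending.extend(parent_map[rev])
    return seen

# Revisions newly merged by D, relative to its mainline parent C:
new_in_d = ancestry("D") - ancestry("C")
```

The expensive part is that, done naively, both walks touch the whole
ancestry even though the answer is usually a small set near the tip.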

If we switched to a 'simple' topological sort that only enforces
children after parents, a lot of this could be sped up. And here GDFO
could be even better.
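A rough sketch of why GDFO fits this case (GDFO = greatest distance from
origin): a child's GDFO is always strictly greater than any parent's, so
sorting by GDFO directly yields a parents-before-children order with no
extra graph bookkeeping. The graph and helper below are invented for
illustration:

```python
# Invented revision graph: revision => list of parents.
parent_map = {
    "A": [],
    "B": ["A"],
    "M1": ["A"],
    "C": ["B", "M1"],
}

def compute_gdfo(parent_map):
    """GDFO: 1 for an origin, else 1 + max(parent GDFOs).

    Repeated passes until everything resolves; fine for a small example,
    a real implementation would walk in a single topological pass.
    """
    gdfo = {}
    pending = set(parent_map)
    while pending:
        for rev in sorted(pending):
            parents = parent_map[rev]
            if all(p in gdfo for p in parents):
                gdfo[rev] = 1 + max((gdfo[p] for p in parents), default=0)
                pending.discard(rev)
                break
    return gdfo

gdfo = compute_gdfo(parent_map)
# Sorting by GDFO gives a valid topological order (ties are arbitrary).
topo = sorted(parent_map, key=lambda r: gdfo[r])
```

The win over full merge_sort is that nothing here needs the "has this
been merged before" answer, so no earlier history has to be loaded.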

If we further relaxed it to be 'chronological', then *boom*, we would
remove all the bits that are currently making it slow. However, I think
our view of history is very powerful, and I would be very hesitant to
let it go.

John
=:->