history-db, caching dotted revnos, incremental merge_sort
John Arbash Meinel
john at arbash-meinel.com
Mon Apr 12 17:45:57 BST 2010
Ian Clatworthy wrote:
> On 10/04/10 08:02, John Arbash Meinel wrote:
>
>> Anyway, this is too long anyway, I hope people found some of the
>> insights as interesting as I did.
>
> Thanks for doing the detailed measurements, analysis and write-up.
>
>> I'm pretty sure it tells me that while we could change how revnos are
>> created, I don't think it can have a large impact on performance. And
>> while gdfo is a decent filtering tool, it still costs a lot.
>
> So the $64 million question remains: how do we get great performance on
> projects with deep, complex history without dropping dotted revision
> numbers altogether?
>
> My expectation is that different UIs will have very different demands on
> revno <-> revision-id lookup. For example, at the command line, commands
> like ...
>
> log -v -p -r x.y.z
>
> require fast conversion from x.y.z to a revision-id, something that my
> historycache plugin could provide thanks to its "development-line" cache of
>
> x.y => (length, last-revision-id).
>
If you go back to my original 'get_dotted_revno' numbers, mapping
revision_id => dotted_revno was taking:
  6ms  get_dotted_revno -r -110
  7ms  get_dotted_revno -r 2497.329.15  (in the area of -r -100)
 13ms  get_dotted_revno -r 1700.338.1   (in the range of -r -1000)
The design of the cache is such that "dotted_revno => revision_id"
should take a comparable time. I certainly can look at it closer.
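To make the "dotted_revno => revision_id" direction concrete, here is a
minimal sketch of a lookup against a development-line cache of the shape
Ian describes (x.y => (length, last-revision-id)). All class and method
names here are hypothetical, not historycache's actual API; it assumes a
side table of left-hand parents so we can walk back from the line's tip:

```python
class DevelopmentLineCache:
    """Hypothetical sketch of a 'development-line' revno cache."""

    def __init__(self):
        # 'x.y' => (length_of_line, tip_revision_id)
        self._lines = {}
        # revision_id => its left-hand (first) parent revision_id
        self._lefthand_parent = {}

    def add_line(self, prefix, length, tip_revision_id):
        self._lines[prefix] = (length, tip_revision_id)

    def add_parent(self, revision_id, lefthand_parent):
        self._lefthand_parent[revision_id] = lefthand_parent

    def resolve(self, dotted):
        """Map 'x.y.z' to a revision-id by walking back from the tip."""
        prefix, _, z = dotted.rpartition('.')
        length, rev_id = self._lines[prefix]
        # The tip is revno length on this line; step back (length - z)
        # left-hand parents to reach revno z.
        for _ in range(length - int(z)):
            rev_id = self._lefthand_parent[rev_id]
        return rev_id
```

The point is that both directions cost O(distance from the line's tip),
so the symmetry John expects (comparable lookup times both ways) falls
out naturally from this cache shape.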
Also, when I say 'slow', that means 10 minutes to cache all of the
mainlines of all branches in MySQL, which is 17k branches, or an average
of ~35ms each. I would like to get some numbers for the median and max
here, as I would guess they play a big part.
I had been hoping for it to be even faster than that, however adding
35ms to 'commit' (of a merge) is reasonable IMO, if it means that 'bzr
log' is always full-speed.
> OTOH, I suspect Loggerhead and qlog need fast conversion the other way,
> from revision-id to revno. IIUIC, your plugin will use a combination of
> smarter logic and more caching to speed up the *general* case.
Hopefully what I've done can be bidirectional *and* handle the
'merge_sort' case. Your version definitely helped the X.Y.Z => rev_id
case, but if the range you were logging had merges (and -n0) it still
fell back to loading the whole graph.
This should let us load only as much of the graph as -n0 will display.
>
> If performance is still a problem (and I suspect it will be), perhaps we
> ought to selectively constrain the problem further. For example, if the
> challenge was "how fast can you assign revnos to the next level in qlog"
> for a revision where the revno is already known, what numbering scheme
> and caching scheme would be best do you think?
>
> Would your answer change if the critical thing to make fast was to
> assign revnos to parent revisions?
>
> Ian C.
>
The main constraint that makes everything time-consuming is "include
this only if it has not been merged before".
Essentially you are doing a graph difference of the ancestry in this
revision with the ancestry in the previous revision.
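That graph difference can be sketched in a few lines (this is a rough
illustration of the set operation, not bzr's actual implementation;
parent_map is an assumed dict of revision_id => tuple of parent ids):

```python
def ancestry(parent_map, tip):
    """All revisions reachable from tip, inclusive."""
    seen = set()
    pending = [tip]
    while pending:
        rev = pending.pop()
        if rev in seen or rev not in parent_map:
            continue
        seen.add(rev)
        pending.extend(parent_map[rev])
    return seen

def newly_merged(parent_map, new_tip, old_tip):
    """Revisions introduced by new_tip that old_tip had not merged."""
    return ancestry(parent_map, new_tip) - ancestry(parent_map, old_tip)
```

The expense comes from the fact that, done naively, both ancestry walks
touch the whole history; all the caching work is about avoiding exactly
that full traversal.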
If we switched to a 'simple' topological sort, that only enforces
children after parents, you can speed a lot of this up. And here GDFO
could be even better.
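To show why GDFO helps in the relaxed case: since a child's GDFO
(greatest distance from origin) is always strictly greater than each of
its parents', simply sorting by GDFO descending already yields a valid
children-before-parents order, with none of the merge bookkeeping that
merge_sort needs. A sketch (the fixpoint loop is for clarity, not for
huge graphs):

```python
def compute_gdfo(parent_map):
    """GDFO: roots get 1, every child gets 1 + max(parent GDFOs)."""
    gdfo = {}
    changed = True
    while changed:  # iterate to fixpoint; simple, not optimized
        changed = False
        for rev, parents in parent_map.items():
            val = 1 + max((gdfo.get(p, 0) for p in parents), default=0)
            if gdfo.get(rev) != val:
                gdfo[rev] = val
                changed = True
    return gdfo

def topo_sorted(parent_map):
    """A 'simple' topological order: children strictly before parents."""
    gdfo = compute_gdfo(parent_map)
    return sorted(parent_map, key=lambda r: gdfo[r], reverse=True)
```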
If we further relaxed it to be 'chronological', then *boom* we would
remove all the bits that are currently making it slow. However, I think
our view of history is very powerful, and would be very hesitant to let
it go.
John
=:->