bzr-history-db summary

Thu Apr 15 08:24:31 BST 2010

On 15/04/2010 01:26, John Arbash Meinel wrote:
> I'll try to summarize why I think it is useful or us to have something
> like bzr-history-db available.
> 
> The main downside is that we would be adding a layer of complexity, of
> caching, of code that can go wrong, etc. So what are we getting out of it.
> 
> 1) I don't believe that we can compute merge_sort or dotted revnos much
>    faster than we do today.
> 
>    a) If we added gdfo to our disk structures, we could do a little bit
>       better. Especially in the average case.
>    b) But the best I can find for merge_sort is that it scales with the
>       amount of changes introduced since a branch was started. (eg. you
>       start a branch "foo" w/ 2 commits at trunk at 100, and then 50
>       changes land including 10 revisions each, we will have to consider
>       50*10=500 revisions when you have landed foo into trunk at 151.)
>    c) it might be possible to tweak merge_sort more for 'trivial'
>       branches as long as we have known-children. (You walk back the
>       mainline, and see if there is nothing interesting other than that
>       branch). However, I don't think that scales to the general case.
> 
> 2) merge_sort is a useful way to display ancestry
> 
>    a) Really powerful when combined with landing features onto a trunk
>       branch
> 
>    b) qlog is very nice, being able to drill-down as needed.
> 
>    c) Alternatives are generally some form of flat display.
>       Topological/chronological. You could probably do something
>       less-flat with topological, and we'd also probably want to find a
>       way to make it 'stable'.
> 
> 3) dotted_revnos
> 
>    a) Approx 8% overhead over merge_sort using the current algo. I do
>       think a different algo could be cheaper, but it isn't a major
>       benefit.
>    b) give a short, easy to remember and type handle to non-mainline
>    c) if we wanted a different handle, we would still need to index it
>       for it to be fast to look up. (john--abcdef [tip and tail of
>       revid], first 5 bytes of hash(revid)).
>       The index would have different characteristics, and probably less
>       complexity than bzr-history-db.
>    d) 2010.1.2 obviously comes after 2010.1.1, john--abcd and john--z5es
>       don't share any such relation. And neither does a314u fefef2
>    e) Allows 'nice' urls for loggerhead
>    f) Can be accurately reproduced without copy & paste, and without
>       reading the value char-by-char.
> 
> 4) Other downsides
> 
>    a) if the cache isn't populated at all, it is slower to generate the
>       history graph, and store everything, then to just generate the
>       history and use it.
>       Example:
> 	it takes 4s to run 'bzr log -n0 -r 99789 emacs/trunk' w/o cache
>         it takes 30s to import emacs/trunk into bzr-history-db
>         it takes 0.9s to run 'bzr log' w/ cache
> 
>    b) However, if the cache is populated with related-but-different
>       data, it is usually faster to fill in the remainder and use it,
>       then it is to start from scratch.
>       Example:
>         it takes 188ms to import emacs at 99719.1.30
>         .188s + 0.9s << 4s
> 
>    c) The current cache code isn't very graceful about doing (b) yet. If
>       you trigger bzr log and it can't it just falls back.
> 
>    d) File-view. If you want to map the last-modified value of a
>       file/dir back to its dotted revno, they are often quite old. (In
>       emacs case, the top-level directories etc/, lib-src, lisp, etc all
>       have revnos <10). This is further exacerbated by asking for them
>       one-at-a-time, so we end up spending 400ms per revno for very old
>       ones, when we could spend 400ms to get everything...
> 
> I think this is a reasonable summary. If I think of anything more
> tomorrow, I'll try to mention it.
> 
> John
> =:->

Thanks for this.

As soon as I have some time, I'm going to look at integrating either
this, or igc's history-cache with qlog. I think that igc's history cache
will be better because it can provide qlog with merge line, but I need
to  look at both in detail.

Gary