bzr-history-db summary
Gary van der Merwe
garyvdm at gmail.com
Thu Apr 15 08:24:31 BST 2010
On 15/04/2010 01:26, John Arbash Meinel wrote:
> I'll try to summarize why I think it is useful or us to have something
> like bzr-history-db available.
>
> The main downside is that we would be adding a layer of complexity, of
> caching, of code that can go wrong, etc. So what are we getting out of it.
>
> 1) I don't believe that we can compute merge_sort or dotted revnos much
> faster than we do today.
>
> a) If we added gdfo to our disk structures, we could do a little bit
> better. Especially in the average case.
> b) But the best I can find for merge_sort is that it scales with the
> amount of changes introduced since a branch was started. (eg. you
> start a branch "foo" w/ 2 commits at trunk at 100, and then 50
> changes land including 10 revisions each, we will have to consider
> 50*10=500 revisions when you have landed foo into trunk at 151.)
> c) it might be possible to tweak merge_sort more for 'trivial'
> branches as long as we have known-children. (You walk back the
> mainline, and see if there is nothing interesting other than that
> branch). However, I don't think that scales to the general case.
>
> 2) merge_sort is a useful way to display ancestry
>
> a) Really powerful when combined with landing features onto a trunk
> branch
>
> b) qlog is very nice, being able to drill-down as needed.
>
> c) Alternatives are generally some form of flat display.
> Topological/chronological. You could probably do something
> less-flat with topological, and we'd also probably want to find a
> way to make it 'stable'.
>
> 3) dotted_revnos
>
> a) Approx 8% overhead over merge_sort using the current algo. I do
> think a different algo could be cheaper, but it isn't a major
> benefit.
> b) give a short, easy to remember and type handle to non-mainline
> c) if we wanted a different handle, we would still need to index it
> for it to be fast to look up. (john--abcdef [tip and tail of
> revid], first 5 bytes of hash(revid)).
> The index would have different characteristics, and probably less
> complexity than bzr-history-db.
> d) 2010.1.2 obviously comes after 2010.1.1, john--abcd and john--z5es
> don't share any such relation. And neither does a314u fefef2
> e) Allows 'nice' urls for loggerhead
> f) Can be accurately reproduced without copy & paste, and without
> reading the value char-by-char.
>
> 4) Other downsides
>
> a) if the cache isn't populated at all, it is slower to generate the
> history graph, and store everything, then to just generate the
> history and use it.
> Example:
> it takes 4s to run 'bzr log -n0 -r 99789 emacs/trunk' w/o cache
> it takes 30s to import emacs/trunk into bzr-history-db
> it takes 0.9s to run 'bzr log' w/ cache
>
> b) However, if the cache is populated with related-but-different
> data, it is usually faster to fill in the remainder and use it,
> then it is to start from scratch.
> Example:
> it takes 188ms to import emacs at 99719.1.30
> .188s + 0.9s << 4s
>
> c) The current cache code isn't very graceful about doing (b) yet. If
> you trigger bzr log and it can't it just falls back.
>
> d) File-view. If you want to map the last-modified value of a
> file/dir back to its dotted revno, they are often quite old. (In
> emacs case, the top-level directories etc/, lib-src, lisp, etc all
> have revnos <10). This is further exacerbated by asking for them
> one-at-a-time, so we end up spending 400ms per revno for very old
> ones, when we could spend 400ms to get everything...
>
> I think this is a reasonable summary. If I think of anything more
> tomorrow, I'll try to mention it.
>
> John
> =:->
Thanks for this.
As soon as I have some time, I'm going to look at integrating either
this, or igc's history-cache with qlog. I think that igc's history cache
will be better because it can provide qlog with merge line, but I need
to look at both in detail.
Gary
More information about the bazaar
mailing list