bzr-history-db summary

John Arbash Meinel john at arbash-meinel.com
Thu Apr 15 00:26:02 BST 2010


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'll try to summarize why I think it is useful or us to have something
like bzr-history-db available.

The main downside is that we would be adding a layer of complexity, of
caching, of code that can go wrong, etc. So what are we getting out of it.

1) I don't believe that we can compute merge_sort or dotted revnos much
   faster than we do today.

   a) If we added gdfo to our disk structures, we could do a little bit
      better. Especially in the average case.
   b) But the best I can find for merge_sort is that it scales with the
      amount of changes introduced since a branch was started. (eg. you
      start a branch "foo" w/ 2 commits at trunk at 100, and then 50
      changes land including 10 revisions each, we will have to consider
      50*10=500 revisions when you have landed foo into trunk at 151.)
   c) it might be possible to tweak merge_sort more for 'trivial'
      branches as long as we have known-children. (You walk back the
      mainline, and see if there is nothing interesting other than that
      branch). However, I don't think that scales to the general case.

2) merge_sort is a useful way to display ancestry

   a) Really powerful when combined with landing features onto a trunk
      branch

   b) qlog is very nice, being able to drill-down as needed.

   c) Alternatives are generally some form of flat display.
      Topological/chronological. You could probably do something
      less-flat with topological, and we'd also probably want to find a
      way to make it 'stable'.

3) dotted_revnos

   a) Approx 8% overhead over merge_sort using the current algo. I do
      think a different algo could be cheaper, but it isn't a major
      benefit.
   b) give a short, easy to remember and type handle to non-mainline
   c) if we wanted a different handle, we would still need to index it
      for it to be fast to look up. (john--abcdef [tip and tail of
      revid], first 5 bytes of hash(revid)).
      The index would have different characteristics, and probably less
      complexity than bzr-history-db.
   d) 2010.1.2 obviously comes after 2010.1.1, john--abcd and john--z5es
      don't share any such relation. And neither does a314u fefef2
   e) Allows 'nice' urls for loggerhead
   f) Can be accurately reproduced without copy & paste, and without
      reading the value char-by-char.

4) Other downsides

   a) if the cache isn't populated at all, it is slower to generate the
      history graph, and store everything, then to just generate the
      history and use it.
      Example:
	it takes 4s to run 'bzr log -n0 -r 99789 emacs/trunk' w/o cache
        it takes 30s to import emacs/trunk into bzr-history-db
        it takes 0.9s to run 'bzr log' w/ cache

   b) However, if the cache is populated with related-but-different
      data, it is usually faster to fill in the remainder and use it,
      then it is to start from scratch.
      Example:
        it takes 188ms to import emacs at 99719.1.30
        .188s + 0.9s << 4s

   c) The current cache code isn't very graceful about doing (b) yet. If
      you trigger bzr log and it can't it just falls back.

   d) File-view. If you want to map the last-modified value of a
      file/dir back to its dotted revno, they are often quite old. (In
      emacs case, the top-level directories etc/, lib-src, lisp, etc all
      have revnos <10). This is further exacerbated by asking for them
      one-at-a-time, so we end up spending 400ms per revno for very old
      ones, when we could spend 400ms to get everything...

I think this is a reasonable summary. If I think of anything more
tomorrow, I'll try to mention it.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkvGTwoACgkQJdeBCYSNAAPUfACdGw9zsth/PhpyTSmgUnG1tlSN
2NMAoKAAAA6IBJtalpkgNrzn0xZVspR0
=X5M5
-----END PGP SIGNATURE-----



More information about the bazaar mailing list