"bzr log" by revision-id is slow
Andrew Bennetts
andrew at bemusement.org
Thu Oct 13 09:47:47 UTC 2011
[CCing John for his wisdom on history-db, and dotted revnos in general]
Eli Zaretskii wrote:
> > Date: Thu, 13 Oct 2011 10:23:18 +1100
[…]
> > revision object to guide the search, so that's O(revs-in-ancestry).
>
> But what I see is O(1), and the time is very long. It almost sounds
For a fixed ancestry — you're running all these measurements against the
same emacs branch I believe — O(revs-in-ancestry) is O(1). Say you made
a branch into a new, standalone repository with precisely half as many
revisions in the ancestry. In that half-size repo I expect you'd find
the time for that case to be halved.
I agree it's a bit surprising that it seems to be entirely independent
of the position in the ancestry; I wonder if it's bulk loading the whole
graph in advance to minimise the worst-case cost? Hmm, it's possible that
because of edge cases calculating the dotted revno would often need to
go as far back as the origin.
> Btw, there's one hint in the REVID to perhaps guide the search: the
> date of the commit. Can that be used to speed up the search?
Not with the current indices, at least not without some hackish
assumptions about revids that aren't true in general. Also it would
have to be done very carefully because it's not a trustworthy value:
computer clocks can be wrong (and surprisingly often are).
One approach I guess would be to use the date in the revision as a
filter to preload just the revisions whose IDs appear to be newer or
merely only a little bit older (as revision IDs allocated by the bzr
command-line tool, as opposed to e.g. importers, embed the date in the
revid).
I'm probably forgetting some subtlety with dotted revnos though that
requires searching back much earlier than you'd expect to be sure that
the revision wasn't merged in at some earlier point that would change
the dotted revno you'd guess from a partial graph. (A revision might be
reachable from the tip via several different paths.)
> > In principle a log format that doesn't show the revno can avoid this
> > cost, but I don't know if such a thing has been implemented.
>
> If most of the time it takes is to look up the revno, then indeed not
> showing the revno would be a very good optimization.
Another compromise might be an option to only try to find mainline (i.e.
non-dotted) revnos, which would limit the search somewhat. Although for
the emacs history which is I guess is mostly linear (from the imported
revisions) that wouldn't help much in general. It would make logging
recent mainline revs much faster though, which I'd expect is a common
use-case.
> Can you name those, please? I already have history_db, but the time
> this takes with and without it is the same.
Hmm, I think that one is the newest and best of the set. I'm a bit
surprised it doesn't help, but I'm actually not very familiar with it.
John may know more. There are some older experiments that tried
slightly different approaches, but they probably don't work well with
current bzr versions anymore. FWIW they are
https://launchpad.net/bzr-revnocache and
https://launchpad.net/bzr-historycache. I'm sure John will let us know
if I've missed any other noteworthy plugins :)
-Andrew.
More information about the bazaar
mailing list