sketch of a new interface between loggerhead and bzrlib

Aaron Bentley aaron.bentley at utoronto.ca
Fri Jun 22 18:05:50 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Michael Hudson wrote:
> Aaron Bentley wrote:
>> Surprising.  I would have expected that a database cache would
>> complement Bazaar well.
> 
> Oh, these are eminently practical concerns, not theoretical ones.  I
> don't know where in the stack the problem is, but the changes caches
> routinely get corrupted and have to be thrown away (one or two a day on
> average on codebrowse.launchpad.net).

Well, as a temporary measure, sure.  But you might want to just switch
to a different database.  I'm kinda surprised it's using pdb rather than
SQLite.

>> I explicitly do not like organizing things by change type.  I like to
>> see them organized by file.
> 
> Well, I put it like that because that is how it's likely to be
> displayed:
> 
>     added: ...
>   removed: ...
>   renamed: ...
>  modified: ...

Yeah, I also don't like *displays* that are organized by change type...

>> Also, are you sure it's a good idea to restrict yourself to the lefthand
>> parent?  I'd want it to be *possible* to compare with the righthand
>> parent at least.
> 
> I'm not sure.  Given that there will be a way of comparing an arbitrary
> pair of revisions, this will certainly be possible, but that's different
> from it being easy, or cached.

I thought you were proposing that this would be the complete interface.

>> BranchRevisionInfo looks interesting, but I would look at putting
>> where_merged into the RevisionInfo object instead.  (And this is an
>> example of where a database cache wins)
> 
> But where_merged isn't stable, is it?

It's append-only.

>> History.getRevisionInfo: Much of this info is not "cheap".  Especially
>> dotted-revno and tree comparison data requires a lot of computation.
> 
> But dotted-revno _has_ to be cheap, as it's going to be the most
> frequently requested thing.

Well, I'm sorry, but it's *not*, even if it *has* to be.  It requires a
topological sort of the entire revision history, which scales O(n) with
the number of nodes, at best.  Relative to, say,
Repository.get_revisions, it is very pricy.
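
To illustrate why the cost grows with the size of the history, here is a
minimal sketch of a plain topological sort over a revision graph.  It is a
simplified stand-in, not bzrlib's own sorting code: the `topo_sort` name and
the dict-of-parent-lists layout are assumptions made for the example.  The
point is only that every node and every parent link has to be visited before
any revno can be assigned.

```python
from collections import deque


def topo_sort(parents):
    """Return revision ids ordered so every parent precedes its children.

    `parents` maps revision id -> list of parent ids (hypothetical layout).
    """
    children = {rev: [] for rev in parents}
    pending = {}
    for rev, rev_parents in parents.items():
        # Ignore parents that are missing from the graph.
        known = [p for p in rev_parents if p in parents]
        pending[rev] = len(known)
        for p in known:
            children[p].append(rev)

    # Start from revisions with no known parents and walk forward.
    ready = deque(rev for rev, count in pending.items() if count == 0)
    order = []
    while ready:
        rev = ready.popleft()
        order.append(rev)
        for child in children[rev]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    if len(order) != len(parents):
        raise ValueError("cycle in revision graph")
    return order


# A tiny history: "d" merges "b" and "c", which both descend from "a".
history = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
order = topo_sort(history)
```

Each revision is appended once and each parent link is followed once, so the
work is proportional to the size of the whole graph even when only one
revision's number is wanted.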

>  I was thinking of storing
> branch.get_revision_id_to_revno_map() on the History object somewhere --
> even for launchpad (my archetypal large repository, if you hadn't
> noticed already) it only takes 700ms, which seems a reasonable cost to
> pay once per 'bzr push'.

How are you paying this once per push?  I thought you were paying it
once per revision you display.

>> In contexts where you want only a single revision, a getRevisionInfo call
>> makes sense.  For the log view, I would strongly recommend providing an
>> iterator or list.
> 
> Why, particularly?

So that you can pay the cost of branch.get_revision_id_to_revno_map up
front, and retrieve multiple revisions at once.
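
A rough sketch of what I mean, assuming a hypothetical `iter_revision_info`
helper and a `FakeBranch` stand-in so the example runs without bzrlib.  Only
the get_revision_id_to_revno_map name comes from the discussion above; the
tuple-of-ints revno values are an assumption of the example.

```python
class FakeBranch:
    """Hypothetical stand-in for a branch; not bzrlib's Branch class."""

    def __init__(self, revno_map):
        self._revno_map = revno_map
        self.map_calls = 0  # counts how often the expensive call is made

    def get_revision_id_to_revno_map(self):
        self.map_calls += 1
        return dict(self._revno_map)


def iter_revision_info(branch, revision_ids):
    """Yield (revision_id, dotted_revno) pairs for a batch of revisions."""
    # The whole-history cost is paid once, before the first item is yielded.
    revno_map = branch.get_revision_id_to_revno_map()
    for rev_id in revision_ids:
        dotted = ".".join(str(part) for part in revno_map[rev_id])
        yield rev_id, dotted


branch = FakeBranch({"rev-a": (1,), "rev-b": (2,), "rev-c": (1, 1, 1)})
rows = list(iter_revision_info(branch, ["rev-b", "rev-c"]))
```

However many revisions the log view asks for, the map is built a single time
per call, which a per-revision getRevisionInfo cannot guarantee on its own.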

> I don't know what BzrInspect is :)

It is a web-based repository viewer for, say, debugging repositories.  I
mentioned it in my 18/06/07 message to you.

http://panoramicfeedback.com/opensource/bzr/repo/BzrInspect/

> Sure, but Python is quite good at being lazy.  Just because you return
> something with a .changes attribute doesn't mean you have to compute it :)

Certainly, but I think that an API should express which operations are
expensive and which are not, to aid people using it.
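
One way to express that, as a sketch only: keep cheap data as plain
attributes and put expensive data behind an explicitly named method.  The
class below is hypothetical and is not loggerhead's RevisionInfo; the
`get_changes` method and the `compute_changes` callback are names invented
for the example.

```python
class RevisionInfo:
    """Hypothetical sketch: cheap fields are attributes, costly ones methods."""

    def __init__(self, revision_id, committer, compute_changes):
        # Cheap: already known when the object is built.
        self.revision_id = revision_id
        self.committer = committer
        # Expensive: computed only on request, then cached.
        self._compute_changes = compute_changes
        self._changes = None

    def get_changes(self):
        """Expensive on first call (tree comparison); cached afterwards."""
        if self._changes is None:
            self._changes = self._compute_changes(self.revision_id)
        return self._changes


calls = []


def fake_compute_changes(revision_id):
    # Stand-in for a real tree comparison; records each invocation.
    calls.append(revision_id)
    return ["modified: README"]


info = RevisionInfo("rev-b", "Example Committer", fake_compute_changes)
first = info.get_changes()
second = info.get_changes()
```

This is still lazy, but a method call tells the caller that work may happen,
where a bare .changes attribute hides it.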

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGfAFu0F+nu1YWqI0RAghCAJ93pSOe6anPWWF0m1SQwpGXH3/YbQCghqAZ
+QY2RqjvlbGzjbHnhkTaV5M=
=4RqP
-----END PGP SIGNATURE-----


