sketch of a new interface between loggerhead and bzrlib

Fri Jun 22 14:48:11 BST 2007

Surprising.  I would have expected that a database cache would
complement Bazaar well.  After all, Bazaar revisions never change, so
you don't have to even worry about cache expiry if you don't want to.
But a database should give you even better speed than Bazaar itself
provides.

A database cache should provide a stopgap for the fact that Bazaar
doesn't mark a directory modified when its children (or grandchildren)
are modified.  You can calculate this once, which is somewhat slow, and
then never need to calculate that again.

Plus you can run queries like "What are the last 10 commits made across
all branches?" and "Who has made the most commits".

> Do the wonderful and wise in here think this interface is reasonable?
> Implementable?  I'm reasonably sure it's featureful enough to provide
> all the functionality of today's loggerhead, but maybe I'm being blind
> here, so I'd appreciate comments on that too.

Well, it reminds me a little too much of my work on trac+bzr, where I
had to implement shadow "Branches", "Trees" and "Repositories" that
behaved more like Trac expected them to.  Since you have the freedom to
adapt Loggerhead to work smoothly with Bazaar's model, I'd think heavy
adaptation should not be necessary.

FileChangeRef: seems either too short or to long.  Why only path, fileid
and oldpath?

If you look at the output of Tree._iter_changes, it tells you
1. file-id
2. path
3. old path
4. past and current name (indicates whether a file was renamed)
5. past and current parent directory (indicates whether the file was
moved into another directory)
6. past and present versioning
7. past and present file type
8. whether the contents were changed
9. whether the execute bit was changed

Pretty much everything here seems like something you might want to
provide.  It does not provide it in a convenient form, but we have been
planning to change that.  Note that comparing the old path to the new
path will not tell you whether a file has been renamed-- its parent
directory may have been renamed instead.

I explicitly do not like organizing things by change type.  I like to
see them organized by file.

So instead of your FileChanges object, I'd rather see a list of
FileDelta objects.  FileDelta objects would have members file_id,
old_path, new_path, old_name, new_name, old_parent, new_parent,
old_type, new_type, content_change, execute_change.  Actually, we should
add something like that to bzrlib.

Also, are you sure it's a good idea to restrict yourself to the lefthand
parent?  I'd want it to be *possible* to compare with the righthand
parent at least.

I don't think it's a good idea to make a representation of tree changes
be part of the RevisionInfo object because that data is relatively
expensive to generate, and not always needed.  Perhaps a getter instead?

Branch nick is just one possible revision-property.  There are others,
notably the bugfix property.  I'd suggest exposing all
revision-properties.  If you want more convenient access to the branch
nick, you can just make a Python property to retrieve it.

BranchRevisionInfo looks interesting, but I would look at putting
where_merged into the RevisionInfo object instead.  (And this is an
example of where a database cache wins)

History.getRevisionInfo: Much of this info is not "cheap".  Especially
dotted-revno and tree comparison data requires a lot of computation.  In
contexts where you want only a single revision, a getRevisionInfo call
makes sense.  For the log view, I would strongly recommend providing an
iterator or list.

History.getBundle: Please don't generate bundles against the lefthand
parent.  Bundles are for merging, so they should be generated relative
to a target branch-- the branch they are being merged into.  In order to
be merged successfully, branches must contain all the revisions which
are not present in the target branch.  A bundle that does not contain
all the necessary revisions is no more useful than a patch.

normalizeFileArguments: I think this would be clearer as two separate
functions.

> There a couple of open issues:
> 
> 1) This interface is designed to hide all of bzrlib, whereas it could
>    just provide getReivisionInfo.  I think it's just about worth the
>    extra effort adding things like getFileText to have the entire
>    interface in one file.

Well, to me, bzrlib is quite nice and I don't want to hide it.  But
BzrInspect isn't Loggerhead and doesn't compete with it.

> 2) There are some things you'd probably do differently if you were
>    expecting this data to be cached in a relational DB.  I think it's
>    probably sensible design the most convenient interface for
>    loggerhead, and worry about storage after that.

I think you should push things in both directions-- don't require
expensive operations to be performed when they are unnecessary.  But do
require that everything you want is conveniently available.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGe9Mb0F+nu1YWqI0RAqROAKCIO1Bbf4cyZxdauwXrnUNkANanfwCeLwpj
KErsHgtC1BNXDho4nbzb0Fc=
=gqsq
-----END PGP SIGNATURE-----