[RFC] Pack-specific smart server verbs: check_references and autopack

Andrew Bennetts andrew at canonical.com
Thu Jun 19 05:58:47 BST 2008


This patch isn't quite ready for merging, but I would like to let people know
about it.  It adds some infrastructure and smart server verbs to optimise some
pack repository operations, rather than treating all repositories more-or-less
independently of their on-disk format as we do now.

The reason for this is that when pushing, two of the major causes of slowdowns
are:

  * Packer._check_references.  After uploading a pack of new revisions the
    packer checks that all compression parents not present in that pack exist
    elsewhere in the repository, in order to make sure all the file texts will
    be reconstructable.  With VFS operations that tends to be many, many readvs
    of all the .tix files.  This happens on every push, and is often a large
    fraction of the total push time, e.g. 25% depending on the exact details of
    the push.

  * RepositoryPackCollection.autopack.  If certain thresholds are reached after
    adding a pack, an autopack will be triggered to combine several packs into a
    single pack.  At the moment this involves pulling down all that data and
    then reuploading it.  It only happens about one in every ten pushes (and
    with varying amounts of work to do), but it bites hard when it happens.
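Roughly, the two server-side operations above amount to the following. This is a minimal, self-contained sketch, not bzrlib or patch code: each pack's text index is modelled as a plain dict mapping a text key to its compression parent (None for a fulltext), and the names `missing_compression_parents` and `needs_autopack` are hypothetical. The autopack threshold assumes a digit-sum rule (allow as many packs as the sum of the decimal digits of the revision count), which is used here only as an illustrative policy.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical model: a text key is (file_id, revision_id); each index maps a
# key to the key of its compression parent, or None if stored as a fulltext.
Key = Tuple[str, str]
Index = Dict[Key, Optional[Key]]


def missing_compression_parents(new_pack: Index,
                                existing_packs: Iterable[Index]) -> Set[Key]:
    """Return compression parents referenced by new_pack that no pack holds.

    Parents present in new_pack itself are fine; anything else must be found
    in one of the existing packs, or the text cannot be reconstructed.
    """
    external = {parent for parent in new_pack.values()
                if parent is not None and parent not in new_pack}
    # Over VFS, each index consulted here means readvs of a remote .tix file;
    # a smart verb lets the server do these lookups locally instead.
    for index in existing_packs:
        external.difference_update(index)  # iterating a dict yields its keys
        if not external:
            break
    return external


def needs_autopack(revision_count: int, pack_count: int) -> bool:
    """Assumed threshold: allow as many packs as the digit sum of the count."""
    max_packs = max(1, sum(int(d) for d in str(revision_count)))
    return pack_count > max_packs
```

Both functions only read local data, which is why running them on the server side in a single request is attractive compared with pulling every index across the wire.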

A single “stream some revisions into the remote repo” verb will probably deal
with these intrinsically, but that hasn't been written yet.  So as a cheap
interim measure I thought I'd try writing some pack-specific HPSS code to
perform those operations in a single round trip each.

The main part is adding a new InterRepository, InterPacktoRemotePack.  It turned
out to be fairly easy to hook it all up.
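The hook-up relies on InterRepository choosing an optimised implementation for a particular (source, target) pair. The sketch below illustrates that dispatch pattern in isolation; it is loosely modelled on the registered-optimiser idea, but every class here (`Repository`, `RemoteRepository`, `PackToRemotePackOptimiser`) and the `call` method are hypothetical stand-ins, not the patch's code or bzrlib's API. It counts round trips to show why a pack-aware verb helps.

```python
from typing import List, Type


class Repository:
    """Hypothetical stand-in for a repository with some on-disk format."""
    def __init__(self, format_name: str):
        self.format_name = format_name


class RemoteRepository(Repository):
    """Hypothetical stand-in for a repository reached over the smart protocol."""
    def __init__(self, format_name: str):
        super().__init__(format_name)
        self.round_trips = 0

    def call(self, verb: str) -> str:
        # Each request to the server costs one network round trip.
        self.round_trips += 1
        return "ok"


class InterRepository:
    """Generic path: format-agnostic, one remote read per index file."""
    _optimisers: List[Type["InterRepository"]] = []

    def __init__(self, source: Repository, target: Repository):
        self.source, self.target = source, target

    @staticmethod
    def is_compatible(source: Repository, target: Repository) -> bool:
        return True

    @classmethod
    def register_optimiser(cls, optimiser):
        cls._optimisers.append(optimiser)
        return optimiser

    @classmethod
    def get(cls, source, target) -> "InterRepository":
        # Try the most recently registered optimisers first; fall back to the
        # generic implementation if none claims this source/target pair.
        for optimiser in reversed(cls._optimisers):
            if optimiser.is_compatible(source, target):
                return optimiser(source, target)
        return cls(source, target)

    def check_references(self, index_count: int) -> None:
        # Generic path: read every remote index separately.
        for _ in range(index_count):
            self.target.call("readv")


@InterRepository.register_optimiser
class PackToRemotePackOptimiser(InterRepository):
    """Pack source to remote pack target: ask the server to do the work."""
    @staticmethod
    def is_compatible(source, target) -> bool:
        return (source.format_name == "pack"
                and isinstance(target, RemoteRepository)
                and target.format_name == "pack")

    def check_references(self, index_count: int) -> None:
        self.target.call("check_references")  # hypothetical verb name
```

With a pack source and a pack-format remote target, `get()` selects the optimiser and the reference check costs one round trip regardless of how many indices exist; any other pairing falls back to the generic per-index path.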

The Packer._check_references part seems to work well, but I haven't yet thought
carefully about whether it's always going to be better than the status quo, or whether it
might sometimes be much worse.  I think it's probably a worthwhile change, but
feedback on this idea is welcome (even if it's just to say “yes, that is totally
fine, please do that”).

I'm sure the autopack verb I've added is worthwhile, but it doesn't quite work
right yet and has no tests.  It seems to do the right thing on the server, but
then leaves the client in a state where it has the wrong pack-names cached,
causing a traceback after most of the push is done.  Probably that's not too
hard to fix, but I wonder if maybe I could hook it up in a better way.
Thoughts?

-Andrew.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: inter-remote-pack.patch
Type: text/x-diff
Size: 58552 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080619/f64091cf/attachment-0001.bin 

