HttpTransport with async pipelining

Martin Pool mbp at sourcefrog.net
Tue Aug 2 19:10:44 BST 2005


On  2 Aug 2005, John A Meinel <john at arbash-meinel.com> wrote:

> Well, there are 2 possible levels of a server. If you followed the
> discussion, Aaron and I talked about it for a while.

I did see that, and will reply now.  

I think a smart protocol should work at a higher level than the
interface to a store on disk.  For example, there should be a commit()
operation, which takes the revision and inventory xml, and the texts of
any new files.  They can be stored by the server without the client
having any knowledge about where exactly they will be put on disk or
what compression will or will not be applied.
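
To make that concrete, here is roughly the shape I have in mind for the
server side.  This is only a sketch; none of these names exist in the
tree, and the branch methods it calls are invented too:

    class SmartCommitServer:
        """Illustrative server-side handler for a high-level commit rpc."""

        def __init__(self, branch):
            self.branch = branch

        def commit(self, revision_xml, inventory_xml, new_texts):
            """Store one new revision.

            new_texts maps file id -> full text.  How (or whether) these
            get compressed and where they land on disk is entirely the
            server's business; the client never sees the store layout.
            """
            for file_id, text in new_texts.items():
                self.branch.add_text(file_id, text)
            self.branch.add_inventory(inventory_xml)
            self.branch.add_revision(revision_xml)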

This seems to indicate a basic Branch class, each instance of which has
one of several possible BranchStorages.  Storage methods backed by files
can access them over a vfs-like Transport; others that go over rpc will
not.  They might have a different kind of transport object that handles
invoking rpcs over http, ssh, or whatever.
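
In outline it might look like this -- again purely a sketch, with
invented class names, just to show where the seams would be:

    class FileBranchStorage:
        """Storage backed by real files, reached through a Transport."""

        def __init__(self, transport):
            self.transport = transport    # local, sftp, http, ...

        def get_revision_xml(self, revision_id):
            return self.transport.get('revision-store/' + revision_id)

    class RpcBranchStorage:
        """Storage reached by invoking rpcs on a remote server."""

        def __init__(self, medium):
            self.medium = medium          # wraps http, ssh, whatever

        def get_revision_xml(self, revision_id):
            return self.medium.call('get-revision', revision_id)

    class Branch:
        def __init__(self, storage):
            self.storage = storage        # either of the above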

> The other possibility was to make a SmartBranch, which started to
> override more of the Branch operations. Aaron was thinking that it could
> serve up the .text_store and .inventory_store, etc. on its own (not a
> separate Storage + Transport class). He feels that they are part of the
> public interface of Branch, and thus need to be preserved.
> 
> From my experience, though, the *_store members are more of an
> implementation detail. Everyone else should be going through the
> get_revision() type interfaces. (Otherwise they just get files, rather
> than getting Revision objects).
> There might be some places that go directly to the store, but I feel
> those probably should just be cleaned up.

I agree that they should be cleaned up, and that only a few things about
the store should be exposed.  I don't think the Branch should expose the
fact that revisions are kept in concrete Stores, but it should be
possible to get back, for example, the raw xml of an inventory without
having it unpacked into an object.
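
Something on the order of this (names invented, and unpack_inventory
stands in for whatever deserializer we end up with):

    class Branch:

        def get_inventory_xml(self, revision_id):
            """Return the inventory as a raw xml string, unparsed."""
            return self.storage.get_inventory_xml(revision_id)

        def get_inventory(self, revision_id):
            """Return the inventory unpacked into an Inventory object."""
            xml = self.get_inventory_xml(revision_id)
            return unpack_inventory(xml)    # hypothetical deserializer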

> Anyway, a SmartBranch can be much smarter than a simple SmartTransport,
> because it has more knowledge about what it really wants/needs. For
> instance, rather than getting all the copies of the texts, and then
> computing a diff, the diff could be computed on the server side, and
> sent along.
> That to me is more work, and might be better to implement later.

Right. 
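
(Just to spell out the saving, with made-up names on both sides: the
dumb transport pulls both full texts and diffs locally, while the smart
branch asks the server for the finished diff.)

    import difflib

    def diff_over_dumb_transport(transport, old_path, new_path):
        # both full texts cross the network; the diff is computed locally
        old = transport.get(old_path).readlines()
        new = transport.get(new_path).readlines()
        return ''.join(difflib.unified_diff(old, new))

    def diff_over_smart_branch(smart_branch, file_id, old_rev, new_rev):
        # only the finished diff crosses the network
        return smart_branch.call('get-diff', file_id, old_rev, new_rev)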

> A SmartTransport, basically just implements a *slightly* better network
> file system, which allows locking and pipelining (as opposed to
> sftp/rsync). Which leaves the intelligence in the client rather than
> having a heavy server.

Right, so we could work at approximately the level of the operations in
commit, with rpcs like this:

  lock
  get history
  insert texts into stores
  insert inventory xml
  insert revision
  append to history
  unlock
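
In code the client side might look about like this.  The method names
are invented and it glosses over error handling; the point is just the
granularity of the calls.  Note that the insert-text calls are
independent of one another, so they are exactly the kind of thing that
could be pipelined:

    def remote_commit(rpc, new_texts, inventory_xml, revision_xml,
                      revision_id):
        rpc.call('lock')
        try:
            history = rpc.call('get-history')   # e.g. to check for divergence
            for file_id, text in new_texts.items():
                rpc.call('insert-text', file_id, text)   # pipelinable
            rpc.call('insert-inventory', inventory_xml)
            rpc.call('insert-revision', revision_xml)
            rpc.call('append-history', revision_id)
        finally:
            rpc.call('unlock')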

> For the bzr.dev changes:
> 
> http://bzr.arbash-meinel.com/bzr-split-storage/
> 
> And SftpTransport is available in plugin form from:
> http://bzr.arbash-meinel.com/plugins/sftp/
> 
> RsyncTransport is mixed in with my rpush/rpull plugin:
> http://bzr.arbash-meinel.com/plugins/rsync/
> 
> But both the Sftp and Rsync are available from a single file
> rsync_transport.py, sftp_transport.py. The plugin just makes sure that
> when it is loaded they are installed.
> 
> You can decide if you want to add the sftp and rsync, or if you want to
> leave them as plugins. It would be nice to get them into the core, but
> it isn't really necessary.

I think it'd be good to just put them in; I'll look this afternoon.

(by the way, Gustavo Niemeyer says he's going to send a patch for
system-wide plugins)

-- 
Martin



