HttpTransport with async pipelining

Tue Aug 2 15:45:27 BST 2005

Martin Pool wrote:
> On  2 Aug 2005, John Arbash Meinel <john at arbash-meinel.com> wrote:
>
>>Well, I showed up on my split-transport branch again, and I decided to
>>try and implement some pipelining for http transport.
>
>
>
>>Basically, what I did was to change the effbot.org library, such that
>>http_client.async_http() instead of being a single object per file, it
>>now was a single object per schema+host+port. And then the objects would
>>build up a queue of requests. Otherwise you can't pipeline your requests
>>with the same connection. The old effbot.org method of using
>>http_manager actually created a new connection for each file, rather
>>than re-using the same one.
>>
>>Anyway, with the modified versions, on my local network, instead of "bzr
>>branch" taking:
>>real    6m55.448s
>>user    1m12.872s
>>sys     0m34.342s
>>
>>it now takes:
>>real    3m13.656s
>>user    0m50.325s
>>sys     0m23.665s
>>
>>So I cut the time down to almost half. (relative to the current bzr.dev
>>tree).
>
>
> Good stuff.
>
> I'd like to see about adding a smart server over xmlrpc -- the python
> xmlrpc libraries seem like they would make it pretty easy, first off for
> commit because that's hardest to do well over sftp.  It may be better to
> send the exact xml so as to avoid issues about different forms with
> different hashes.

Well, there are 2 possible levels of a server. If you followed the
discussion, Aaron and I talked about it for a while.

The lower level would just look like another Transport agent (maybe a
Storage as well). That would be rather easy to implement. But at that
level, you are stuck talking about whole files (possibly less, as I
wanted a get_partial(), but my get_diff() proposal had some flaws).

I can see the RPC as being pretty simple for this sort of operation. It
would just need to bundle up a few commands, send them across, and parse
the results. I've used python's XML-RPC, and it makes certain things
very easy (it registers all the functions for a given class, etc).
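
A minimal sketch of that low-level RPC layer, written against Python 3's
standard library (xmlrpc.server and xmlrpc.client; in 2005 these were the
SimpleXMLRPCServer and xmlrpclib modules). The TransportService class and
its has/get/put methods are hypothetical names, not an existing bzr
interface, and the dispatcher is driven in-process so that no socket is
needed.

```python
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCDispatcher


class TransportService:
    """Hypothetical server-side object; the method names are illustrative."""

    def __init__(self, files):
        self._files = dict(files)  # path -> text content

    def has(self, path):
        return path in self._files

    def get(self, path):
        return self._files[path]

    def put(self, path, text):
        self._files[path] = text
        return True  # XML-RPC does not marshal None by default


def call(dispatcher, method, *args):
    """Marshal a request, dispatch it, and unmarshal the reply."""
    request = xmlrpc.client.dumps(args, methodname=method)
    # _marshaled_dispatch is the hook the stdlib servers use internally;
    # a real server would sit behind SimpleXMLRPCServer instead.
    response = dispatcher._marshaled_dispatch(request)
    (result,), _ = xmlrpc.client.loads(response)
    return result


dispatcher = SimpleXMLRPCDispatcher()
# register_instance exposes every public method of the object at once.
dispatcher.register_instance(
    TransportService({"revision-store/r1": "<revision/>"})
)
```

With this, call(dispatcher, "get", "revision-store/r1") round-trips through
the XML-RPC wire format and returns the stored text unchanged.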

We probably need to work out some of the interfaces to a Storage
location. Since you seem very keen on using Weave for merging, we
probably need a get_weave() request for the store. Basically, if you are
thinking to store something so that you can get it back easily with one
storage form, you need to be able to generate that from the other
storage forms. Either that, or you have get_weave() as part of the
WeaveStorage, and at a higher level you check to see if the current
storage mechanism is capable of giving you a weave, and if not, you
figure out a different way to get one. (But I think that if you go
revfile => weave, there should be something faster than revfile => full
texts => weave, since you already have some of the deltas computed.)
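
The second option (ask the storage whether it can produce a weave, and fall
back otherwise) could look roughly like the sketch below. Everything here
is an assumption for illustration: FullTextStore, WeaveStore, get_texts()
and build_weave() are hypothetical, and build_weave() is only a stand-in
that records the versions in order rather than a real weave
implementation.

```python
class FullTextStore:
    """Hypothetical store that can only return full texts."""

    def __init__(self, texts):
        self._texts = texts  # file_id -> list of versions, oldest first

    def get_texts(self, file_id):
        return list(self._texts[file_id])


class WeaveStore(FullTextStore):
    """Hypothetical store that can hand back a weave directly."""

    def get_weave(self, file_id):
        return build_weave(self.get_texts(file_id))


def build_weave(texts):
    # Stand-in for a real weave: just keeps the versions in order.
    return {"versions": list(texts)}


def get_weave(store, file_id):
    """Use the store's own get_weave() if it has one, else rebuild."""
    native = getattr(store, "get_weave", None)
    if native is not None:
        return native(file_id), "native"
    # Slow path: go through full texts, as in revfile => full texts => weave.
    return build_weave(store.get_texts(file_id)), "rebuilt"
```

The caller gets the same weave either way; the second element of the
result only records which path produced it.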

The other possibility was to make a SmartBranch, which started to
override more of the Branch operations. Aaron was thinking that it could
serve up the .text_store and .inventory_store, etc. on its own (not a
separate Storage + Transport class). He feels that they are part of the
public interface of Branch, and thus need to be preserved.

From my experience, though, the *_store members are more of an
implementation detail. Everyone else should be going through the
get_revision() type interfaces. (Otherwise they just get files, rather
than getting Revision objects).
There might be some places that go directly to the store, but I feel
those probably should just be cleaned up.

Anyway, a SmartBranch can be much smarter than a simple SmartTransport,
because it has more knowledge about what it really wants/needs. For
instance, rather than getting all the copies of the texts, and then
computing a diff, the diff could be computed on the server side, and
sent along.
That to me is more work, and might be better to implement later.
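
To make the difference concrete, here is a small sketch using Python's
difflib. The two functions and the TEXTS table are hypothetical; they only
compare how many characters would cross the wire when the client fetches
both texts and diffs them itself versus when the server computes the diff
and sends only that.

```python
import difflib


def _unified(old, new, old_id, new_id):
    return "".join(
        difflib.unified_diff(
            old.splitlines(True), new.splitlines(True), old_id, new_id
        )
    )


def client_side_diff(fetch, old_id, new_id):
    """Fetch both full texts, then diff locally."""
    old, new = fetch(old_id), fetch(new_id)
    transferred = len(old) + len(new)
    return _unified(old, new, old_id, new_id), transferred


def server_side_diff(texts, old_id, new_id):
    """Diff where the texts live and send only the result."""
    diff = _unified(texts[old_id], texts[new_id], old_id, new_id)
    return diff, len(diff)


# Illustrative data: a 200-line file with a single changed line.
_old = "".join(f"line {i}\n" for i in range(200))
TEXTS = {"r1": _old, "r2": _old.replace("line 100\n", "line 100 changed\n")}
```

Both paths produce the same diff; only the amount of data sent differs,
which is the whole argument for doing this on the server.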

A SmartTransport basically just implements a *slightly* better network
file system, one which allows locking and pipelining (as opposed to
sftp/rsync). That leaves the intelligence in the client rather than
requiring a heavy server.
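
The pipelining part, like the HTTP change described at the top of this
mail, comes down to keeping one request queue per scheme+host+port instead
of one object per file. A rough sketch of just that bookkeeping follows;
PipelinePool and endpoint_key are hypothetical names, this is not the
modified effbot.org code, and no network I/O is performed.

```python
from collections import deque
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def endpoint_key(url):
    """Return the (scheme, host, port) triple a connection is shared on."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


class PipelinePool:
    """One pending-request queue per endpoint (hypothetical helper)."""

    def __init__(self):
        self.queues = {}  # endpoint key -> deque of request paths

    def request(self, url):
        key = endpoint_key(url)
        queue = self.queues.setdefault(key, deque())
        # Requests for the same endpoint line up behind each other, so
        # they could all be written to a single connection.
        queue.append(urlsplit(url).path or "/")
        return key
```

Every file requested from the same server lands in the same queue, which
is what makes it possible to send the requests back to back on one
connection.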

A SmartBranch would take quite a bit more to implement. It's probably
worth it, but I think we could have nice remote operations working over
SmartTransport a lot sooner.

>
>
>>Also, I'm still hoping to get my changes merged into the mainline. If
>>only for the SftpTransport and RsyncTransport support.
>
>
> I'd love to; could you please give me a url for the branch?
>

For the bzr.dev changes:

http://bzr.arbash-meinel.com/bzr-split-storage/

And SftpTransport is available in plugin form from:
http://bzr.arbash-meinel.com/plugins/sftp/

RsyncTransport is mixed in with my rpush/rpull plugin:
http://bzr.arbash-meinel.com/plugins/rsync/

But the Sftp and Rsync transports are each available as a single file
(sftp_transport.py and rsync_transport.py). The plugin just makes sure
that they are installed when it is loaded.

You can decide if you want to add the sftp and rsync, or if you want to
leave them as plugins. It would be nice to get them into the core, but
it isn't really necessary.
You could also add them as "recommended plugins" as you did once with
Rsync, and I can even maintain them from the source, rather than keeping
a separate project as I have done so far.

John
=:->
