Streaming push progress update

Andrew Bennetts andrew.bennetts at canonical.com
Wed Feb 11 04:45:37 GMT 2009


I've been pairing with Robert on the network push work.  We have some code
that passes some simple acceptance tests, although it breaks other tests
and takes some shortcuts.  The branch adds a Repository.insert_stream RPC,
but currently every record it sends is first converted to a fulltext.
This is correct, in that it delivers all the necessary data to the remote
repository, but it obviously wastes a fair bit of bandwidth and CPU.
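To make the cost of that shortcut concrete, here is a toy sketch; it is not bzrlib code.  It assumes a hypothetical record format in which each record is either a fulltext or a delta (copy/insert ops) against an earlier record, and it expands every delta to a fulltext before sending, which is the kind of conversion described above.  The `Record` type and the delta encoding are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional, Tuple, Union

# Hypothetical delta op (not bzr's real format):
#   ("copy", offset, length) copies bytes from the base text,
#   ("insert", data) appends new bytes.
Op = Union[Tuple[str, int, int], Tuple[str, bytes]]


@dataclass
class Record:
    key: str
    base: Optional[str]  # None means the record is already a fulltext
    fulltext: bytes = b""
    delta: Tuple[Op, ...] = ()


def apply_delta(base_text: bytes, delta: Iterable[Op]) -> bytes:
    """Rebuild a text from its base and a sequence of copy/insert ops."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, offset, length = op
            out += base_text[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


def to_fulltexts(records: Iterable[Record]) -> Iterator[Record]:
    """Expand every record to a fulltext (the bandwidth-hungry shortcut).

    Assumes each delta's base appears earlier in the stream.
    """
    texts: Dict[str, bytes] = {}
    for rec in records:
        if rec.base is None:
            text = rec.fulltext
        else:
            text = apply_delta(texts[rec.base], rec.delta)
        texts[rec.key] = text
        yield Record(key=rec.key, base=None, fulltext=text)


stream = [
    Record("r1", None, fulltext=b"hello world\n"),
    Record("r2", "r1", delta=(("copy", 0, 12), ("insert", b"goodbye\n"))),
]
expanded = list(to_fulltexts(stream))
print(expanded[1].fulltext)  # b'hello world\ngoodbye\n'
```

Every delta record grows to the full size of its text, which is where the wasted bandwidth (and the decompress/recompress CPU on each side) comes from.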

We have some promising results, though.  On a loopback network with 500ms
of added latency, we timed a push to a stacked remote branch via bzr://,
once using bzr.dev and once using our branch.

Our branch (insert_stream RPC):
-------------------------------
2m17.8s, 115 HPSS calls.
 * 115 calls x 1s round trip = 1m55s latency overhead
 * remainder: 22.8s

bzr.dev:
-------------------------------
4m8.6s, 242 HPSS calls.
 * 242 calls x 1s round trip = 4m2s latency overhead
 * remainder: 6.6s

As I mentioned above, insert_stream converts every record to a fulltext,
decompressing and recompressing it along the way, so that is probably the
bulk of the extra non-latency time (22.8s vs. 6.6s).
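The latency arithmetic behind these figures can be checked with a few lines of Python.  This is only a back-of-envelope model: it assumes each HPSS call costs exactly one full round trip of 1s (500ms each way on the loopback link, which is what 1m55s over 115 calls implies) and attributes everything else to non-latency work.

```python
# Back-of-envelope latency model: each HPSS call is assumed to cost one
# full round trip, i.e. 2 x 500ms on the simulated loopback link.
ROUND_TRIP_S = 2 * 0.5


def breakdown(total_s: float, calls: int) -> tuple[float, float]:
    """Split a wall-clock time into latency overhead and everything else."""
    latency = calls * ROUND_TRIP_S
    return latency, total_s - latency


runs = {
    "insert_stream branch": (2 * 60 + 17.8, 115),
    "bzr.dev": (4 * 60 + 8.6, 242),
}

for name, (total, calls) in runs.items():
    latency, rest = breakdown(total, calls)
    print(f"{name}: {latency:.0f}s latency, {rest:.1f}s remainder")
# insert_stream branch: 115s latency, 22.8s remainder
# bzr.dev: 242s latency, 6.6s remainder
```

Under this model, total time scales almost linearly with the number of calls, so cutting round trips matters far more than the extra CPU time at this latency.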

The data used in the test was bzr.dev (for the stacked-on branch) and the
streaming-push branch (for the newly created branch).

bzr.dev is suffering from <https://bugs.edge.launchpad.net/bugs/294479>,
which our branch avoids (because it streams in a single RPC).
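Streaming in a single RPC means the client pays the round trip once for the whole stream rather than once per request.  As an illustration only, the sketch below frames any number of records into one request body and decodes them on the other side.  The tag bytes and 4-byte length prefix are a hypothetical format chosen for the example, not the actual HPSS wire encoding.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator

# Hypothetical framing (NOT the real HPSS encoding):
#   b"R" + 4-byte big-endian length + data   for each record
#   b"E"                                     marks the end of the stream
# A tag byte is used so that empty records stay unambiguous.


def encode_stream(records: Iterable[bytes]) -> Iterator[bytes]:
    """Yield chunks of a single request body carrying every record."""
    for rec in records:
        yield b"R" + struct.pack(">I", len(rec)) + rec
    yield b"E"


def decode_stream(body: BinaryIO) -> Iterator[bytes]:
    """Read records back out of the body until the end marker."""
    while True:
        tag = body.read(1)
        if tag == b"E":
            return
        if tag != b"R":
            raise ValueError(f"unexpected tag {tag!r} (truncated stream?)")
        (length,) = struct.unpack(">I", body.read(4))
        yield body.read(length)


records = [b"rev-1 fulltext", b"", b"rev-2 fulltext"]
body = b"".join(encode_stream(records))  # sent as ONE request
print(list(decode_stream(io.BytesIO(body))) == records)  # True
```

Because the sender can keep yielding chunks without waiting for a reply, the number of records no longer affects the number of round trips.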

Both branches also make far, far too many round-trips to create a new
remote branch, because they still use VFS methods for it (which explains
many of the other 114 RPC calls in our branch).

Basically, the good news here is that:
 a) this is exactly what we expect, so we aren't running into any unexpected
    problems; and
 b) the network support for the streaming push is working, although the data
    being put on top of it is still far from optimal.

The loom with all the code is here:
<http://people.ubuntu.com/~andrew/bzr/streaming-push>

-Andrew.




More information about the bazaar mailing list