Optimising branching and merging big repositories between far away locations...
John Arbash Meinel
john at arbash-meinel.com
Wed Oct 29 17:52:54 GMT 2008
John Arbash Meinel wrote:
> Asmodehn Shade wrote:
>> Hi,
>
>> First thanks for all these details, they are quite interesting and I
>> will take some time to investigate a bit ;-)
>
>> 1)
>> so I ran a bzr branch -r1 with -Dhpss... and my bzr has been stuck
>> there for a few hours now:
>
> ...
For those of you following along, we did end up reaching a conclusion here.
Basically, the smart 'readv()' request buffers the entire result in
memory on both sides: the sender builds up the whole response before
transmitting any of it, and the receiver then collects the whole
readv() result into a single string buffer before yielding it to the
next layer.
With bzr-1.5 that still worked, because we only talked in terms of 1
file's history at a time.
With 1.6+, we are now able to issue a readv() request that ends up
requesting 20GB of data from a single .pack file.
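(For context, a readv() request is just a path plus a list of (offset,
length) pairs, and the transport yields (offset, bytes) pairs back.
Something like the following sketch, with made-up offsets and a made-up
pack name; the point is that one such list can now add up to gigabytes
of a single pack file:)

# Hypothetical offsets and pack name; a real request against a big
# repository can add up to many GB in one readv() call.
offsets = [(0, 4096), (8192, 65536), (1048576, 20 * 1024 * 1024)]
for offset, data in transport.readv('packs/deadbeef.pack', offsets):
    handle(offset, data)   # 'handle' is a stand-in for the real consumer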
So on the server side, it would start reading in the bytes and then hit
this line:
def do_body(self, body_bytes):
    """accept offsets for a readv request."""
    offsets = self._deserialise_offsets(body_bytes)
    backing_bytes = ''.join(bytes for offset, bytes in
        self._backing_transport.readv(self._relpath, offsets))
    return request.SuccessfulSmartServerResponse(('readv',),
        backing_bytes)
And the "backing_bytes = ''.join()" is going to kill the server before
it can transmit any data back to the client.
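(Roughly, the server-side fix would be to hand the readv chunks to the
response as a stream and let the protocol layer write them out as they
arrive, rather than ''.join()ing them first. A sketch only; the
streamed-response class below is made up, it doesn't exist in the
current code:)

def do_body(self, body_bytes):
    """accept offsets for a readv request, streaming the result."""
    offsets = self._deserialise_offsets(body_bytes)
    def chunks():
        # Yield each hunk as the backing transport produces it, so we
        # never hold the whole multi-GB result in one string.
        for offset, bytes in self._backing_transport.readv(
                self._relpath, offsets):
            yield bytes
    # Hypothetical response type that accepts an iterable body and
    # writes it to the medium incrementally.
    return request.SuccessfulSmartServerStreamedResponse(('readv',),
        chunks())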
However, even if we fix the server, we still have a problem on the
client side. It does:
data = response_handler.read_body_bytes()
whose implementation is:
if self._body is None:
    self._wait_for_response_end()
    body_bytes = ''.join(self._bytes_parts)
    if 'hpss' in debug.debug_flags:
        mutter(' %d body bytes read', len(body_bytes))
    self._body = StringIO(body_bytes)
    self._bytes_parts = None
return self._body.read(count)
_wait_for_response_end() reads data from the socket until everything is
consumed and we have the "end" record.
And then we buffer all of that into a single string (after taking it
from a list), and return it.
Which means that we actually allocate at least 2x the data in memory
(because of the ''.join(list)).
So both the server and client will buffer the whole readv request.
There are a few possibilities for workarounds.
1) Teach both the client and the server how to stream data from/to the
socket.
This is, IMO, the best solution, but probably the hardest one to
implement. I think the client code wouldn't be terrible; we just need
something other than "read_body_bytes()" that we can step through
iteratively. The server side is a bit harder, but should be possible.
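(Something along these lines on the client, say an iter_body_bytes()
next to read_body_bytes(); the helper names here are invented, it's
just to show the shape of it:)

def iter_body_bytes(self):
    """Yield body chunks as they arrive instead of buffering them all.

    Hypothetical companion to read_body_bytes(); _response_end_seen()
    and _read_next_record() stand in for whatever actually pulls one
    record at a time off the socket.
    """
    while not self._response_end_seen():
        self._read_next_record()
        while self._bytes_parts:
            yield self._bytes_parts.pop(0)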
I'll also note that breaking on the readv() sections isn't sufficient.
We've written a lot of code that combines small sections into larger
ones. So by the time we're actually making the request, we don't have
discrete offsets, but instead large ranges.
Which could end up being a problem for LocalTransport if we are asking
it to read 1GB at a time.
So yet one more step could be to limit how much we combine. I'm thinking
something on the order of 50MB is more than sufficient to avoid hitting
swap, and to avoid reading 1 byte at a time.
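(In other words, cap the coalescing pass, something like the sketch
below. The 50MB number is just the figure from above, and the real
coalescing code also allows small gaps between ranges, which I'm
ignoring here:)

MAX_COMBINED_BYTES = 50 * 1024 * 1024   # ~50MB, per the figure above

def combine_offsets(offsets, max_bytes=MAX_COMBINED_BYTES):
    """Coalesce adjacent (offset, length) pairs, but never past max_bytes."""
    combined = []
    for offset, length in sorted(offsets):
        if (combined
                and offset == combined[-1][0] + combined[-1][1]
                and combined[-1][1] + length <= max_bytes):
            # Extend the previous range while it stays under the cap.
            combined[-1] = (combined[-1][0], combined[-1][1] + length)
        else:
            combined.append((offset, length))
    return combined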
2) Teach just the client to never make huge requests. This is easiest to
implement, but I feel it isn't really the full answer.
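(Concretely, that would mean splitting one giant offset list into
several readv() calls, each capped at some total size, roughly like
this sketch; the cap is again made up:)

MAX_REQUEST_BYTES = 50 * 1024 * 1024   # made-up cap per smart request

def readv_in_batches(transport, relpath, offsets,
                     max_bytes=MAX_REQUEST_BYTES):
    """Issue several smaller readv() requests instead of one huge one."""
    batch, batch_size = [], 0
    for offset, length in offsets:
        if batch and batch_size + length > max_bytes:
            for result in transport.readv(relpath, batch):
                yield result
            batch, batch_size = [], 0
        batch.append((offset, length))
        batch_size += length
    if batch:
        for result in transport.readv(relpath, batch):
            yield result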
Thoughts?
John
=:->