[MERGE] Protocol encoding/decoding logic for protocol v3 (protocol v3 patch 6/7)

Andrew Bennetts andrew at canonical.com
Thu May 15 09:16:49 BST 2008


John Arbash Meinel wrote:
> John Arbash Meinel has voted tweak.
> Status is now: Conditionally approved
> Comment:
> > I've renamed excess_data to unused_data in a few places (a fairly
> > arbitrary change, it just seemed more consistent at the time).
>
> ^- unused seems a bit like data that is being thrown away, rather than  
> data that hasn't been consumed yet. Though 'excess' has the same  
> problem. 'remaining_buffer' is my bikeshed color.

Heh.  These are all much of a muchness to me, so I think I'll just leave the
bikeshed's current coat of paint alone.

> ...
>
> +MESSAGE_VERSION_THREE = 'bzr message 3 (bzr 1.3)\n'
> +RESPONSE_VERSION_THREE = REQUEST_VERSION_THREE = MESSAGE_VERSION_THREE
> +
>
> ^- I thought the bzr version string would have been the current version  
> of the client. Is your idea that it should be the version of bzr that  
> introduced the protocol? (Like we did with disk format strings). That  
> would be fine, too, but this isn't going to land before 1.5.
>
> Why are you changing from having the 'request' and 'response' strings be  
> different?

The idea is like the disk formats, yes.  That way error messages users see are
likely to be a little more helpful when mixing versions of bzr that are too
old/new for each other.  Obviously I'll be changing “1.3” to “1.6” now...

The reason for using the same string for requests and responses is simply that
it didn't add any value.  The network protocol was pretty symmetric already, and
v3 has made it even more so.

> -        self.state_accept(bytes)
> -        while current_state != self.state_accept:
> -            current_state = self.state_accept
> -            self.state_accept('')
> +        self._number_needed_bytes = None
> +        try:
> +            self.state_accept(bytes)
> +            while current_state != self.state_accept:
> +                current_state = self.state_accept
> +                self.state_accept('')
> +        except _NeedMoreBytes, e:
> +            self._number_needed_bytes = e.count
>
> ^- this seems odd. state_accept() is a function which mutates itself?  

Yes.  It's an instance variable for the function to run for the current state.
When the state changes, that's done by changing this variable to point to a new
function.  accept_bytes will keep calling state_accept until it stops changing,
so that the state machine is advanced as far as possible.

> Certainly the fact that you keep passing no more bytes until the state stops
> changing is a bit odd. At least, it would seem to me that passing no more
> bytes  would not change the state.
>
> I can understand how this could be correct, but it *looks* like incorrect
> code.  At a minimum it probably needs a comment; it might indicate it should
> be done in a different way.

Passing no more bytes makes sense because already received bytes will be
buffered.  The state_accept function isn't restricted to just looking at the new
bytes received, it can (and should) look at e.g. a self._in_buffer variable of
bytes already received.

That said, I've reworked it a little so that state_accept now takes no args,
instead "self._in_buffer += bytes" happens in accept_bytes.  The reason I didn't
do this originally was that the _StatefulDecoder subclasses we already had, like
LengthPrefixedBodyDecoder, needed significant changes to work with this.  I've
bitten the bullet and done it.
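
To make the shape of this concrete, here's a minimal sketch of the reworked
pattern (this is illustration only, not the bzrlib code: the
decimal-length-then-newline framing and the MessageDecoder name are invented).
accept_bytes buffers incoming bytes in one place, then keeps calling the
no-argument state_accept until the state stops changing, with _NeedMoreBytes
aborting the loop when the buffer runs dry:

```python
class _NeedMoreBytes(Exception):
    """A state function cannot proceed.  count is the *total* number of
    bytes needed (None means unknown)."""

    def __init__(self, count=None):
        self.count = count


class MessageDecoder(object):
    """Decode messages framed as a decimal length, a newline, then that
    many payload bytes."""

    def __init__(self):
        self._in_buffer = b''
        self._number_needed_bytes = None
        self._expected_length = None
        self.messages = []
        self.state_accept = self._state_accept_expecting_length

    def accept_bytes(self, data):
        # Buffering happens here, once, rather than in every state function.
        self._in_buffer += data
        self._number_needed_bytes = None
        try:
            current_state = None
            # Advance the state machine as far as possible.
            while current_state != self.state_accept:
                current_state = self.state_accept
                self.state_accept()
        except _NeedMoreBytes as e:
            self._number_needed_bytes = e.count

    def _state_accept_expecting_length(self):
        pos = self._in_buffer.find(b'\n')
        if pos == -1:
            raise _NeedMoreBytes(None)
        self._expected_length = int(self._in_buffer[:pos].decode('ascii'))
        self._in_buffer = self._in_buffer[pos + 1:]
        self.state_accept = self._state_accept_expecting_payload

    def _state_accept_expecting_payload(self):
        if len(self._in_buffer) < self._expected_length:
            # Total bytes needed, counting what is already buffered.
            raise _NeedMoreBytes(self._expected_length)
        self.messages.append(self._in_buffer[:self._expected_length])
        self._in_buffer = self._in_buffer[self._expected_length:]
        self.state_accept = self._state_accept_expecting_length
```

Feeding this a partial message records how many bytes it still wants; feeding
the remainder (plus a second message) decodes both in one accept_bytes call.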

> ...
>  class ChunkedBodyDecoder(_StatefulDecoder):
> @@ -310,9 +371,8 @@
>      def _extract_line(self):
>          pos = self._in_buffer.find('\n')
>          if pos == -1:
> -            # We haven't read a complete length prefix yet, so there's
> -            # nothing to do.
> -            return None
> +            # We haven't read a complete line yet, so there's nothing to do.
> +            raise _NeedMoreBytes(1)
>
> # We haven't read a complete line yet, so request more bytes before we  
> continue
>
>
> Should we allow "count" to be None? It seems like it sets
> "self._number_needed_bytes = None", which sounds more like not needing any
> more bytes than an unknown number of bytes needed.
>
> I guess as long as it is clearly documented that _number_needed_bytes = None
> means we need to read an arbitrary amount.

Well, _number_needed_bytes = 0 is an unambiguous way to say “no more bytes
needed”, so using None for “unknown” makes sense to me.  I've expanded the
docs on _NeedMoreBytes to say explicitly that None means “Unknown”.

> +class SmartClientRequestProtocolOne(SmartProtocolBase, Requester,
> +        message.ResponseHandler):
>
> People are welcome to disagree, but *I* aesthetically prefer:
>
> class SmartClientRequestProtocolOne(SmartProtocolBase, Requester,
>                                     message.ResponseHandler):

Well, I think they're all ugly :)

So I'm happy to change it to the variant you prefer.

> +    def _raise_args_if_error(self, result_tuple):
> +        v1_error_codes = [
> +            'norepository',
> +            'NoSuchFile',
> +            'FileExists',
> +            'DirectoryNotEmpty',
> +            'ShortReadvError',
> +            'UnicodeEncodeError',
> +            'UnicodeDecodeError',
> +            'ReadOnlyError',
> +            'nobranch',
> +            'NoSuchRevision',
> +            'nosuchrevision',
> +            'LockContention',
> +            'UnlockableTransport',
> +            'LockFailed',
> +            'TokenMismatch',
> +            'ReadError',
> +            'PermissionDenied',
> +            ]
> +        if result_tuple[0] in v1_error_codes:
> +            self._request.finished_reading()
> +            raise errors.ErrorFromSmartServer(result_tuple)
> +
>
> ^- this seems like it should have module/class scope, not scope local to the
> function.
>
> Also, what do you do for errors that aren't V1 errors?

Well, earlier V1 clients aren't going to cope with them anyway.  So we should
avoid returning new errors on existing functions.

I've added a comment:

        # Later protocol versions have an explicit flag in the protocol to say
        # if an error response is "failed" or not.  In version 1 we don't have
        # that luxury.  So here is a complete list of errors that can be
        # returned in response to existing version 1 smart requests.  Responses
        # starting with these codes are always "failed" responses.

> +        elif resp == ('ok', '3'):
> +            return 3
> ^- this seems like a better place to have:
>
> ('ok', '3 (bzr >= 1.5)')
[...snip...]

This has been superseded by the new protocol auto-detection logic.

[...]
> Further, I thought the goal was that the server would be expected to be
> newer.  Which doesn't work if you are changing 'hello' in a way clients won't
> understand.

(I just want to point out that while it would be nice if we could assume this,
in practice I think clients will tend to be upgraded before servers most of the
time.)

> +def build_server_protocol_three(backing_transport, write_func,
> +                                root_client_path):
> +    request_handler = request.SmartServerRequestHandler(
> +        backing_transport, commands=request.request_handlers,
> +        root_client_path=root_client_path)
> +    responder = ProtocolThreeResponder(write_func)
> +    message_handler = message.ConventionalRequestHandler(
> +        request_handler, responder)
> +    return ProtocolThreeDecoder(message_handler)
>
>
> ^- The naming here is unfortunate. As you end up passing a SSRequestHandler
> into a CRequestHandler to get a 'message_handler'.

Yes, it is.  I think it's because SmartServerRequestHandler is poorly named.
Its purpose is basically to dispatch a request to the right SmartServerRequest
class.  I was tempted to rename SmartServerRequestHandler to
"SmartServerRequestDispatcher", but again I didn't want to inflate this patch
more than necessary.

> You could either name the variables:
> server_request_handler = request.SSRH()
> request_handler = message.CRH(server_request_handler, ...)

I don't find these names any less confusing.  In fact it'd be worse: a request
handler is necessarily a server-side thing, so it's effectively calling them
"request_handler1" and "request_handler2".

"message_handler" at least gives an indication of which layer this object is
used at.

> Or name the class "ConventionalMessageHandler".

And this would also be very confusing to me, as this name is equally applicable
to the ConventionalResponseHandler class.

So for now I'll leave this as is.  I agree there's room for improvement, though.

> ...
> +    def _extract_length_prefixed_bytes(self):
> +        if len(self._in_buffer) < 4:
> +            # A length prefix by itself is 4 bytes, and we don't even have
> +            # that many yet.
> +            raise _NeedMoreBytes(4)
> +        (length,) = struct.unpack('!L', self._in_buffer[:4])
>
> ^- I personally prefer ">L" to "!L", is there a different convention I'm not
> aware of? The ">" tells me Greater, aka Big-endian.

“!” tells me Network.  They're all just line-noise to me.  I only use them
occasionally so I have to look them up in
<file:///usr/share/doc/python2.5/html/lib/module-struct.html> every time.  I
chose the one labelled “network” rather than “big-endian” because I felt it
communicates intent slightly better.
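
For what it's worth, the two are interchangeable here: both '!' and '>' select
big-endian byte order with standard sizes and no alignment, so the packed
bytes are identical and the choice only communicates intent:

```python
import struct

# '!' (network) and '>' (big-endian) produce the same standard layout.
assert struct.pack('!L', 0x01020304) == struct.pack('>L', 0x01020304)
assert struct.pack('!L', 5) == b'\x00\x00\x00\x05'
```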

> +        end_of_bytes = 4 + length
> +        if len(self._in_buffer) < end_of_bytes:
> +            # We haven't yet read as many bytes as the length-prefix says
> +            # there are.
> +            raise _NeedMoreBytes(end_of_bytes)
>
> ^- Isn't this actually 'end_of_bytes - len(self._in_buffer)' ?

No, although I can see how that's confusing.  The exception is called
"_NeedMoreBytes" but the parameter is the total bytes needed, including those
already in the buffer.  The docstring I added to _NeedMoreBytes a moment ago
is explicit about this.

The advantage to doing it this way is that I don't have to repeat the "nnn -
len(self._in_buffer)" expression everywhere.

> +    def _extract_prefixed_bencoded_data(self):
> +        prefixed_bytes = self._extract_length_prefixed_bytes()
> +        try:
> +            decoded = bdecode(prefixed_bytes)
> +        except ValueError:
> +            raise errors.SmartProtocolError(
> +                'Bytes %r not bencoded' % (prefixed_bytes,))
> +        return decoded
>
> ^- Have we done at least a little bit of auditing that bencode/bdecode  are
> network transport safe? I believe they are used with bittorrent, so if  they
> weren't it would have been hacked long ago.

They are sane, yeah.

> One thing... do we guarantee that all bytes are consumed by bdecode()  (or an
> error raised)?. I don't know that it hurts to have trailing data that is
> ignored, but it is usually something that lets bad things happen.

bdecode raises an exception if it encounters trailing data, which is a good
thing.
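
The property is easy to see in a toy decoder (an illustration, not bzrlib's
bencode module; only integers and strings are handled here): decode one
complete value, then insist the buffer is exhausted:

```python
def bdecode(data):
    """Decode a single bencoded value, rejecting any trailing bytes."""
    value, pos = _decode_at(data, 0)
    if pos != len(data):
        raise ValueError('trailing data at offset %d' % pos)
    return value

def _decode_at(data, pos):
    if data[pos:pos + 1] == b'i':          # integer: i<digits>e
        end = data.index(b'e', pos)
        return int(data[pos + 1:end].decode('ascii')), end + 1
    colon = data.index(b':', pos)          # string: <length>:<bytes>
    length = int(data[pos:colon].decode('ascii'))
    start = colon + 1
    if start + length > len(data):
        raise ValueError('string runs past end of data')
    return data[start:start + length], start + length
```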

> One problem with mutating your "self.state_accept" function is that each  one
> has to handle the same "self._in_buffer += bytes". As long as that is  the
> only duplicated code, it isn't very onerous. But if it becomes more than
> that, it seems like it would make more sense to have an "accept_bytes"
> function,  which then buffers and calls into "state_next()" or whatever.
>
> But that is partly why having the public api be a mutating function  seems a
> bit iffy to me.

As mentioned above, I've managed to put “self._in_buffer += bytes” in
accept_bytes, rather than duplicated everywhere now.  (That doesn't affect the
use of an instance variable that holds the next function to call, though.)

> +    def _state_accept_reading_unused(self, bytes):
> +        self.unused_data += bytes
> +
>
> ^- Seems odd to just buffer data, but I guess if it makes the calling code
> clearer.

Right.  The use of this is that if bytes for more than a single request are
received at once, and passed to accept_bytes in a single call, we should still
cope with that.  (In fact, thanks to this I expect our smart servers have always been
theoretically capable of dealing with pipelined requests, although we've never
tried to take advantage of that.)

It also forces the code to be explicit about message boundaries, and how it
handles them, which is a good thing.
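
A sketch of that boundary handling (with invented framing: a single
NUL-terminated message): once the message is complete, everything else lands
in unused_data for the next protocol object to consume:

```python
class OneMessageDecoder(object):
    """Decode one NUL-terminated message, preserving any pipelined extra."""

    def __init__(self):
        self._in_buffer = b''
        self.message = None
        self.unused_data = b''
        self.state_accept = self._state_accept_expecting_message

    def accept_bytes(self, data):
        self._in_buffer += data
        current_state = None
        while current_state != self.state_accept:
            current_state = self.state_accept
            self.state_accept()

    def _state_accept_expecting_message(self):
        pos = self._in_buffer.find(b'\x00')
        if pos == -1:
            return  # message boundary not reached yet
        self.message = self._in_buffer[:pos]
        self._in_buffer = self._in_buffer[pos + 1:]
        self.state_accept = self._state_accept_reading_unused

    def _state_accept_reading_unused(self):
        # Everything after the message boundary is someone else's problem.
        self.unused_data += self._in_buffer
        self._in_buffer = b''
```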

> +        else:
> +            if self._number_needed_bytes is not None:
> +                return self._number_needed_bytes - len(self._in_buffer)
> +            else:
> +                raise AssertionError("don't know how many bytes are expected!")
>
> ^- I guess it *is* "_NeedMoreBytes(length)" above, since you handle
> _in_buffer here. But it does make it seem like raising _NeedMoreBytes() or
> _NeedMoreBytes(None) should be an error, since you will raise an
> AssertionError later on, under those circumstances.

This class isn't the only _StatefulDecoder subclass.  Other classes use
different methods to calculate next_read_size, and so it's for them that
_NeedMoreBytes can accept None.

> +    def _write_prefixed_bencode(self, structure):
> +        bytes = bencode(structure)
> +        self._write_func(struct.pack('!L', len(bytes)))
> +        self._write_func(bytes)
>
> is _write_func generally a buffered write? Otherwise I think:
>
> prefix = struct.pack('!L', len(bytes))
> self._write_func(prefix + bytes)
>
> might be better.

IIRC, it's typically going to be fileobject.write or socket.sendall.  What that
does exactly is largely up to the OS.

You're right, it might be better.  But it might not.  And it might not matter
much either way.  So I think we ought to leave it simple, and wait until
profiling tells us it matters before we worry about it.

> Then again if bytes is going to be *big* it requires duplicating the data.
> Though it is a bit ugly to do:
> if len(bytes) > 1000:
>     self._write_func(prefix)
>     self._write_func(bytes)
> else:
>     self._write_func(prefix + bytes)
>
> Maybe it would make sense to abstract that into a "_write_prefixed_bytes"
> function. As that would be shared between "_write_prefixed_bencode()" and
> "_write_prefixed_body". Though again you have a _write_func('b') call which is
> a bit spurious for 1 byte of data to incur a syscall.
>
> On the other hand, if _write_func is buffered, it doesn't necessarily know
> where the good points to break are.

If we want to add buffering between the encoder and OS, we could just provide a
different _write_func that buffers.  Although the point at which we flush the
buffers is an interesting question, as you mention.  I'm comfortable that the
existing strategy should perform ok (v1 and v2 do similar things), and it has
the advantage of being simple and relatively friendly with memory consumption.

For better or worse, we aren't yet at the point that the rate this code can send
bytes down the wire is our bottleneck...
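
If profiling ever does show it matters, one way to add buffering without
touching the encoder is to wrap the raw write function (the class name and
threshold below are invented for illustration):

```python
class BufferingWriter(object):
    """Collect writes and flush them in one call once a size threshold
    is crossed (or when flush() is called explicitly)."""

    def __init__(self, raw_write_func, threshold=8192):
        self._raw_write = raw_write_func
        self._threshold = threshold
        self._pending = []
        self._pending_len = 0

    def write(self, data):
        self._pending.append(data)
        self._pending_len += len(data)
        if self._pending_len >= self._threshold:
            self.flush()

    def flush(self):
        if self._pending:
            # One syscall for everything accumulated so far.
            self._raw_write(b''.join(self._pending))
            self._pending = []
            self._pending_len = 0
```

The encoder's existing pattern of separate prefix and body writes would then
coalesce into single larger writes with no change to the encoder itself.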

> +class ProtocolThreeResponder(_ProtocolThreeEncoder):
> +
> +    def __init__(self, write_func):
> +        _ProtocolThreeEncoder.__init__(self, write_func)
> +        self.response_sent = False
> +
> +    def send_error(self, exception):
> +        assert not self.response_sent
>
> ^- Raw assert, which will be illegal soon. and certainly should have a
> user-visible message at the least.
>
> +    def send_response(self, response):
> +        assert not self.response_sent
> ^- same here

Fixed.

> ...
> +    def call(self, *args, **kw):
> +        # XXX: ideally, signature would be call(self, *args, headers=None),
> +        # but python doesn't allow that.  So, we fake it.
>
> well it does support (self, headers=None, *args), but I realize that does
> things differently. Also, not an XXX, more of a Note:

Happily, changes prompted by an earlier review + pairing with Martin have made
this irrelevant :)

> +    def _write_headers(self, headers=None):
> +        if headers is None:
> +            headers = {'Software version': bzrlib.__version__}
> +        self._write_prefixed_bencode(headers)
> +
>
> ^- Would it make more sense to *always* write
> 'Software version': bzrlib.__version__, rather than only writing it if
> there are no other headers?

Ditto.

Thanks once again for the review.  The code is better for it.

-Andrew.



