[MERGE] Protocol encoding/decoding logic for protocol v3 (protocol v3 patch 6/7)

Wed Apr 30 21:13:51 BST 2008

John Arbash Meinel has voted tweak.
Status is now: Conditionally approved
Comment:
 > I've renamed excess_data to unused_data in a few places (a fairly 
arbitrary
> change, it just seemed moreconsistent at the time).

^- unused seems a bit like data that is being thrown away, rather than 
data that hasn't been consumed yet. Though 'excess' has the same 
problem. 'remaining_buffer' is my bikeshed color.


...

+MESSAGE_VERSION_THREE = 'bzr message 3 (bzr 1.3)\n'
+RESPONSE_VERSION_THREE = REQUEST_VERSION_THREE = MESSAGE_VERSION_THREE
+

^- I thought the bzr version string would have been the current version 
of the client. Is your idea that it should be the version of bzr that 
introduced the protocol? (Like we did with disk format strings). That 
would be fine, too, but this isn't going to land before 1.5.

Why are you changing from having the 'request' and 'response' strings be 
different?


-        self.state_accept(bytes)
-        while current_state != self.state_accept:
-            current_state = self.state_accept
-            self.state_accept('')
+        self._number_needed_bytes = None
+        try:
+            self.state_accept(bytes)
+            while current_state != self.state_accept:
+                current_state = self.state_accept
+                self.state_accept('')
+        except _NeedMoreBytes, e:
+            self._number_needed_bytes = e.count

^- this seems odd. state_accept() is a function which mutates itself? 
Certainly
the fact that you keep passing no more bytes until the state stops 
changing is
a bit odd. At least, it would seem to me that passing no more bytes 
would not
change the state.

I can understand how this could be correct, but it *looks* like 
incorrect code.
At a miminum it probably needs a comment, it might indicate it should be 
done
in a different way.

...
  class ChunkedBodyDecoder(_StatefulDecoder):
@@ -310,9 +371,8 @@
      def _extract_line(self):
          pos = self._in_buffer.find('\n')
          if pos == -1:
-            # We haven't read a complete length prefix yet, so there's 
nothing
-            # to do.
-            return None
+            # We haven't read a complete line yet, so there's nothing 
to do.
+            raise _NeedMoreBytes(1)

# We haven't read a complete line yet, so request more bytes before we 
continue


Should we allow "count" to be None? It seems like it sets
"self._number_needed_bytes = None", which sounds more like not needing 
any more
bytes than an unknown number of bytes needed.


I guess as long as it is clearly documented that _number_needed_bytes = 
None
means we need to read an arbitrary amount.


...

+class SmartClientRequestProtocolOne(SmartProtocolBase, Requester,
+        message.ResponseHandler):

People are welcome to disagree, but *I* aesthetically prefer:

class SmartClientRequestProtocolOne(SmartProtocolBase, Requester,
                                     message.ResponseHandler):


+    def _raise_args_if_error(self, result_tuple):
+        v1_error_codes = [
+            'norepository',
+            'NoSuchFile',
+            'FileExists',
+            'DirectoryNotEmpty',
+            'ShortReadvError',
+            'UnicodeEncodeError',
+            'UnicodeDecodeError',
+            'ReadOnlyError',
+            'nobranch',
+            'NoSuchRevision',
+            'nosuchrevision',
+            'LockContention',
+            'UnlockableTransport',
+            'LockFailed',
+            'TokenMismatch',
+            'ReadError',
+            'PermissionDenied',
+            ]
+        if result_tuple[0] in v1_error_codes:
+            self._request.finished_reading()
+            raise errors.ErrorFromSmartServer(result_tuple)
+

^- this seems like it should have module/class scope, not scope local to 
the
function.

Also, what do you do for errors that arent V1 errors?

+        elif resp == ('ok', '3'):
+            return 3
^- this seems like a better place to have:

('ok', '3 (bzr >= 1.5)')

Certainly having a server return '3' here is going to unnecessarily 
break v2
clients. (Assuming the server can speak v2 on request.)

Capabilities seem a better handshake for this.
  I'm looking for v2or3 => I support v3

I know we were trying for completely stateless, but we've already broken 
that
contract with out "hello" requests. So we might as well do it the right 
way.

Having a handshake allows either the server or the client to be newer.

Further, I thought the goal was that the server would be expected to be 
newer.
Which doesn't work if you are changing 'hello' in a way clients won't
understand.

...

+def build_server_protocol_three(backing_transport, write_func,
+                                root_client_path):
+    request_handler = request.SmartServerRequestHandler(
+        backing_transport, commands=request.request_handlers,
+        root_client_path=root_client_path)
+    responder = ProtocolThreeResponder(write_func)
+    message_handler = 
message.ConventionalRequestHandler(request_handler, responder)
+    return ProtocolThreeDecoder(message_handler)


^- The naming here is unfortunate. As you end up passing a 
SSRequestHandler
into a CRequestHandler to get a 'message_handler'.

You could either name the variables:
server_request_handler = request.SSRH()
request_handler = message.CRH(server_request_handler, ...)

Or name the class "ConventionalMessageHandler".


...
+    def _extract_length_prefixed_bytes(self):
+        if len(self._in_buffer) < 4:
+            # A length prefix by itself is 4 bytes, and we don't even 
have that
+            # many yet.
+            raise _NeedMoreBytes(4)
+        (length,) = struct.unpack('!L', self._in_buffer[:4])

^- I personally prefere ">L" to "!L", is there a different convention 
I'm not
aware of? The ">" tells me Greater, aka Big-endian.

+        end_of_bytes = 4 + length
+        if len(self._in_buffer) < end_of_bytes:
+            # We haven't yet read as many bytes as the length-prefix 
says there
+            # are.
+            raise _NeedMoreBytes(end_of_bytes)

^- Isn't this actually 'end_of_bytes - len(self._in_buffer)' ?

+        # Extract the bytes from the buffer.
+        bytes = self._in_buffer[4:end_of_bytes]
+        self._in_buffer = self._in_buffer[end_of_bytes:]
+        return bytes
+

...

+    def _extract_prefixed_bencoded_data(self):
+        prefixed_bytes = self._extract_length_prefixed_bytes()
+        try:
+            decoded = bdecode(prefixed_bytes)
+        except ValueError:
+            raise errors.SmartProtocolError(
+                'Bytes %r not bencoded' % (prefixed_bytes,))
+        return decoded

^- Have we done at least a little bit of auditing that bencode/bdecode 
are
network transport safe? I believe they are used with bittorrent, so if 
they
weren't it would have been hacked long ago.

One thing... do we guarantee that all bytes are consumed by bdecode() 
(or an
error raised)?. I don't know that it hurts to have trailing data that is
ignored, but it is usually something that lets bad things happen.


One problem with mutating your "self.state_accept" function is that each 
one
has to handle the same "self._in_buffer += bytes". As long as that is 
the only
duplicated code, it isn't very onerous. But if it becomes more than 
that, it
seems like it would make more sense to have an "accept_bytes" function, 
which
then buffers and calls into "state_next()" or whatever.

But that is partly why having the public api be a mutating function 
seems a bit
iffy to me.

+    def _state_accept_reading_unused(self, bytes):
+        self.unused_data += bytes
+

^- Seems odd to just buffer data, but I guess if it makes the calling 
code
clearer.

+        else:
+            if self._number_needed_bytes is not None:
+                return self._number_needed_bytes - len(self._in_buffer)
+            else:
+                raise AssertionError("don't know how many bytes are 
expected!")

^- I guess it *is* "_NeedMoreBytes(length)" above, since you handle 
_in_buffer
here. But it does make it seem like raising _NeedMoreBytes() or
_NeedMoreBytes(None) should be an error. Since you will raise an 
AssertionError
later on, under those circumstances.


...

+    def _write_prefixed_bencode(self, structure):
+        bytes = bencode(structure)
+        self._write_func(struct.pack('!L', len(bytes)))
+        self._write_func(bytes)

is _write_func generally a buffered write? Otherwise I thinK:

prefix = struct.pack('!L', len(bytes))
self._write_func(prefix + bytes)

might be better.

Then again if bytes is going to be *big* it requires duplicating the 
data.
Though it is a bit ugly to do:
if len(bytes) > 1000:
     self._write_func(prefix)
     self._write_func(bytes)
else:
     self._write_func(prefix + bytes)

Maybe it would make sense to abstract that into a 
"_write_prefixed_bytes"
function. As that would be shared between "_write_prefixed_bencode()" 
and
"_write_prefixed_body". Though again you have a _write_func('b') call 
which is
a bit spurious for 1 byte of data to incur a syscall.


On the other hand, if _write_func is buffered, it doesn't necessarily 
know
where the good points to break are.


+class ProtocolThreeResponder(_ProtocolThreeEncoder):
+
+    def __init__(self, write_func):
+        _ProtocolThreeEncoder.__init__(self, write_func)
+        self.response_sent = False
+
+    def send_error(self, exception):
+        assert not self.response_sent

^- Raw assert, which will be illegal soon. and certainly should have a
user-visible message at the least.

+    def send_response(self, response):
+        assert not self.response_sent
^- same here


...
+    def call(self, *args, **kw):
+        # XXX: ideally, signature would be call(self, *args, 
headers=None), but
+        # python doesn't allow that.  So, we fake it.

well it does support (self, headers=None, *args), but I realize that 
does
things differently. Also, not an XXX, more of a Note:

+
+    def _write_headers(self, headers=None):
+        if headers is None:
+            headers = {'Software version': bzrlib.__version__}
+        self._write_prefixed_bencode(headers)
+

^- Would it make more sense to *always* write
'Software version': bzrlib.__version__, rather than only writing it if 
there
are no other headers?

Also, seeing "bzrlib" triggers my "don't access bzrlib directly" flag. 
it is
arguably more correct, but I usually use:
from bzrlib import __version__ as _bzrlib_version
for that sort of thing.


For details, see: 
http://bundlebuggy.aaronbentley.com/request/%3C20080423232335.GJ9546%40steerpike.home.puzzling.org%3E