[MERGE] Protocol encoding/decoding logic for protocol v3 (protocol v3 patch 6/7)
John Arbash Meinel
john at arbash-meinel.com
Wed Apr 30 21:13:51 BST 2008
John Arbash Meinel has voted tweak.
Status is now: Conditionally approved
Comment:
> I've renamed excess_data to unused_data in a few places (a fairly
> arbitrary change, it just seemed more consistent at the time).
^- unused seems a bit like data that is being thrown away, rather than
data that hasn't been consumed yet. Though 'excess' has the same
problem. 'remaining_buffer' is my bikeshed color.
...
+MESSAGE_VERSION_THREE = 'bzr message 3 (bzr 1.3)\n'
+RESPONSE_VERSION_THREE = REQUEST_VERSION_THREE = MESSAGE_VERSION_THREE
+
^- I thought the bzr version string would have been the current version
of the client. Is your idea that it should be the version of bzr that
introduced the protocol? (Like we did with disk format strings). That
would be fine, too, but this isn't going to land before 1.5.
Why are you changing from having the 'request' and 'response' strings be
different?
-        self.state_accept(bytes)
-        while current_state != self.state_accept:
-            current_state = self.state_accept
-            self.state_accept('')
+        self._number_needed_bytes = None
+        try:
+            self.state_accept(bytes)
+            while current_state != self.state_accept:
+                current_state = self.state_accept
+                self.state_accept('')
+        except _NeedMoreBytes, e:
+            self._number_needed_bytes = e.count
^- this seems odd. state_accept() is a function which mutates itself?
Certainly the fact that you keep passing no more bytes until the state
stops changing is a bit odd. At least, it would seem to me that passing
no more bytes would not change the state.
I can understand how this could be correct, but it *looks* like
incorrect code. At a minimum it probably needs a comment; it might
indicate it should be done in a different way.
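To check I understand the idiom, here is a toy decoder (the names are mine,
nothing here is from the patch) using the same pattern; the point of the
empty-string calls in the while loop is that a handler which reassigns
self.state_accept may leave already-buffered bytes behind for the new state
to consume:

class _ToyDecoder(object):
    """Toy example only: a two-state decoder using the same idiom."""

    def __init__(self):
        self._in_buffer = ''
        self.state_accept = self._state_accept_expecting_length

    def accept_bytes(self, bytes):
        current_state = self.state_accept
        self.state_accept(bytes)
        while current_state != self.state_accept:
            # The handler reassigned self.state_accept, so the new state may
            # already have enough buffered data to make progress; poke it
            # with no new bytes.
            current_state = self.state_accept
            self.state_accept('')

    def _state_accept_expecting_length(self, bytes):
        self._in_buffer += bytes
        pos = self._in_buffer.find('\n')
        if pos == -1:
            return
        self._expected_length = int(self._in_buffer[:pos])
        self._in_buffer = self._in_buffer[pos + 1:]
        # State transition: any bytes left in the buffer belong to the body.
        self.state_accept = self._state_accept_reading_body

    def _state_accept_reading_body(self, bytes):
        self._in_buffer += bytes
        if len(self._in_buffer) < self._expected_length:
            return
        self.body = self._in_buffer[:self._expected_length]
        self._in_buffer = self._in_buffer[self._expected_length:]
        self.state_accept = self._state_accept_finished

    def _state_accept_finished(self, bytes):
        self.unused_data = getattr(self, 'unused_data', '') + bytes

With that, decoder.accept_bytes('5\nhello') sets decoder.body to 'hello' in
a single call even though two state transitions happen, which is the case
the loop exists for. If that is the intent in the patch too, a comment to
that effect would help.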
...
class ChunkedBodyDecoder(_StatefulDecoder):
@@ -310,9 +371,8 @@
     def _extract_line(self):
         pos = self._in_buffer.find('\n')
         if pos == -1:
-            # We haven't read a complete length prefix yet, so there's
-            # nothing to do.
-            return None
+            # We haven't read a complete line yet, so there's nothing
+            # to do.
+            raise _NeedMoreBytes(1)
^- Perhaps that comment should now say something like "We haven't read a
complete line yet, so request more bytes before we continue", since the new
code does do something: it raises.
Should we allow "count" to be None? It seems like it sets
"self._number_needed_bytes = None", which sounds more like not needing any
more bytes than an unknown number of bytes needed.
I guess it's fine as long as it is clearly documented that
_number_needed_bytes = None means we need to read an arbitrary amount.
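If None is going to stay legal, making the meaning explicit on the exception
itself would settle it; something like this (the docstring wording is only a
suggestion, not from the patch):

class _NeedMoreBytes(Exception):
    """Raised by a state handler that cannot make progress yet.

    :ivar count: the total number of bytes the current state needs before it
        can proceed, or None if the amount is unknown, i.e. "read an
        arbitrary amount and try again"; None does *not* mean that no more
        bytes are needed.
    """

    def __init__(self, count=None):
        Exception.__init__(self)
        self.count = count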
...
+class SmartClientRequestProtocolOne(SmartProtocolBase, Requester,
+    message.ResponseHandler):
People are welcome to disagree, but *I* aesthetically prefer:
class SmartClientRequestProtocolOne(SmartProtocolBase, Requester,
                                    message.ResponseHandler):
+    def _raise_args_if_error(self, result_tuple):
+        v1_error_codes = [
+            'norepository',
+            'NoSuchFile',
+            'FileExists',
+            'DirectoryNotEmpty',
+            'ShortReadvError',
+            'UnicodeEncodeError',
+            'UnicodeDecodeError',
+            'ReadOnlyError',
+            'nobranch',
+            'NoSuchRevision',
+            'nosuchrevision',
+            'LockContention',
+            'UnlockableTransport',
+            'LockFailed',
+            'TokenMismatch',
+            'ReadError',
+            'PermissionDenied',
+            ]
+        if result_tuple[0] in v1_error_codes:
+            self._request.finished_reading()
+            raise errors.ErrorFromSmartServer(result_tuple)
+
^- this seems like it should have module/class scope, not scope local to
the function.
Also, what do you do for errors that aren't V1 errors?
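On the first point, I mean something along these lines (the constant name is
mine):

# Module scope: error codes a version-one server may send back.
_V1_ERROR_CODES = frozenset([
    'norepository', 'NoSuchFile', 'FileExists', 'DirectoryNotEmpty',
    'ShortReadvError', 'UnicodeEncodeError', 'UnicodeDecodeError',
    'ReadOnlyError', 'nobranch', 'NoSuchRevision', 'nosuchrevision',
    'LockContention', 'UnlockableTransport', 'LockFailed', 'TokenMismatch',
    'ReadError', 'PermissionDenied',
    ])

and then in the class:

    def _raise_args_if_error(self, result_tuple):
        if result_tuple[0] in _V1_ERROR_CODES:
            self._request.finished_reading()
            raise errors.ErrorFromSmartServer(result_tuple)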
+        elif resp == ('ok', '3'):
+            return 3
^- this seems like a better place to have:
('ok', '3 (bzr >= 1.5)')
Certainly having a server return '3' here is going to unnecessarily break
v2 clients. (Assuming the server can speak v2 on request.)
Capabilities seem a better handshake for this.
I'm looking for 'v2or3' => I support v3.
I know we were trying for completely stateless, but we've already broken
that contract with our "hello" requests. So we might as well do it the
right way.
Having a handshake allows either the server or the client to be newer.
Further, I thought the goal was that the server would be expected to be
newer. Which doesn't work if you are changing 'hello' in a way clients
won't understand.
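To make the capability idea concrete, the shape of negotiation I have in
mind is roughly this (the names and the response layout are made up for
illustration, not taken from the patch): the server advertises every version
it can speak, and the client picks the newest one it also understands.

from bzrlib import errors

def _pick_protocol_version(hello_response):
    # hello_response is the tuple the server returned for 'hello', e.g.
    # ('ok', '2') from today's servers, or ('ok', '2', '3') from a server
    # that can also speak protocol three on request.
    if hello_response[0] != 'ok':
        raise errors.SmartProtocolError(
            'unexpected response to hello: %r' % (hello_response,))
    offered = set(hello_response[1:])
    for version in ('3', '2'):
        if version in offered:
            return int(version)
    raise errors.SmartProtocolError(
        'no mutually supported protocol version in %r' % (hello_response,))

An old client that only knows about '2' keeps working against such a server,
and a newer client quietly picks 3; nobody has to change what 'hello' means.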
...
+def build_server_protocol_three(backing_transport, write_func,
+        root_client_path):
+    request_handler = request.SmartServerRequestHandler(
+        backing_transport, commands=request.request_handlers,
+        root_client_path=root_client_path)
+    responder = ProtocolThreeResponder(write_func)
+    message_handler = message.ConventionalRequestHandler(request_handler,
+        responder)
+    return ProtocolThreeDecoder(message_handler)
^- The naming here is unfortunate, as you end up passing a SSRequestHandler
into a CRequestHandler to get a 'message_handler'.
You could either name the variables:
  server_request_handler = request.SSRH()
  request_handler = message.CRH(server_request_handler, ...)
Or name the class "ConventionalMessageHandler".
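i.e., taking the first option, roughly:

def build_server_protocol_three(backing_transport, write_func,
        root_client_path):
    server_request_handler = request.SmartServerRequestHandler(
        backing_transport, commands=request.request_handlers,
        root_client_path=root_client_path)
    responder = ProtocolThreeResponder(write_func)
    # This object interprets decoded messages and drives the
    # SmartServerRequestHandler, so 'request_handler' (or renaming the class
    # to ConventionalMessageHandler) reads better here.
    request_handler = message.ConventionalRequestHandler(
        server_request_handler, responder)
    return ProtocolThreeDecoder(request_handler)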
...
+    def _extract_length_prefixed_bytes(self):
+        if len(self._in_buffer) < 4:
+            # A length prefix by itself is 4 bytes, and we don't even have
+            # that many yet.
+            raise _NeedMoreBytes(4)
+        (length,) = struct.unpack('!L', self._in_buffer[:4])
^- I personally prefer ">L" to "!L"; is there a different convention I'm
not aware of? The ">" tells me Greater, aka big-endian.
+        end_of_bytes = 4 + length
+        if len(self._in_buffer) < end_of_bytes:
+            # We haven't yet read as many bytes as the length-prefix says
+            # there are.
+            raise _NeedMoreBytes(end_of_bytes)
^- Isn't this actually 'end_of_bytes - len(self._in_buffer)' ?
+        # Extract the bytes from the buffer.
+        bytes = self._in_buffer[4:end_of_bytes]
+        self._in_buffer = self._in_buffer[end_of_bytes:]
+        return bytes
+
...
+    def _extract_prefixed_bencoded_data(self):
+        prefixed_bytes = self._extract_length_prefixed_bytes()
+        try:
+            decoded = bdecode(prefixed_bytes)
+        except ValueError:
+            raise errors.SmartProtocolError(
+                'Bytes %r not bencoded' % (prefixed_bytes,))
+        return decoded
^- Have we done at least a little bit of auditing that bencode/bdecode are
network transport safe? I believe they are used with bittorrent, so if they
weren't it would have been hacked long ago.
One thing... do we guarantee that all bytes are consumed by bdecode() (or
an error raised)? I don't know that it hurts to have trailing data that is
ignored, but it is usually something that lets bad things happen.
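If that guarantee doesn't already exist, a test along these lines would pin
it down (I'm assuming the bdecode in question is bzrlib.util.bencode's, and
the test names are made up):

from bzrlib.tests import TestCase
from bzrlib.util.bencode import bdecode


class TestBdecodeConsumesAllBytes(TestCase):

    def test_trailing_bytes_rejected(self):
        # 'le' is a complete bencoded empty list; trailing garbage after it
        # should be an error rather than silently ignored.
        self.assertRaises(ValueError, bdecode, 'le' + 'junk')

    def test_exact_bytes_accepted(self):
        self.assertEqual([], bdecode('le'))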
One problem with mutating your "self.state_accept" function is that each
one has to handle the same "self._in_buffer += bytes". As long as that is
the only duplicated code, it isn't very onerous. But if it becomes more
than that, it seems like it would make more sense to have an "accept_bytes"
function, which then buffers and calls into "state_next()" or whatever.
But that is partly why having the public API be a mutating function seems a
bit iffy to me.
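Concretely, the arrangement I mean would look something like this (a sketch
only, reusing the patch's attribute names), with the per-state methods no
longer taking a bytes argument at all:

    def accept_bytes(self, bytes):
        # The one and only place that appends to the buffer; the state
        # methods just look at self._in_buffer and raise _NeedMoreBytes
        # when they cannot make progress.
        self._in_buffer += bytes
        self._number_needed_bytes = None
        try:
            current_state = self.state_next
            self.state_next()
            while current_state != self.state_next:
                current_state = self.state_next
                self.state_next()
        except _NeedMoreBytes, e:
            self._number_needed_bytes = e.count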
+    def _state_accept_reading_unused(self, bytes):
+        self.unused_data += bytes
+
^- Seems odd to just buffer data, but I guess it's fine if it makes the
calling code clearer.
+        else:
+            if self._number_needed_bytes is not None:
+                return self._number_needed_bytes - len(self._in_buffer)
+            else:
+                raise AssertionError(
+                    "don't know how many bytes are expected!")
^- I guess it *is* "_NeedMoreBytes(length)" above, since you handle
_in_buffer here. But it does make it seem like raising _NeedMoreBytes() or
_NeedMoreBytes(None) should be an error, since you will raise an
AssertionError later on under those circumstances.
...
+    def _write_prefixed_bencode(self, structure):
+        bytes = bencode(structure)
+        self._write_func(struct.pack('!L', len(bytes)))
+        self._write_func(bytes)
is _write_func generally a buffered write? Otherwise I think:
  prefix = struct.pack('!L', len(bytes))
  self._write_func(prefix + bytes)
might be better.
Then again, if bytes is going to be *big* it requires duplicating the data.
Though it is a bit ugly to do:
  if len(bytes) > 1000:
      self._write_func(prefix)
      self._write_func(bytes)
  else:
      self._write_func(prefix + bytes)
Maybe it would make sense to abstract that into a "_write_prefixed_bytes"
function, as that would be shared between "_write_prefixed_bencode()" and
"_write_prefixed_body". Though again you have a _write_func('b') call which
is a bit spurious for 1 byte of data to incur a syscall.
On the other hand, if _write_func is buffered, it doesn't necessarily know
where the good points to break are.
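To spell out the abstraction I mean (the 1000-byte cut-off is as arbitrary
here as it was above, and would want measuring):

    def _write_prefixed_bytes(self, bytes):
        prefix = struct.pack('!L', len(bytes))
        if len(bytes) > 1000:
            # Big body: don't copy it just to glue the prefix on.
            self._write_func(prefix)
            self._write_func(bytes)
        else:
            # Small body: one write is cheaper than two.
            self._write_func(prefix + bytes)

    def _write_prefixed_bencode(self, structure):
        self._write_prefixed_bytes(bencode(structure))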
+class ProtocolThreeResponder(_ProtocolThreeEncoder):
+
+    def __init__(self, write_func):
+        _ProtocolThreeEncoder.__init__(self, write_func)
+        self.response_sent = False
+
+    def send_error(self, exception):
+        assert not self.response_sent
^- Raw assert, which will be illegal soon, and it certainly should have a
user-visible message at the least.
+    def send_response(self, response):
+        assert not self.response_sent
^- same here
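e.g. for send_error, something like (the message wording is just a
suggestion):

    def send_error(self, exception):
        if self.response_sent:
            raise AssertionError(
                "send_error(%r) called, but a response has already been "
                "sent for this request" % (exception,))
        # ... rest of send_error as in the patch ...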
...
+    def call(self, *args, **kw):
+        # XXX: ideally, signature would be call(self, *args, headers=None),
+        # but python doesn't allow that. So, we fake it.
Well, it does support (self, headers=None, *args), but I realize that does
things differently. Also, not an XXX, more of a Note:
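For reference, the usual way to fake it is to pop the keyword out of **kw
and complain about anything left over; I assume the patch does something
close to this (I haven't quoted that part):

    def call(self, *args, **kw):
        # Note: python has no keyword-only arguments, so emulate
        # call(*args, headers=None) by hand.
        headers = kw.pop('headers', None)
        if kw:
            raise TypeError(
                'call() got unexpected keyword arguments: %r' % (kw.keys(),))
        # ... then carry on with args and headers as the patch does ...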
+
+    def _write_headers(self, headers=None):
+        if headers is None:
+            headers = {'Software version': bzrlib.__version__}
+        self._write_prefixed_bencode(headers)
+
^- Would it make more sense to *always* write
'Software version': bzrlib.__version__, rather than only writing it if
there are no other headers?
Also, seeing "bzrlib" triggers my "don't access bzrlib directly" flag. It is
arguably more correct, but I usually use:
  from bzrlib import __version__ as _bzrlib_version
for that sort of thing.
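Putting those two points together, something like (sketch only):

from bzrlib import __version__ as _bzrlib_version

at module scope, and then:

    def _write_headers(self, headers=None):
        headers = dict(headers or {})
        # Always advertise the software version, even when the caller
        # supplied other headers.
        headers.setdefault('Software version', _bzrlib_version)
        self._write_prefixed_bencode(headers)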
For details, see:
http://bundlebuggy.aaronbentley.com/request/%3C20080423232335.GJ9546%40steerpike.home.puzzling.org%3E