[merge] review of overall protocol3 diff

Martin Pool mbp at canonical.com
Thu May 8 08:54:57 BST 2008


I thought I would review the overall hpss patch, to see things that
changed based on previous reviews and just as an extra check.

Some of these things would be good to do soon, but don't need to block
landing so I've marked them with ZZ.

Unfortunately I haven't finished this yet, it's very big, but here are
some comments on the
first part.

=== modified file 'bzrlib/bzrdir.py'
--- bzrlib/bzrdir.py	2008-04-14 08:03:35 +0000
+++ bzrlib/bzrdir.py	2008-05-08 05:49:30 +0000
@@ -2445,14 +2445,14 @@
             raise errors.NotBranchError(path=transport.base)
         else:
             # Decline to open it if the server doesn't support our required
-            # version (2) so that the VFS-based transport will do it.
+            # version (3) so that the VFS-based transport will do it.
             try:
                 server_version = medium.protocol_version()
             except errors.SmartProtocolError:
                 # Apparently there's no usable smart server there, even though
                 # the medium supports the smart protocol.
                 raise errors.NotBranchError(path=transport.base)
-            if server_version != 2:
+            if server_version not in (2, 3):
                 raise errors.NotBranchError(path=transport.base)
             return klass()


I think we said the other day we're going to drop this, and if the transport
can do RPCs at all, we'll just assume that.  However, that's not quite enough
for HTTP, where it's theoretically capable of doing RPC but there may not
actually be a server there...


=== modified file 'bzrlib/errors.py'
--- bzrlib/errors.py	2008-04-24 16:44:23 +0000
+++ bzrlib/errors.py	2008-05-08 05:49:30 +0000
@@ -1476,6 +1476,14 @@
         self.details = details


+class UnexpectedProtocolVersionMarker(TransportError):
+
+    _fmt = "Unexpected protocol version marker: %(marker)r"
+
+    def __init__(self, marker):
+        self.marker = marker
+
+
 class UnknownSmartMethod(InternalBzrError):

Does this mean "it's definitely a marker but we didn't expect it at this time"
or "we don't understand the protocol our counterparty is trying to
use?"  If the
latter maybe it can have a clearer message.


     _fmt = "The server does not recognise the '%(verb)s' request."
@@ -2430,6 +2438,16 @@
         self.response_tuple = response_tuple


@@ -118,7 +114,12 @@
     def get_branch_reference(self):
         """See BzrDir.get_branch_reference()."""
         path = self._path_for_remote_call(self._client)
-        response = self._client.call('BzrDir.open_branch', path)
+        try:
+            response = self._client.call('BzrDir.open_branch', path)
+        except errors.ErrorFromSmartServer, err:
+            if err.error_tuple == ('nobranch',):
+                raise errors.NotBranchError(path=self.root_transport.base)
+            raise
         if response[0] == 'ok':
             if response[1] == '':
                 # branch at this location.

This is a nice change, that these errors can now all be treated as
client side errors by default.

ZZ: I do wonder though whether we should take this a step further:
isn't it always appropriate to turn this into NotBranchError in e.g.
_raise_args_if_error?

@@ -126,8 +127,6 @@
             else:
                 # a branch reference, use the existing BranchReference logic.
                 return response[1]
-        elif response == ('nobranch',):
-            raise errors.NotBranchError(path=self.root_transport.base)
         else:
             raise errors.UnexpectedSmartServerResponse(response)

@@ -150,14 +149,17 @@
         path = self._path_for_remote_call(self._client)
         verb = 'BzrDir.find_repositoryV2'
         try:
-            response = self._client.call(verb, path)
-        except errors.UnknownSmartMethod:
-            verb = 'BzrDir.find_repository'
-            response = self._client.call(verb, path)
-        assert response[0] in ('ok', 'norepository'), \
-            'unexpected response code %s' % (response,)
-        if response[0] == 'norepository':
-            raise errors.NoRepositoryPresent(self)
+            try:
+                response = self._client.call(verb, path)
+            except errors.UnknownSmartMethod:
+                verb = 'BzrDir.find_repository'
+                response = self._client.call(verb, path)
+        except errors.ErrorFromSmartServer, err:
+            if err.error_tuple[0] == 'norepository':
+                raise errors.NoRepositoryPresent(self)
+            raise
+        if response[0] != 'ok':
+            raise errors.UnexpectedSmartServerResponse(response)
         if verb == 'BzrDir.find_repository':
             # servers that don't support the V2 method don't support external
             # references either.

ZZ: Again, it seems like many commands repeat this check for 'ok' and
raising an error
otherwise, surely it can be factored out?

@@ -372,26 +374,25 @@

         path = self.bzrdir._path_for_remote_call(self._client)
         assert type(revision_id) is str
-        response = self._client.call_expecting_body(
-            'Repository.get_revision_graph', path, revision_id)
-        if response[0][0] not in ['ok', 'nosuchrevision']:
+        try:
+            response = self._client.call_expecting_body(
+                'Repository.get_revision_graph', path, revision_id)
+        except errors.ErrorFromSmartServer, err:
+            if err.error_tuple[0] == 'nosuchrevision':
+                raise NoSuchRevision(self, revision_id)
+        if response[0][0] != 'ok':

You are swallowing the error if it is not nosuchrevision.  Rather than
getting every one
of these methods right (or wrong) individually I'd really like to centralize it.

             raise errors.UnexpectedSmartServerResponse(response[0])
-        if response[0][0] == 'ok':
-            coded = response[1].read_body_bytes()
-            if coded == '':
-                # no revisions in this repository!
-                return {}
-            lines = coded.split('\n')
-            revision_graph = {}
-            for line in lines:
-                d = tuple(line.split())
-                revision_graph[d[0]] = d[1:]
-
-            return revision_graph
-        else:
-            response_body = response[1].read_body_bytes()
-            assert response_body == ''
-            raise NoSuchRevision(self, revision_id)
+        coded = response[1].read_body_bytes()
+        if coded == '':
+            # no revisions in this repository!
+            return {}
+        lines = coded.split('\n')
+        revision_graph = {}
+        for line in lines:
+            d = tuple(line.split())
+            revision_graph[d[0]] = d[1:]
+
+        return revision_graph

Just as a point of style (not for now) I think code like

   coded = response[1].read_body_bytes()

would be much better as

   status, response = blah._call()
   ....
   coded = response.read()

in other words avoiding subscripting on tuples that can sensibly be given names.

@@ -1519,10 +1534,11 @@
             self._ensure_real()
             self._clear_cached_state()
             return self._real_branch.set_last_revision_info(revno, revision_id)
+        except errors.ErrorFromSmartServer, err:
+            if err.error_tuple[0] == 'NoSuchRevision':
+                raise NoSuchRevision(self, err.error_tuple[1])

And here.

+    def _call_and_read_response(self, method, args, body=None, readv_body=None,
+            expect_response_body=True):
+        if self._protocol_version is not None:
+            response_handler = self._send_request(
+                self._protocol_version, method, args, body=body,
+                readv_body=readv_body)
+            return (response_handler.read_response_tuple(
+                        expect_body=expect_response_body),
+                    response_handler)
+        else:
+            for protocol_version in [3, 2]:
+                response_handler = self._send_request(
+                    protocol_version, method, args, body=body,
+                    readv_body=readv_body)
+                try:
+                    response_tuple = response_handler.read_response_tuple(
+                        expect_body=expect_response_body)
+                except errors.UnexpectedProtocolVersionMarker, err:
+                    # TODO: We could recover from this without disconnecting if
+                    # we recognise the protocol version.
+                    self._medium.disconnect()
+                    continue
+                else:
+                    self._protocol_version = protocol_version
+                    return response_tuple, response_handler
+            raise errors.SmartProtocolError(
+                'Server is not a Bazaar server: ' + str(err))

We should warn that the server is old so we're reconnecting.

@@ -114,16 +148,11 @@

         :returns: a SmartServerRequestProtocol.
         """
-        # Identify the protocol version.
         bytes = self._get_line()
-        if bytes.startswith(REQUEST_VERSION_TWO):
-            protocol_class = SmartServerRequestProtocolTwo
-            bytes = bytes[len(REQUEST_VERSION_TWO):]
-        else:
-            protocol_class = SmartServerRequestProtocolOne
-        protocol = protocol_class(
+        protocol_factory, unused_bytes = _get_protocol_factory_for_bytes(bytes)
+        protocol = protocol_factory(
             self.backing_transport, self._write_out, self.root_client_path)
-        protocol.accept_bytes(bytes)
+        protocol.accept_bytes(unused_bytes)
         return protocol

I like the patterns with accept_bytes here...

@@ -202,9 +231,6 @@
         return self.socket.recv(4096)

     def terminate_due_to_error(self):
-        """Called when an unhandled exception from the protocol occurs."""
-        # TODO: This should log to a server log file, but no such thing
-        # exists yet.  Andrew Bennetts 2006-09-29.
         self.socket.close()
         self.finished = True

Is this done now?  Why remove the comment?

=== added file 'bzrlib/smart/message.py'
--- bzrlib/smart/message.py	1970-01-01 00:00:00 +0000
+++ bzrlib/smart/message.py	2008-05-08 05:49:31 +0000

+class ConventionalRequestHandler(MessageHandler):
+    """A message handler for "conventional" requests.
+
+    "Conventional" is used in the sense described in
+    doc/developers/network-protocol.txt: a simple message with arguments and an
+    optional body.
+    """
+
+    def __init__(self, request_handler, responder):
+        MessageHandler.__init__(self)
+        self.request_handler = request_handler
+        self.responder = responder
+        self.args_received = False

There is some naming confusion about a class called
ConventionalRequestHandler that has a
member called request_handler that is not of the same class.  Maybe
this should be called
ConventionalMessageHandler?

Since the messages are symmetric it would be nice to unify these
classes.  Or at least
have a common base class that does all the network-facing stuff.

+
+    def protocol_error(self, exception):
+        self.responder.send_error(exception)
+
+    def byte_part_received(self, byte):
+        raise errors.SmartProtocolError(
+            'Unexpected message part: byte(%r)' % (byte,))

I'm confused about why this is not accepting byte parts, shouldn't it
take 'E', 'S', etc,
similar to the ResponseHandler?

+
+    def structure_part_received(self, structure):
+        if self.args_received:
+            raise errors.SmartProtocolError(
+                'Unexpected message part: structure(%r)' % (structure,))
+        self.args_received = True
+        self.request_handler.dispatch_command(structure[0], structure[1:])
+        if self.request_handler.finished_reading:
+            self.responder.send_response(self.request_handler.response)
+
+    def bytes_part_received(self, bytes):
+        # Note that there's no intrinsic way to distinguish a monolithic body
+        # from a chunk stream.  A request handler knows which it is expecting
+        # (once the args have been received), so it should be able to do the
+        # right thing.
+        self.request_handler.accept_body(bytes)
+        self.request_handler.end_of_body()
+        if not self.request_handler.finished_reading:
+            raise SmartProtocolError(
+                "Conventional request body was received, but request handler "
+                "has not finished reading.")
+        self.responder.send_response(self.request_handler.response)

I don't understand how a streamed body fits in here.  Wouldn't that be
multiple bytes parts?

It would be nice to add that now so that we can stream requests without
worrying whether the server is so old that it will reject them, and without it
needing to use different RequestHandlers for different verbs.

Surely this should be split so that it keeps just giving bits of the body to
the request_handler as they come in, then only when it gets the end of the
message it says it's finished?

Now it's true we don't use this at the moment, but I'm concerned if we don't
fix this before landing this protocol it will be another case of causing
problem with forward compatibility.

+    def read_response_tuple(self, expect_body=False):
+        """Reads and returns the response tuple for the current request.
+
+        :keyword expect_body: a boolean indicating if a body is expected in the
+            response.  Some protocol versions needs this information to know
+            when a response is finished.  If False, read_body_bytes should
+            *not* be called afterwards.  Defaults to False.
+        :returns: tuple of response arguments.
+        """
+        raise NotImplementedError(self.read_response_tuple)
+
+    def read_body_bytes(self, count=-1):
+        """Read and return some bytes from the body.
+
+        :param count: if specified, read up to this many bytes.  By default,
+            reads the entire body.
+        :returns: str of bytes from the response body.
+        """
+        raise NotImplementedError(self.read_body_bytes)
+
+    def read_streamed_body(self):
+        """Returns an iterable that reads and returns a series of body chunks.
+        """
+        raise NotImplementedError(self.read_streamed_body)
+
+    def cancel_read_body(self):
+        """Stop expecting a body for this response.
+
+        If expect_body was passed to read_response_tuple, this cancels that
+        expectation (and thus finishes reading the response, allowing a new
+        request to be issued).  This is useful if a response turns out to be an
+        error rather than a normal result with a body.
+        """
+        raise NotImplementedError(self.cancel_read_body)
+
+
+class ConventionalResponseHandler(MessageHandler, ResponseHandler):

In general we have a bias against multiple inheritance.  I know here you're
just inheriting from an ABC.  Since there is only one concrete implementation
I don't think the interface-class really deserves to live.  (I know other
projects do like them more than we do.)  I would probably just move the
docstrings to the implementations.

+
+    def bytes_part_received(self, bytes):
+        self._body_started = True
+        self._bytes_parts.append(bytes)

should check here we already have the args?

+def _translate_error(error_tuple):
+    # Many exceptions need some state from the requestor to be properly
+    # translated (e.g. they need a branch object).  So this only translates a
+    # few errors, and the rest are turned into a generic ErrorFromSmartServer.

Ah, I see what you mean.

One option would be to raise them without real contents and let the Remote*
object catch them and poke in the right value.  So then if it doesn't
explicitly handle them, at least we'll get something approximately right...

+    error_name = error_tuple[0]
+    error_args = error_tuple[1:]
+    if error_name == 'UnknownMethod':
+        raise errors.UnknownSmartMethod(error_args[0])
+    if error_name == 'LockContention':
+        raise errors.LockContention('(remote lock)')
+    elif error_name == 'LockFailed':
+        raise errors.LockFailed(*error_args[:2])
+    else:
+        raise errors.ErrorFromSmartServer(error_tuple)

@@ -76,7 +116,7 @@
     def __init__(self, backing_transport, write_func, root_client_path='/'):
         self._backing_transport = backing_transport
         self._root_client_path = root_client_path
-        self.excess_buffer = ''
+        self.unused_data = ''
         self._finished = False
         self.in_buffer = ''
         self.has_dispatched = False

Those should probably be private?

@@ -616,6 +705,8 @@
             return 1
         elif resp == ('ok', '2'):
             return 2
+        elif resp == ('ok', '3'):
+            return 3
         else:
             raise errors.SmartProtocolError("bad response %r" % (resp,))

This also needs to be reverted I think.

-- Martin



More information about the bazaar mailing list