[MERGE] Return (and expect) '3' rather than '2' from the 'hello' verb (protocol v3 patch 4/7)

Thu May 8 01:34:14 BST 2008

John Arbash Meinel wrote:
> Andrew Bennetts wrote:
[...]
> | It implements detection the server protocol version a better way: the first time
> | a request is made, it simply tries v3, then it tries v2, then it gives up.
> | After that, it remembers the version it detected.  'hello' is no longer used by
> | this client.
> |
>
> My concern with this is the problem we had with the "call_with_body_bytes" for
> unknown protocols. Where calling the server with an unknown method made it start
> doing bad things. We know disconnect and reconnect, but that is still somewhat
> unsafe. Considering the server still things you made multiple requests, and is
> still failing, we are just hiding that from the client.
>
> I guess V3 is a much bigger change from V2, or even the V2 extra-body stuff. So
> if V3 fails, it will fail right away, in a predictable manner for any call that
> we might make. If it doesn't, then I think we need to reconsider how we want to
> do it.

It's not the same problem.  Since V2, we always send a newline-terminated
protocol version marker before a request or response.  So the first bytes the
recipient in either direction will read will immediately identify the message as
a known version, or as unintelligible garbage.  If it's garbage (to that server
version) the server ought to reply with an error and disconnect, rather than
assume the bytestream will magically start making sense if it keeps reading more
bytes.

The problem with call_with_body_bytes in V2 was that the server could read the
start of a request, understand it enough to say "sorry, I can see you tried
calling Frobnicate but I don't support that", and then not realise that the next
bytes on the wire were a body for the first request rather than a new request.
Here there's no risk that I can see, so long as:

 a) all future versions do as we've documented and send a newline-terminated
    protocol version marker at the start of every message
 b) no future protocol version marker happens to also be a valid v1 request! :)

It is precisely for this reason (so that it is possible and simple for a
recipient to reliably know if the message is encoded in a way the recipient can
understand) that we send the protocol markers.

> | An alternative solution I considered and rejected was adding a new 'helloV3'
> | verb (maybe called 'SupportedVersions') that could return a list of supported
> | protocol versions, which would be more future proof than the existing 'hello'
> | verb.  This would have involved some moderately complicated code to fallback to
> | 'hello' when 'helloV3' isn't there, plus adding yet another verb.  And we'd need
> | to be pretty careful about how we defined the verb to make sure it was *really*
> | future-proof, unlike our last effort.  And really we've wanted to stop needing
> | 'hello' for a long time; it's just a wasted round-trip when the client's default
> | protocol version is understood by the server.  So even though I started writing
> | this solution, after chatting with Martin I decided it wasn't worth it.
> |
>
> I would *like* to get rid of hello. I'm just worried that we are sending
> "random" bytes to a server and expecting it to respond in a deterministic way
> for any call that we are trying to send it. Because V3 always has a clear v3
> header, it might be enough.

Right, that's the point of the protocol version markers.  It's there so that a
client or server can immediately stop trying to interpret messages if the other
side sends “random” bytes. 

> - -
> - -    def _build_client_protocol(self):
> - -        version = self._medium.protocol_version()
> +        self._protocol_version = None
> +        if headers is None:
> +            self._headers = {'Software version': bzrlib.__version__}
> +        else:
> +            self._headers = dict(headers)
> +
>
> ^- I think this should probably be called "self._default_headers" to make it
> clear that this is the suggested headers to be sent, and not the absolute ones.

I don't understand.  These are *the* headers that will be used, so long as the
protocol encoding supports headers.  “_default_headers” sounds like headers that
might be ignored and others sent instead.

> +    def _call(self, encoder, method, args, body=None, readv_body=None):
> +        encoder.set_headers(self._headers)
> +        if body is not None:
> +            encoder.call_with_body_bytes((method, ) + args, body)
> +        elif readv_body is not None:
> +            encoder.call_with_body_readv_array((method, ) + args,
> +                    readv_body)
> +        else:
> +            encoder.call(method, *args)
>
> ^- A check that 'body is not None and readv_body is not None' would be nice.
> Just to help developers that accidentally set both.

I've added this to the first if block:

            if readv_body is not None:
                raise AssertionError(
                    "body and readv_body are mutually exclusive.")

> It also seems like something should be "returned" here, but I guess that is done
> in the "read_response" portion.

Right.  The encoding and sending of requests is separated from the reading and
decoding of responses.

I'll rename _call to _send_request, which might make that more obvious.

> +    def _call_and_read_response(self, method, args, body=None, readv_body=None,
> +            expect_response_body=True):
> +        if self._protocol_version is not None:
> +            encoder, response_handler = self._construct_protocol(
> +                self._protocol_version)
> +            self._call(encoder, method, args, body=body, readv_body=readv_body)
> +            return (response_handler.read_response_tuple(
> +                        expect_body=expect_response_body),
> +                    response_handler)
> +        else:
> +            for protocol_version in [3, 2]:
> +                encoder, response_handler = self._construct_protocol(
> +                    protocol_version)
> +                self._call(encoder, method, args, body=body,
> +                           readv_body=readv_body)
> +                try:
> +                    response_tuple = response_handler.read_response_tuple(
> +                        expect_body=expect_response_body)
> +                except errors.UnexpectedProtocolVersionMarker, err:
> +                    # TODO: We could recover from this without disconnecting if
> +                    # we recognise the protocol version.
> +                    self._medium.disconnect()
> +                    continue
> +                else:
> +                    self._protocol_version = protocol_version
> +                    return response_tuple, response_handler
> +            raise errors.SmartProtocolError(
> +                'Server is not a Bazaar server: ' + str(err))
> +
>
> ^- There seems to be a lot of duplicated code between the "_protocol_version is
> not None" if statement, and the following loop. Could you abstract this into a
> "_call_and_read_response(protocol_version = XXX) instead?
> I know it might mean trapping UnexpectedProtocolVersionMarker all the time, but
> you could do something like:
>
> except errors.UnexpectedProtocolVersionMarker:
> ~  if self._protocol_version is not None:
> ~    # This server already claimed to support this version
> ~    raise
>
> IMO it would be cleaner.

Hmm.  That means combining the "we know the version, so just send" logic with the
"we're trying to autodetect".  Something about that feels quite wrong to me.

As a compromise, I've pushed the _construct_protocol call into _send_request, by
replacing the encoder param of _send_request with protocol_version, and
returning the response_handler.  So I can keep the different handling of the
read_response_tuple call here, rather than spreading the version-detection logic
across two methods.  The method now looks like:

    def _call_and_read_response(self, method, args, body=None, readv_body=None,
            expect_response_body=True):
        if self._protocol_version is not None:
            response_handler = self._send_request(
                self._protocol_version, method, args, body=body,
                readv_body=readv_body)
            return (response_handler.read_response_tuple(
                        expect_body=expect_response_body),
                    response_handler)
        else:
            for protocol_version in [3, 2]:
                response_handler = self._send_request(
                    protocol_version, method, args, body=body,
                    readv_body=readv_body)
                try:
                    response_tuple = response_handler.read_response_tuple(
                        expect_body=expect_response_body)
                except errors.UnexpectedProtocolVersionMarker, err:
                    # TODO: We could recover from this without disconnecting if
                    # we recognise the protocol version.
                    self._medium.disconnect()
                    continue
                else:
                    self._protocol_version = protocol_version
                    return response_tuple, response_handler
            raise errors.SmartProtocolError(
                'Server is not a Bazaar server: ' + str(err))

I could also easily remove the duplication of the return statements by using an
else block on the for statement.  But the cure would be worse than the disease
(I suspect a large fraction of our developers don't even know that for loops can
have else blocks...).

> === modified file 'bzrlib/smart/protocol.py'
> - --- bzrlib/smart/protocol.py	2008-04-23 22:43:14 +0000
> +++ bzrlib/smart/protocol.py	2008-05-07 14:06:29 +0000
> @@ -89,6 +89,9 @@
> ~         """
> ~         raise NotImplementedError(self.call_with_body_readv_array)
>
> +    def set_headers(self, headers):
> +        raise NotImplementedError(self.set_headers)
> +
>
> ^- Following the above, this might be "set_default_headers". Unless it really is
> the full set of headers.
>
> Non-default headers seem like they belong on the call_with...() itself, though,
> rather than being set on the Protocol object.

Well, the encoder is only used for a single request.  So there's no difference,
except that this way I don't need to change the signature of as many methods :)

> - -        self._number_needed_bytes = 4
> - -        self.state_accept = self._state_accept_expecting_headers
> +        if expect_version_marker:
> +            self.state_accept = self._state_accept_expecting_protocol_version
> +            # We're expecting at least the protocol version marker + some
> +            # headers.
> +            self._number_needed_bytes = len(MESSAGE_VERSION_THREE) + 4
> +        else:
> +            self.state_accept = self._state_accept_expecting_headers
> +            self._number_needed_bytes = 4
>
> ^- This seems like it should have some amount of knowledge whether
> MESSAGE_VERSION_THREE is actually the returned message. Or maybe that just needs
> to happen when MESSAGE_VERSION_FOUR comes out.

I'm not sure what you mean by this.  This flag is needed on the decoder because
the basic decoder is the same for requests and responses, but:

 - on the server side, the protocol marker is already consumed because the
   server needs to read it first to know which decoder to use.
 - on the client side, the protocol marker is not consumed, because the client
   logic assumes the response must be in the same version as the request.

So the logic is almost the same, except for the initial state, hence the flag.

Potentially we could do as the servers do, and have the generic client code look
at the protocol marker and use that to pick the right decoder, rather than
assuming that a v3 request *must* have a v3 response.  The existing code for v2
and v1 isn't as amenable to this as the v3 code, unfortunately, which is why I
haven't done that already.

Is that what you were referring to?

>
> +        needed_bytes = len(MESSAGE_VERSION_THREE) - len(self._in_buffer)
> +        if needed_bytes > 0:
> +            if not MESSAGE_VERSION_THREE.startswith(self._in_buffer):
> +                # We have enough bytes to know the protocol version is wrong
> +                raise errors.UnexpectedProtocolVersionMarker(self._in_buffer)
> +            raise _NeedMoreBytes(len(MESSAGE_VERSION_THREE))
>
> ^- This was a bit confusing when I first read it. certainly the startswith() is
> correct, even if it looks the wrong way around. It seems correct, but I spent 2
> minutes trying to rewrite it to be "correct" before I realized it already was.
> Maybe it is just me, or maybe it is confusing.

Yeah, that is a little bit subtle.  I've added a comment:

        ...
        if needed_bytes > 0:
            # We don't have enough bytes to check if the protocol version
            # marker is right.  But we can check if it is already wrong by
            # checking that the start of MESSAGE_VERSION_THREE matches what
            # we've read so far.
            # [In fact, if the remote end isn't bzr we might never receive
            # len(MESSAGE_VERSION_THREE) bytes.  So if the bytes we have so far
            # are wrong then we should just raise immediately rather than
            # stall.]
            if not MESSAGE_VERSION_THREE.startswith(self._in_buffer):
                ...

> ~     def _write_structure(self, args):
> @@ -969,6 +993,7 @@
> ~     def __init__(self, write_func):
> ~         _ProtocolThreeEncoder.__init__(self, write_func)
> ~         self.response_sent = False
> +        self._headers = {'Software version': bzrlib.__version__}
>
> ^- Again, _default_headers ?

Like before, this object is used for a single response.  That variable contains
the headers that *will* be sent on a response, not merely a default.

(I'm sure we'll eventually want to provide some way for servers to send
different headers than that, that paramterisation just hasn't been written yet
because nothing has needed it yet...)

> +    def set_headers(self, headers):
> +        self._headers = dict(headers)
> +
>
> ^- If we require headers to be a dict, I feel like "headers.copy()" is clearer
> about what you are doing. (Creating a copy so that mutations don't change your
> internal header list.)

Using dict(foo) to copy a “foo” that is a dict is idiomatic python.  (As is
using “list(some_list)”).  To my brain, “headers.copy()” is no clearer than
“dict(headers)”.  But of course, not everyone has my brain and its quirks.
Changed :)

> If you are trying to support .set_headers([('foo', 'bar'), ('baz', 'bing')])
> then what you did is fine.

No, I don't particularly want to support that.  (I don't particularly want to
disallow it, either.)

> I'm a little curious why you need to be setting "expect_version_marker=True"
> everywhere. Isn't this always True for V3 requests? Is it ever true/not true for
> V2 requests? Otherwise it seems like it could be set by the protocol, rather
> than having to be passed around.
> Specifically, if it is a V3 response decoder, then shouldn't it always
> expect_version_marker=True?

It's not set everywhere.  Just in places that construct a V3 decoder for
responses.  It's not set for V3 request decoders, and a different codepath is
taken by V1 and V2 response decoders.

(Because for V1 and V2 there's a single "SmartClientRequestProtocol*" object
that encodes requests, decodes responses, etc.  In V3 these different
responsibilities are neatly separated.  It sucks a bit that there's a different
codepath.  Unfortunately detangling the request and response logic in V1 and V2
is somewhat involved, and the patches for V3 are already plenty big enough!)

> MockMedium seems fine. I sort of wondered if we could get away from bytes on the
> wire, but I realized that we really do (at least at certain points) want to test
> bytes on the wire for these things.
> I don't think it is necessarily the best way to test falling back to V2 from a
> V3 server, but it is ok.
>
> Certainly I like to see bytes-on-the-wire tests because they help ensure
> compatibility across releases.

Right.  I'm certainly open to suggestions for better tests!  But this is the
best I've thought of for these tests so far, and I'm pretty happy with it.  For
these tests bytes-on-the-wire does seem most appropriate to me.

> I did notice this:
> +        medium.expect_request(
> +            'bzr message 3 (bzr 1.3)\n\x00\x00\x00\x02de' +
> +            's\x00\x00\x00\x1el11:method-name5:arg 15:arg 2ee',
> +            'bzr response 2\nfailed\n\n')
>
> ^- And I would expect that to be at least (bzr 1.5) for it to be merged. But
> that is probably a different patch.

Yeah, I will fix that.  But I'm going to do that last thing before submitting,
just in case I'm wrong about which version to change it to!  I'm pretty sure
it'll be 1.5, but I've been wrong before (obviously!).

> It just has to happen first, so that we don't end up with a (bzr 1.3) in a
> bzr.dev release that we have to worry about.
>
> BB:tweak

Thanks!

-Andrew.