[MERGE] Return (and expect) '3' rather than '2' from the 'hello' verb (protocol v3 patch 4/7)

Fri May 9 00:22:41 BST 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
| John Arbash Meinel wrote:
|> Andrew Bennetts wrote:
| [...]
|> | It implements detection the server protocol version a better way: the first
time
|> | a request is made, it simply tries v3, then it tries v2, then it gives up.
|> | After that, it remembers the version it detected.  'hello' is no longer used by
|> | this client.
|> |
|>
|> My concern with this is the problem we had with the "call_with_body_bytes" for
|> unknown protocols. Where calling the server with an unknown method made it start
|> doing bad things. We know disconnect and reconnect, but that is still somewhat
|> unsafe. Considering the server still things you made multiple requests, and is
|> still failing, we are just hiding that from the client.
|>
|> I guess V3 is a much bigger change from V2, or even the V2 extra-body stuff. So
|> if V3 fails, it will fail right away, in a predictable manner for any call that
|> we might make. If it doesn't, then I think we need to reconsider how we want to
|> do it.
|
| It's not the same problem.  Since V2, we always send a newline-terminated
| protocol version marker before a request or response.  So the first bytes the
| recipient in either direction will read will immediately identify the message as
| a known version, or as unintelligible garbage.  If it's garbage (to that server
| version) the server ought to reply with an error and disconnect, rather than
| assume the bytestream will magically start making sense if it keeps reading more
| bytes.
|
| The problem with call_with_body_bytes in V2 was that the server could read the
| start of a request, understand it enough to say "sorry, I can see you tried
| calling Frobnicate but I don't support that", and then not realise that the next
| bytes on the wire were a body for the first request rather than a new request.
| Here there's no risk that I can see, so long as:
|
|  a) all future versions do as we've documented and send a newline-terminated
|     protocol version marker at the start of every message
|  b) no future protocol version marker happens to also be a valid v1 request! :)
|
| It is precisely for this reason (so that it is possible and simple for a
| recipient to reliably know if the message is encoded in a way the recipient can
| understand) that we send the protocol markers.

Well, there is a further statement about how it will handle when a request is
unknown. I think the answer is just disconnect.
*If* it starts with "MESSAGE_V3" and then the actual request is unknown, I think
the server can safely skip it. We have taken the precaution of defining the
bytes-on-the-wire with length prefixes, etc, so that the server should now how
much it needs to consume before it can be safe to read more. (I'm curious about
security implications here. mostly of the DoS/overload memory/spin on 50 empty
sockets, waiting for MESSAGE_V???, or that last character)

Anyway, I think it should be clearly documented how the server needs to handle
unknown requests, or we *can* get into the same situation again. I *think*
you've done this for V2 and V3, though I can't say I explicitly read a
".disconnect() line". I guess maybe you raise an exception?

|
|> | An alternative solution I considered and rejected was adding a new 'helloV3'
|> | verb (maybe called 'SupportedVersions') that could return a list of supported
|> | protocol versions, which would be more future proof than the existing 'hello'
|> | verb.  This would have involved some moderately complicated code to fallback to
|> | 'hello' when 'helloV3' isn't there, plus adding yet another verb.  And we'd
need
|> | to be pretty careful about how we defined the verb to make sure it was *really*
|> | future-proof, unlike our last effort.  And really we've wanted to stop needing
|> | 'hello' for a long time; it's just a wasted round-trip when the client's
default
|> | protocol version is understood by the server.  So even though I started writing
|> | this solution, after chatting with Martin I decided it wasn't worth it.
|> |
|>
|> I would *like* to get rid of hello. I'm just worried that we are sending
|> "random" bytes to a server and expecting it to respond in a deterministic way
|> for any call that we are trying to send it. Because V3 always has a clear v3
|> header, it might be enough.
|
| Right, that's the point of the protocol version markers.  It's there so that a
| client or server can immediately stop trying to interpret messages if the other
| side sends “random” bytes.
|
|> - -
|> - -    def _build_client_protocol(self):
|> - -        version = self._medium.protocol_version()
|> +        self._protocol_version = None
|> +        if headers is None:
|> +            self._headers = {'Software version': bzrlib.__version__}
|> +        else:
|> +            self._headers = dict(headers)
|> +
|>
|> ^- I think this should probably be called "self._default_headers" to make it
|> clear that this is the suggested headers to be sent, and not the absolute ones.
|
| I don't understand.  These are *the* headers that will be used, so long as the
| protocol encoding supports headers.  “_default_headers” sounds like headers that
| might be ignored and others sent instead.

I guess I didn't realize this object was only used 1 time. It certainly seems
like a fairly heavy object to be constructing over and over again.

|
|> +    def _call(self, encoder, method, args, body=None, readv_body=None):
|> +        encoder.set_headers(self._headers)
|> +        if body is not None:
|> +            encoder.call_with_body_bytes((method, ) + args, body)
|> +        elif readv_body is not None:
|> +            encoder.call_with_body_readv_array((method, ) + args,
|> +                    readv_body)
|> +        else:
|> +            encoder.call(method, *args)
|>
|> ^- A check that 'body is not None and readv_body is not None' would be nice.
|> Just to help developers that accidentally set both.
|
| I've added this to the first if block:
|
|             if readv_body is not None:
|                 raise AssertionError(
|                     "body and readv_body are mutually exclusive.")
|
|> It also seems like something should be "returned" here, but I guess that is done
|> in the "read_response" portion.
|
| Right.  The encoding and sending of requests is separated from the reading and
| decoding of responses.
|
| I'll rename _call to _send_request, which might make that more obvious.
|
|> +    def _call_and_read_response(self, method, args, body=None, readv_body=None,
|> +            expect_response_body=True):
|> +        if self._protocol_version is not None:
|> +            encoder, response_handler = self._construct_protocol(
|> +                self._protocol_version)
|> +            self._call(encoder, method, args, body=body, readv_body=readv_body)
|> +            return (response_handler.read_response_tuple(
|> +                        expect_body=expect_response_body),
|> +                    response_handler)
|> +        else:
|> +            for protocol_version in [3, 2]:
|> +                encoder, response_handler = self._construct_protocol(
|> +                    protocol_version)
|> +                self._call(encoder, method, args, body=body,
|> +                           readv_body=readv_body)
|> +                try:
|> +                    response_tuple = response_handler.read_response_tuple(
|> +                        expect_body=expect_response_body)
|> +                except errors.UnexpectedProtocolVersionMarker, err:
|> +                    # TODO: We could recover from this without disconnecting if
|> +                    # we recognise the protocol version.
|> +                    self._medium.disconnect()
|> +                    continue
|> +                else:
|> +                    self._protocol_version = protocol_version
|> +                    return response_tuple, response_handler
|> +            raise errors.SmartProtocolError(
|> +                'Server is not a Bazaar server: ' + str(err))
|> +
|>
|> ^- There seems to be a lot of duplicated code between the "_protocol_version is
|> not None" if statement, and the following loop. Could you abstract this into a
|> "_call_and_read_response(protocol_version = XXX) instead?
|> I know it might mean trapping UnexpectedProtocolVersionMarker all the time, but
|> you could do something like:
|>
|> except errors.UnexpectedProtocolVersionMarker:
|> ~  if self._protocol_version is not None:
|> ~    # This server already claimed to support this version
|> ~    raise
|>
|> IMO it would be cleaner.
|
| Hmm.  That means combining the "we know the version, so just send" logic with the
| "we're trying to autodetect".  Something about that feels quite wrong to me.

Well, there was always my:

if self._protocol_version is not None:
~  # we've finished autodetecting, abort because the server must have lied:
~  raise

Which should still cleanly handle the "I'm autodetecting now" from the "I'm just
processing messages". And not have any code duplication.

I'll leave it to you, though. You only have 24hrs to get it in. And you still
need the follow up patches.

I'm wondering about this destabilizing 1.5rc1 at this point. We normally would
want major changes to land right after a freeze, not right before. How do you
feel about it? We didn't make an official "bugfix freeze" of bzr.dev this time,
but I'm thinking this is a big patch to bring in the day before an rc1.

|
| As a compromise, I've pushed the _construct_protocol call into _send_request, by
| replacing the encoder param of _send_request with protocol_version, and
| returning the response_handler.  So I can keep the different handling of the
| read_response_tuple call here, rather than spreading the version-detection logic
| across two methods.  The method now looks like:
|
|     def _call_and_read_response(self, method, args, body=None, readv_body=None,
|             expect_response_body=True):
|         if self._protocol_version is not None:
|             response_handler = self._send_request(
|                 self._protocol_version, method, args, body=body,
|                 readv_body=readv_body)
|             return (response_handler.read_response_tuple(
|                         expect_body=expect_response_body),
|                     response_handler)
|         else:
|             for protocol_version in [3, 2]:
|                 response_handler = self._send_request(
|                     protocol_version, method, args, body=body,
|                     readv_body=readv_body)
|                 try:
|                     response_tuple = response_handler.read_response_tuple(
|                         expect_body=expect_response_body)
|                 except errors.UnexpectedProtocolVersionMarker, err:
|                     # TODO: We could recover from this without disconnecting if
|                     # we recognise the protocol version.
|                     self._medium.disconnect()
|                     continue
|                 else:
|                     self._protocol_version = protocol_version
|                     return response_tuple, response_handler
|             raise errors.SmartProtocolError(
|                 'Server is not a Bazaar server: ' + str(err))
|
| I could also easily remove the duplication of the return statements by using an
| else block on the for statement.  But the cure would be worse than the disease
| (I suspect a large fraction of our developers don't even know that for loops can
| have else blocks...).
|

I use them fairly often. Not sure if I use them in bzr.

...
|> - -        self._number_needed_bytes = 4
|> - -        self.state_accept = self._state_accept_expecting_headers
|> +        if expect_version_marker:
|> +            self.state_accept = self._state_accept_expecting_protocol_version
|> +            # We're expecting at least the protocol version marker + some
|> +            # headers.
|> +            self._number_needed_bytes = len(MESSAGE_VERSION_THREE) + 4
|> +        else:
|> +            self.state_accept = self._state_accept_expecting_headers
|> +            self._number_needed_bytes = 4
|>
|> ^- This seems like it should have some amount of knowledge whether
|> MESSAGE_VERSION_THREE is actually the returned message. Or maybe that just needs
|> to happen when MESSAGE_VERSION_FOUR comes out.
|
| I'm not sure what you mean by this.  This flag is needed on the decoder because
| the basic decoder is the same for requests and responses, but:
|
|  - on the server side, the protocol marker is already consumed because the
|    server needs to read it first to know which decoder to use.
|  - on the client side, the protocol marker is not consumed, because the client
|    logic assumes the response must be in the same version as the request.
|
| So the logic is almost the same, except for the initial state, hence the flag.
|

Ok. There are 2 parts here.

1) Client versus server, the flag needs to be present not present

2) When the flag is present, you assume that you are expecting
MESSAGE_VERSION_THREE. I might have misunderstood where this code was. I thought
it was generic between V2 and V3 (and a theoretical V4). It may be that this is
generic only between V3 request and respond.

It sounds like it is my confusion, not yours. But I wanted to make my point
clear, in case I was right.

| Potentially we could do as the servers do, and have the generic client code look
| at the protocol marker and use that to pick the right decoder, rather than
| assuming that a v3 request *must* have a v3 response.  The existing code for v2
| and v1 isn't as amenable to this as the v3 code, unfortunately, which is why I
| haven't done that already.
|
| Is that what you were referring to?
|
|> +        needed_bytes = len(MESSAGE_VERSION_THREE) - len(self._in_buffer)
|> +        if needed_bytes > 0:
|> +            if not MESSAGE_VERSION_THREE.startswith(self._in_buffer):
|> +                # We have enough bytes to know the protocol version is wrong
|> +                raise errors.UnexpectedProtocolVersionMarker(self._in_buffer)
|> +            raise _NeedMoreBytes(len(MESSAGE_VERSION_THREE))
|>
|> ^- This was a bit confusing when I first read it. certainly the startswith() is
|> correct, even if it looks the wrong way around. It seems correct, but I spent 2
|> minutes trying to rewrite it to be "correct" before I realized it already was.
|> Maybe it is just me, or maybe it is confusing.
|
| Yeah, that is a little bit subtle.  I've added a comment:
|
|         ...
|         if needed_bytes > 0:
|             # We don't have enough bytes to check if the protocol version
|             # marker is right.  But we can check if it is already wrong by
|             # checking that the start of MESSAGE_VERSION_THREE matches what
|             # we've read so far.
|             # [In fact, if the remote end isn't bzr we might never receive
|             # len(MESSAGE_VERSION_THREE) bytes.  So if the bytes we have so far
|             # are wrong then we should just raise immediately rather than
|             # stall.]
|             if not MESSAGE_VERSION_THREE.startswith(self._in_buffer):
|                 ...
|

Looks good.

...

|
| Using dict(foo) to copy a “foo” that is a dict is idiomatic python.  (As is
| using “list(some_list)”).  To my brain, “headers.copy()” is no clearer than
| “dict(headers)”.  But of course, not everyone has my brain and its quirks.
| Changed :)
|

I use both of these as a way to copy a list, and as a way to turn an iterator
into a dict/list. With generators being more common, I'm not sure if it is as
idiomatic anymore.

|> If you are trying to support .set_headers([('foo', 'bar'), ('baz', 'bing')])
|> then what you did is fine.
|
| No, I don't particularly want to support that.  (I don't particularly want to
| disallow it, either.)

IMO, tsort.py has this semi-bug. It claims to want a list of tuples, but
actually takes a dict just fine. So inside our code base, we have places that go:

tsort.merge_sort(my_dict.items())

and other places that just do:

tsort.merge_sort(my_dict)

...

|> MockMedium seems fine. I sort of wondered if we could get away from bytes on the
|> wire, but I realized that we really do (at least at certain points) want to test
|> bytes on the wire for these things.
|> I don't think it is necessarily the best way to test falling back to V2 from a
|> V3 server, but it is ok.
|>
|> Certainly I like to see bytes-on-the-wire tests because they help ensure
|> compatibility across releases.
|
| Right.  I'm certainly open to suggestions for better tests!  But this is the
| best I've thought of for these tests so far, and I'm pretty happy with it.  For
| these tests bytes-on-the-wire does seem most appropriate to me.

Well, for the auto-retry specifically, it seemed like testing the logical "try
v3 then fall back to try v2" would be a better test of the concept.
bytes-on-the-wire isn't bad, though, since this is a fairly critical stage. And
certainly, I don't expect there to be any need to change the on-the-wire bytes.
(We won't be evolving the V2 server anymore, etc.)

|
|> I did notice this:
|> +        medium.expect_request(
|> +            'bzr message 3 (bzr 1.3)\n\x00\x00\x00\x02de' +
|> +            's\x00\x00\x00\x1el11:method-name5:arg 15:arg 2ee',
|> +            'bzr response 2\nfailed\n\n')
|>
|> ^- And I would expect that to be at least (bzr 1.5) for it to be merged. But
|> that is probably a different patch.
|
| Yeah, I will fix that.  But I'm going to do that last thing before submitting,
| just in case I'm wrong about which version to change it to!  I'm pretty sure
| it'll be 1.5, but I've been wrong before (obviously!).

Yeah, if this was feature freeze week, I'd probably be more about merging it
right now. But seeing as 1.5rc1 is tomorrow, I'm a bit more hesitant.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkgji0AACgkQJdeBCYSNAAPrggCfdmDm7Yr0Kt2fp3fbZ2C0iyAu
2JQAoNo/BLb8zzCDAf5/n3cy5lygkyTo
=Fh9Y
-----END PGP SIGNATURE-----