[MERGE] Return (and expect) '3' rather than '2' from the 'hello' verb (protocol v3 patch 4/7)

Fri May 9 04:56:19 BST 2008

It would be nice to land it, but it is very big, and I feel we will
probably find more bugs in real use. I found some issues in the first
part of the overall patch so there may be more.  i suppose it has no
user-visible benefit yet? So if John, Andrew or others want to defer
it that's ok with me.

On 5/9/08, John Arbash Meinel <john at arbash-meinel.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Andrew Bennetts wrote:
> | John Arbash Meinel wrote:
> |> Andrew Bennetts wrote:
> | [...]
> |> | It implements detection the server protocol version a better way: the
> first
> time
> |> | a request is made, it simply tries v3, then it tries v2, then it gives
> up.
> |> | After that, it remembers the version it detected.  'hello' is no longer
> used by
> |> | this client.
> |> |
> |>
> |> My concern with this is the problem we had with the
> "call_with_body_bytes" for
> |> unknown protocols. Where calling the server with an unknown method made
> it start
> |> doing bad things. We know disconnect and reconnect, but that is still
> somewhat
> |> unsafe. Considering the server still things you made multiple requests,
> and is
> |> still failing, we are just hiding that from the client.
> |>
> |> I guess V3 is a much bigger change from V2, or even the V2 extra-body
> stuff. So
> |> if V3 fails, it will fail right away, in a predictable manner for any
> call that
> |> we might make. If it doesn't, then I think we need to reconsider how we
> want to
> |> do it.
> |
> | It's not the same problem.  Since V2, we always send a newline-terminated
> | protocol version marker before a request or response.  So the first bytes
> the
> | recipient in either direction will read will immediately identify the
> message as
> | a known version, or as unintelligible garbage.  If it's garbage (to that
> server
> | version) the server ought to reply with an error and disconnect, rather
> than
> | assume the bytestream will magically start making sense if it keeps
> reading more
> | bytes.
> |
> | The problem with call_with_body_bytes in V2 was that the server could read
> the
> | start of a request, understand it enough to say "sorry, I can see you
> tried
> | calling Frobnicate but I don't support that", and then not realise that
> the next
> | bytes on the wire were a body for the first request rather than a new
> request.
> | Here there's no risk that I can see, so long as:
> |
> |  a) all future versions do as we've documented and send a
> newline-terminated
> |     protocol version marker at the start of every message
> |  b) no future protocol version marker happens to also be a valid v1
> request! :)
> |
> | It is precisely for this reason (so that it is possible and simple for a
> | recipient to reliably know if the message is encoded in a way the
> recipient can
> | understand) that we send the protocol markers.
>
> Well, there is a further statement about how it will handle when a request
> is
> unknown. I think the answer is just disconnect.
> *If* it starts with "MESSAGE_V3" and then the actual request is unknown, I
> think
> the server can safely skip it. We have taken the precaution of defining the
> bytes-on-the-wire with length prefixes, etc, so that the server should now
> how
> much it needs to consume before it can be safe to read more. (I'm curious
> about
> security implications here. mostly of the DoS/overload memory/spin on 50
> empty
> sockets, waiting for MESSAGE_V???, or that last character)
>
> Anyway, I think it should be clearly documented how the server needs to
> handle
> unknown requests, or we *can* get into the same situation again. I *think*
> you've done this for V2 and V3, though I can't say I explicitly read a
> ".disconnect() line". I guess maybe you raise an exception?
>
>
> |
> |> | An alternative solution I considered and rejected was adding a new
> 'helloV3'
> |> | verb (maybe called 'SupportedVersions') that could return a list of
> supported
> |> | protocol versions, which would be more future proof than the existing
> 'hello'
> |> | verb.  This would have involved some moderately complicated code to
> fallback to
> |> | 'hello' when 'helloV3' isn't there, plus adding yet another verb.  And
> we'd
> need
> |> | to be pretty careful about how we defined the verb to make sure it was
> *really*
> |> | future-proof, unlike our last effort.  And really we've wanted to stop
> needing
> |> | 'hello' for a long time; it's just a wasted round-trip when the
> client's
> default
> |> | protocol version is understood by the server.  So even though I started
> writing
> |> | this solution, after chatting with Martin I decided it wasn't worth it.
> |> |
> |>
> |> I would *like* to get rid of hello. I'm just worried that we are sending
> |> "random" bytes to a server and expecting it to respond in a deterministic
> way
> |> for any call that we are trying to send it. Because V3 always has a clear
> v3
> |> header, it might be enough.
> |
> | Right, that's the point of the protocol version markers.  It's there so
> that a
> | client or server can immediately stop trying to interpret messages if the
> other
> | side sends "random" bytes.
> |
> |> - -
> |> - -    def _build_client_protocol(self):
> |> - -        version = self._medium.protocol_version()
> |> +        self._protocol_version = None
> |> +        if headers is None:
> |> +            self._headers = {'Software version': bzrlib.__version__}
> |> +        else:
> |> +            self._headers = dict(headers)
> |> +
> |>
> |> ^- I think this should probably be called "self._default_headers" to make
> it
> |> clear that this is the suggested headers to be sent, and not the absolute
> ones.
> |
> | I don't understand.  These are *the* headers that will be used, so long as
> the
> | protocol encoding supports headers.  "_default_headers" sounds like
> headers that
> | might be ignored and others sent instead.
>
> I guess I didn't realize this object was only used 1 time. It certainly
> seems
> like a fairly heavy object to be constructing over and over again.
>
> |
> |> +    def _call(self, encoder, method, args, body=None, readv_body=None):
> |> +        encoder.set_headers(self._headers)
> |> +        if body is not None:
> |> +            encoder.call_with_body_bytes((method, ) + args, body)
> |> +        elif readv_body is not None:
> |> +            encoder.call_with_body_readv_array((method, ) + args,
> |> +                    readv_body)
> |> +        else:
> |> +            encoder.call(method, *args)
> |>
> |> ^- A check that 'body is not None and readv_body is not None' would be
> nice.
> |> Just to help developers that accidentally set both.
> |
> | I've added this to the first if block:
> |
> |             if readv_body is not None:
> |                 raise AssertionError(
> |                     "body and readv_body are mutually exclusive.")
> |
> |> It also seems like something should be "returned" here, but I guess that
> is done
> |> in the "read_response" portion.
> |
> | Right.  The encoding and sending of requests is separated from the reading
> and
> | decoding of responses.
> |
> | I'll rename _call to _send_request, which might make that more obvious.
> |
> |> +    def _call_and_read_response(self, method, args, body=None,
> readv_body=None,
> |> +            expect_response_body=True):
> |> +        if self._protocol_version is not None:
> |> +            encoder, response_handler = self._construct_protocol(
> |> +                self._protocol_version)
> |> +            self._call(encoder, method, args, body=body,
> readv_body=readv_body)
> |> +            return (response_handler.read_response_tuple(
> |> +                        expect_body=expect_response_body),
> |> +                    response_handler)
> |> +        else:
> |> +            for protocol_version in [3, 2]:
> |> +                encoder, response_handler = self._construct_protocol(
> |> +                    protocol_version)
> |> +                self._call(encoder, method, args, body=body,
> |> +                           readv_body=readv_body)
> |> +                try:
> |> +                    response_tuple =
> response_handler.read_response_tuple(
> |> +                        expect_body=expect_response_body)
> |> +                except errors.UnexpectedProtocolVersionMarker, err:
> |> +                    # TODO: We could recover from this without
> disconnecting if
> |> +                    # we recognise the protocol version.
> |> +                    self._medium.disconnect()
> |> +                    continue
> |> +                else:
> |> +                    self._protocol_version = protocol_version
> |> +                    return response_tuple, response_handler
> |> +            raise errors.SmartProtocolError(
> |> +                'Server is not a Bazaar server: ' + str(err))
> |> +
> |>
> |> ^- There seems to be a lot of duplicated code between the
> "_protocol_version is
> |> not None" if statement, and the following loop. Could you abstract this
> into a
> |> "_call_and_read_response(protocol_version = XXX) instead?
> |> I know it might mean trapping UnexpectedProtocolVersionMarker all the
> time, but
> |> you could do something like:
> |>
> |> except errors.UnexpectedProtocolVersionMarker:
> |> ~  if self._protocol_version is not None:
> |> ~    # This server already claimed to support this version
> |> ~    raise
> |>
> |> IMO it would be cleaner.
> |
> | Hmm.  That means combining the "we know the version, so just send" logic
> with the
> | "we're trying to autodetect".  Something about that feels quite wrong to
> me.
>
> Well, there was always my:
>
> if self._protocol_version is not None:
> ~  # we've finished autodetecting, abort because the server must have lied:
> ~  raise
>
> Which should still cleanly handle the "I'm autodetecting now" from the "I'm
> just
> processing messages". And not have any code duplication.
>
> I'll leave it to you, though. You only have 24hrs to get it in. And you
> still
> need the follow up patches.
>
> I'm wondering about this destabilizing 1.5rc1 at this point. We normally
> would
> want major changes to land right after a freeze, not right before. How do
> you
> feel about it? We didn't make an official "bugfix freeze" of bzr.dev this
> time,
> but I'm thinking this is a big patch to bring in the day before an rc1.
>
> |
> | As a compromise, I've pushed the _construct_protocol call into
> _send_request, by
> | replacing the encoder param of _send_request with protocol_version, and
> | returning the response_handler.  So I can keep the different handling of
> the
> | read_response_tuple call here, rather than spreading the version-detection
> logic
> | across two methods.  The method now looks like:
> |
> |     def _call_and_read_response(self, method, args, body=None,
> readv_body=None,
> |             expect_response_body=True):
> |         if self._protocol_version is not None:
> |             response_handler = self._send_request(
> |                 self._protocol_version, method, args, body=body,
> |                 readv_body=readv_body)
> |             return (response_handler.read_response_tuple(
> |                         expect_body=expect_response_body),
> |                     response_handler)
> |         else:
> |             for protocol_version in [3, 2]:
> |                 response_handler = self._send_request(
> |                     protocol_version, method, args, body=body,
> |                     readv_body=readv_body)
> |                 try:
> |                     response_tuple = response_handler.read_response_tuple(
> |                         expect_body=expect_response_body)
> |                 except errors.UnexpectedProtocolVersionMarker, err:
> |                     # TODO: We could recover from this without
> disconnecting if
> |                     # we recognise the protocol version.
> |                     self._medium.disconnect()
> |                     continue
> |                 else:
> |                     self._protocol_version = protocol_version
> |                     return response_tuple, response_handler
> |             raise errors.SmartProtocolError(
> |                 'Server is not a Bazaar server: ' + str(err))
> |
> | I could also easily remove the duplication of the return statements by
> using an
> | else block on the for statement.  But the cure would be worse than the
> disease
> | (I suspect a large fraction of our developers don't even know that for
> loops can
> | have else blocks...).
> |
>
> I use them fairly often. Not sure if I use them in bzr.
>
>
> ...
> |> - -        self._number_needed_bytes = 4
> |> - -        self.state_accept = self._state_accept_expecting_headers
> |> +        if expect_version_marker:
> |> +            self.state_accept =
> self._state_accept_expecting_protocol_version
> |> +            # We're expecting at least the protocol version marker +
> some
> |> +            # headers.
> |> +            self._number_needed_bytes = len(MESSAGE_VERSION_THREE) + 4
> |> +        else:
> |> +            self.state_accept = self._state_accept_expecting_headers
> |> +            self._number_needed_bytes = 4
> |>
> |> ^- This seems like it should have some amount of knowledge whether
> |> MESSAGE_VERSION_THREE is actually the returned message. Or maybe that
> just needs
> |> to happen when MESSAGE_VERSION_FOUR comes out.
> |
> | I'm not sure what you mean by this.  This flag is needed on the decoder
> because
> | the basic decoder is the same for requests and responses, but:
> |
> |  - on the server side, the protocol marker is already consumed because the
> |    server needs to read it first to know which decoder to use.
> |  - on the client side, the protocol marker is not consumed, because the
> client
> |    logic assumes the response must be in the same version as the request.
> |
> | So the logic is almost the same, except for the initial state, hence the
> flag.
> |
>
> Ok. There are 2 parts here.
>
> 1) Client versus server, the flag needs to be present not present
>
> 2) When the flag is present, you assume that you are expecting
> MESSAGE_VERSION_THREE. I might have misunderstood where this code was. I
> thought
> it was generic between V2 and V3 (and a theoretical V4). It may be that this
> is
> generic only between V3 request and respond.
>
> It sounds like it is my confusion, not yours. But I wanted to make my point
> clear, in case I was right.
>
>
> | Potentially we could do as the servers do, and have the generic client
> code look
> | at the protocol marker and use that to pick the right decoder, rather than
> | assuming that a v3 request *must* have a v3 response.  The existing code
> for v2
> | and v1 isn't as amenable to this as the v3 code, unfortunately, which is
> why I
> | haven't done that already.
> |
> | Is that what you were referring to?
> |
> |> +        needed_bytes = len(MESSAGE_VERSION_THREE) - len(self._in_buffer)
> |> +        if needed_bytes > 0:
> |> +            if not MESSAGE_VERSION_THREE.startswith(self._in_buffer):
> |> +                # We have enough bytes to know the protocol version is
> wrong
> |> +                raise
> errors.UnexpectedProtocolVersionMarker(self._in_buffer)
> |> +            raise _NeedMoreBytes(len(MESSAGE_VERSION_THREE))
> |>
> |> ^- This was a bit confusing when I first read it. certainly the
> startswith() is
> |> correct, even if it looks the wrong way around. It seems correct, but I
> spent 2
> |> minutes trying to rewrite it to be "correct" before I realized it already
> was.
> |> Maybe it is just me, or maybe it is confusing.
> |
> | Yeah, that is a little bit subtle.  I've added a comment:
> |
> |         ...
> |         if needed_bytes > 0:
> |             # We don't have enough bytes to check if the protocol version
> |             # marker is right.  But we can check if it is already wrong by
> |             # checking that the start of MESSAGE_VERSION_THREE matches
> what
> |             # we've read so far.
> |             # [In fact, if the remote end isn't bzr we might never receive
> |             # len(MESSAGE_VERSION_THREE) bytes.  So if the bytes we have
> so far
> |             # are wrong then we should just raise immediately rather than
> |             # stall.]
> |             if not MESSAGE_VERSION_THREE.startswith(self._in_buffer):
> |                 ...
> |
>
> Looks good.
>
>
> ...
>
> |
> | Using dict(foo) to copy a "foo" that is a dict is idiomatic python.  (As
> is
> | using "list(some_list)").  To my brain, "headers.copy()" is no clearer
> than
> | "dict(headers)".  But of course, not everyone has my brain and its quirks.
> | Changed :)
> |
>
> I use both of these as a way to copy a list, and as a way to turn an
> iterator
> into a dict/list. With generators being more common, I'm not sure if it is
> as
> idiomatic anymore.
>
> |> If you are trying to support .set_headers([('foo', 'bar'), ('baz',
> 'bing')])
> |> then what you did is fine.
> |
> | No, I don't particularly want to support that.  (I don't particularly want
> to
> | disallow it, either.)
>
> IMO, tsort.py has this semi-bug. It claims to want a list of tuples, but
> actually takes a dict just fine. So inside our code base, we have places
> that go:
>
> tsort.merge_sort(my_dict.items())
>
> and other places that just do:
>
> tsort.merge_sort(my_dict)
>
>
> ...
>
> |> MockMedium seems fine. I sort of wondered if we could get away from bytes
> on the
> |> wire, but I realized that we really do (at least at certain points) want
> to test
> |> bytes on the wire for these things.
> |> I don't think it is necessarily the best way to test falling back to V2
> from a
> |> V3 server, but it is ok.
> |>
> |> Certainly I like to see bytes-on-the-wire tests because they help ensure
> |> compatibility across releases.
> |
> | Right.  I'm certainly open to suggestions for better tests!  But this is
> the
> | best I've thought of for these tests so far, and I'm pretty happy with it.
>  For
> | these tests bytes-on-the-wire does seem most appropriate to me.
>
> Well, for the auto-retry specifically, it seemed like testing the logical
> "try
> v3 then fall back to try v2" would be a better test of the concept.
> bytes-on-the-wire isn't bad, though, since this is a fairly critical stage.
> And
> certainly, I don't expect there to be any need to change the on-the-wire
> bytes.
> (We won't be evolving the V2 server anymore, etc.)
>
> |
> |> I did notice this:
> |> +        medium.expect_request(
> |> +            'bzr message 3 (bzr 1.3)\n\x00\x00\x00\x02de' +
> |> +            's\x00\x00\x00\x1el11:method-name5:arg 15:arg 2ee',
> |> +            'bzr response 2\nfailed\n\n')
> |>
> |> ^- And I would expect that to be at least (bzr 1.5) for it to be merged.
> But
> |> that is probably a different patch.
> |
> | Yeah, I will fix that.  But I'm going to do that last thing before
> submitting,
> | just in case I'm wrong about which version to change it to!  I'm pretty
> sure
> | it'll be 1.5, but I've been wrong before (obviously!).
>
> Yeah, if this was feature freeze week, I'd probably be more about merging it
> right now. But seeing as 1.5rc1 is tomorrow, I'm a bit more hesitant.
>
> |
> |> It just has to happen first, so that we don't end up with a (bzr 1.3) in
> a
> |> bzr.dev release that we have to worry about.
> |>
> |> BB:tweak
> |
> | Thanks!
> |
> | -Andrew.
> |
> |
>
> John
> =:->
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (Cygwin)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iEYEARECAAYFAkgji0AACgkQJdeBCYSNAAPrggCfdmDm7Yr0Kt2fp3fbZ2C0iyAu
> 2JQAoNo/BLb8zzCDAf5/n3cy5lygkyTo
> =Fh9Y
> -----END PGP SIGNATURE-----
>
>

-- 
Martin <http://launchpad.net/~mbp/>