[MERGE] Fix _get_line logic in smart protocol

Thu Apr 10 08:23:58 BST 2008

Martin Pool wrote:
> Martin Pool has voted tweak.
> Status is now: Conditionally approved
> Comment:
> Good catch.
>
> === modified file 'bzrlib/smart/medium.py'
> --- bzrlib/smart/medium.py      2008-02-06 03:52:25 +0000
> +++ bzrlib/smart/medium.py      2008-02-27 07:02:52 +0000
> @@ -62,6 +62,14 @@
>          # backing_transport could be passed to serve instead of  
> __init__
>          self.backing_transport = backing_transport
>          self.finished = False
> +        self.push_back = None
> +
> +    def _push_back(self, bytes):
> +        assert self.push_back is None, (
> +            "_push_back called when self.push is %r" %  
> (self.push_back,))
> +        if bytes == '':
> +            return
> +        self.push_back = bytes
>
> Should be self.push, not self.push_back in the assertion message.

Fixed.

>
> The variable should probably be underscored?  Asking that reminds me  
> that
> it's poor to have self.a and self._a with different meanings.

Yeah, that is a problem.  I've renamed the push_back attribute to
_push_back_buffer.

> Please add a comment in the constructor about what it holds, and a
> docstring on this method.

I've added a comment to the class docstring rather than the constructor, because
the constructor doesn't accept any relevant parameters.  Here's what I've added
to the class docstring:

    :ivar _push_back_buffer: a str of bytes that have been read from the stream
        but not used yet, or None if there are no buffered bytes.  Subclasses
        should make sure to exhaust this buffer before reading more bytes from
        the stream.  See also the _push_back method.

And this is the docstring I've added to the method:

        """Return unused bytes to the medium, because they belong to the next
        request(s).

        This sets the _push_back_buffer to the given bytes.
        """

> @@ -163,11 +173,18 @@
>                      return
>                  protocol.accept_bytes(bytes)
>
> -        self.push_back = protocol.excess_buffer
> +        self._push_back(protocol.excess_buffer)
>
>      def _get_bytes(self, desired_count):
>          # We ignore the desired_count because on sockets it's more  
> efficient to
>          # read 4k at a time.
> +        if self.push_back is not None:
> +            assert self.push_back != '', (
> +                'self.push_back should never be the empty string, which  
> can be '
> +                'confused with EOF')
>
> In general I would recommend saying
>
>    '%s.thing should blah blah' % (self, )
>
> as it gives a bit more of a clue what went wrong, at least pointing you  
> to
> the right class and sometimes more than that.  For these objects we
> probably will generally only have one so it's not a big deal.

Done anyway.  It's a good habit, thanks for the suggestion.

>
> +            bytes = self.push_back
> +            self.push_back = None
> +            return bytes
>          return self.socket.recv(4096)
>
>      def terminate_due_to_error(self):
> @@ -217,6 +234,13 @@
>              protocol.accept_bytes(bytes)
>
>      def _get_bytes(self, desired_count):
> +        if self.push_back is not None:
> +            assert self.push_back != '', (
> +                'self.push_back should never be the empty string, which  
> can be '
> +                'confused with EOF')
> +            bytes = self.push_back
> +            self.push_back = None
> +            return bytes
>          return self._in.read(desired_count)
>
> There is a little duplicated code; maybe you should do
>
>     if self._push_back_buffer is not None:
>          return self._get_bytes_from_push_back(desired_count)
>
> and maybe that function should return just the desired_count, since  
> that's
> cheap to do?  On the other hand some callers may be happy to get more,
> like read_line.

I've added a _get_push_back_buffer function.  I've left it just unconditionally
returning the whole buffer; it's ok to return more bytes if we happen to already
have them, and it's simpler this way.

I also removed a bit of duplication from
SmartServerSocketStreamMedium._serve_one_request_unguarded: now that _get_bytes
uniformly makes sure the push_back buffer is used when appropriate, there's no
reason to duplicate the push_back handling in _serve_one_request_unguarded.

> === modified file 'bzrlib/tests/test_smart_transport.py'
>
> +    def test_socket_stream_incomplete_request(self):
> +        """The medium should still construct the right protocol version  
> even if
> +        the initial read only reads part of the request.
> +
> +        Specifically, it should correctly read the protocol version  
> line even
> +        if the partial read doesn't end in a newline.  An older, naive
> +        implementation of _get_line in the server used to have a bug in  
> that
> +        case.
> +        """
> +        incomplete_request_bytes = protocol.REQUEST_VERSION_TWO + 'hel'
> +        rest_of_request_bytes = 'lo\n'
> +        expected_response = (
> +            protocol.RESPONSE_VERSION_TWO + 'success\nok\x012\n')
> +        server_sock, client_sock = self.portable_socket_pair()
> +        server = medium.SmartServerSocketStreamMedium(
> +            server_sock, None)
> +        client_sock.sendall(incomplete_request_bytes)
> +        server_protocol = server._build_protocol()
> +        client_sock.sendall(rest_of_request_bytes)
> +        server._serve_one_request(server_protocol)
>
> Without tracing through the code, I'm not totally sure that the two  
> writes
> will be seen by the server as two reads.  But I presume you ran this
> before putting in the fix and so you know it works.

Yeah, I did.

Thanks for the review, and sorry about the delay in responding.  I'm sending
this to PQM now.

-Andrew.