Rev 3108: (vila) Fix #175886 by reading superfluous bytes sent by dumb http servers by chunk in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Thu Dec 13 19:14:44 GMT 2007


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 3108
revision-id:pqm at pqm.ubuntu.com-20071213191437-c42q84tx9pf0e2tk
parent: pqm at pqm.ubuntu.com-20071213141047-tklbta8rymzfpj6y
parent: v.ladeuil+lp at free.fr-20071213175256-1tw7ao1l9555n46o
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Thu 2007-12-13 19:14:37 +0000
message:
  (vila) Fix #175886 by reading superfluous bytes sent by dumb http servers by chunk
modified:
  NEWS                           NEWS-20050323055033-4e00b5db738777ff
  bzrlib/tests/test_http_response.py test_http_response.py-20060628233143-950b2a482a32505d
  bzrlib/transport/http/__init__.py http_transport.py-20050711212304-506c5fd1059ace96
  bzrlib/transport/http/_urllib2_wrappers.py _urllib2_wrappers.py-20060913231729-ha9ugi48ktx481ao-1
    ------------------------------------------------------------
    revno: 3107.1.1
    revision-id:v.ladeuil+lp at free.fr-20071213175256-1tw7ao1l9555n46o
    parent: pqm at pqm.ubuntu.com-20071213141047-tklbta8rymzfpj6y
    parent: v.ladeuil+lp at free.fr-20071213154444-sd76q4uul3qn5ywh
    committer: Vincent Ladeuil <v.ladeuil+lp at free.fr>
    branch nick: trunk
    timestamp: Thu 2007-12-13 18:52:56 +0100
    message:
      Fix #175886 by reading superfluous bytes sent by dumb http servers by chunk
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzrlib/tests/test_http_response.py test_http_response.py-20060628233143-950b2a482a32505d
      bzrlib/transport/http/__init__.py http_transport.py-20050711212304-506c5fd1059ace96
      bzrlib/transport/http/_urllib2_wrappers.py _urllib2_wrappers.py-20060913231729-ha9ugi48ktx481ao-1
    ------------------------------------------------------------
    revno: 3104.3.5
    revision-id:v.ladeuil+lp at free.fr-20071213154444-sd76q4uul3qn5ywh
    parent: v.ladeuil+lp at free.fr-20071213154042-77h01zei40qptsuv
    committer: Vincent Ladeuil <v.ladeuil+lp at free.fr>
    branch nick: 175886
    timestamp: Thu 2007-12-13 16:44:44 +0100
    message:
      Fix typo.
    modified:
      bzrlib/transport/http/_urllib2_wrappers.py _urllib2_wrappers.py-20060913231729-ha9ugi48ktx481ao-1
    ------------------------------------------------------------
    revno: 3104.3.4
    revision-id:v.ladeuil+lp at free.fr-20071213154042-77h01zei40qptsuv
    parent: v.ladeuil+lp at free.fr-20071212210605-utjudvizj5v9qotc
    committer: Vincent Ladeuil <v.ladeuil+lp at free.fr>
    branch nick: 175886
    timestamp: Thu 2007-12-13 16:40:42 +0100
    message:
      Add test.
      
      * bzrlib/transport/http/_urllib2_wrappers.py :
      Some cleanup.
      
      * bzrlib/tests/test_http_response.py:
      (TestHTTPConnection): Test the warning emission.
    modified:
      bzrlib/tests/test_http_response.py test_http_response.py-20060628233143-950b2a482a32505d
      bzrlib/transport/http/_urllib2_wrappers.py _urllib2_wrappers.py-20060913231729-ha9ugi48ktx481ao-1
    ------------------------------------------------------------
    revno: 3104.3.3
    revision-id:v.ladeuil+lp at free.fr-20071212210605-utjudvizj5v9qotc
    parent: v.ladeuil+lp at free.fr-20071212161401-uo6j6ohcwkbj1h4p
    committer: Vincent Ladeuil <v.ladeuil+lp at free.fr>
    branch nick: 175886
    timestamp: Wed 2007-12-12 22:06:05 +0100
    message:
      Jam's and Aaron feedback about bug #175886.
      
      * bzrlib/transport/http/_urllib2_wrappers.py:
      (Response.finish): Returned the number of bytes discared.
      (AbstractHTTPConnection.cleanup_pipe): Warn the user about servers
      that do not recognize range requests.
    modified:
      bzrlib/transport/http/__init__.py http_transport.py-20050711212304-506c5fd1059ace96
      bzrlib/transport/http/_urllib2_wrappers.py _urllib2_wrappers.py-20060913231729-ha9ugi48ktx481ao-1
    ------------------------------------------------------------
    revno: 3104.3.2
    revision-id:v.ladeuil+lp at free.fr-20071212161401-uo6j6ohcwkbj1h4p
    parent: v.ladeuil+lp at free.fr-20071212154335-ac7fi089uufzbkfz
    committer: Vincent Ladeuil <v.ladeuil+lp at free.fr>
    branch nick: 175886
    timestamp: Wed 2007-12-12 17:14:01 +0100
    message:
      Fix typo.
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
    ------------------------------------------------------------
    revno: 3104.3.1
    revision-id:v.ladeuil+lp at free.fr-20071212154335-ac7fi089uufzbkfz
    parent: pqm at pqm.ubuntu.com-20071211175118-s94sizduj201hrs5
    committer: Vincent Ladeuil <v.ladeuil+lp at free.fr>
    branch nick: 175886
    timestamp: Wed 2007-12-12 16:43:35 +0100
    message:
      Fix #175886 by reading remaining bytes by chunks.
      
      * bzrlib/transport/http/_urllib2_wrappers.py:
      (Response.finish): Don't read all remaining bytes in one read or
      some buffer may explode.
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzrlib/transport/http/_urllib2_wrappers.py _urllib2_wrappers.py-20060913231729-ha9ugi48ktx481ao-1
=== modified file 'NEWS'
--- a/NEWS	2007-12-11 05:13:36 +0000
+++ b/NEWS	2007-12-12 16:14:01 +0000
@@ -25,6 +25,10 @@
      removed use characters not supported in the terminal encoding.
      (Aaron Bentley)
 
+   * When dumb http servers return whole files instead of the requested ranges,
+     read the remaining bytes by chunks to avoid overflowing network buffers.
+     (Vincent Ladeuil, #175886)
+
   INTERNALS:
 
   API BREAKS:

=== modified file 'bzrlib/tests/test_http_response.py'
--- a/bzrlib/tests/test_http_response.py	2007-12-10 10:41:24 +0000
+++ b/bzrlib/tests/test_http_response.py	2007-12-13 15:40:42 +0000
@@ -44,7 +44,56 @@
     errors,
     tests,
     )
-from bzrlib.transport.http import response
+from bzrlib.transport.http import (
+    response,
+    _urllib2_wrappers,
+    )
+
+
+class ReadSocket(object):
+    """A socket-like object that can be given a predefined content."""
+
+    def __init__(self, data):
+        self.readfile = StringIO(data)
+
+    def makefile(self, mode='r', bufsize=None):
+        return self.readfile
+
+class FakeHTTPConnection(_urllib2_wrappers.HTTPConnection):
+
+    def __init__(self, sock):
+        _urllib2_wrappers.HTTPConnection.__init__(self, 'localhost')
+        # Set the socket to bypass the connection
+        self.sock = sock
+
+    def send(self, str):
+        """Ignores the writes on the socket."""
+        pass
+
+
+class TestHTTPConnection(tests.TestCase):
+
+    def test_cleanup_pipe(self):
+        sock = ReadSocket("""HTTP/1.1 200 OK\r
+Content-Type: text/plain; charset=UTF-8\r
+Content-Length: 18
+\r
+0123456789
+garbage""")
+        conn = FakeHTTPConnection(sock)
+        # Simulate the request sending so that the connection will be able to
+        # read the response.
+        conn.putrequest('GET', 'http://localhost/fictious')
+        conn.endheaders()
+        # Now, get the response
+        resp = conn.getresponse()
+        # Read part of the response
+        self.assertEquals('0123456789\n', resp.read(11))
+        # Override the thresold to force the warning emission
+        conn._range_warning_thresold = 6 # There are 7 bytes pending
+        conn.cleanup_pipe()
+        self.assertContainsRe(self._get_log(keep_log_file=True),
+                              'Got a 200 response when asking')
 
 
 class TestRangeFileMixin(object):

=== modified file 'bzrlib/transport/http/__init__.py'
--- a/bzrlib/transport/http/__init__.py	2007-12-10 10:41:24 +0000
+++ b/bzrlib/transport/http/__init__.py	2007-12-12 21:06:05 +0000
@@ -139,7 +139,6 @@
         # time of this writing it's even the only known client -- vila20071203
         return StringIO(response_file.read())
 
-    # TODO: Add tests for tail_amount or deprecate it
     def _get(self, relpath, ranges, tail_amount=0):
         """Get a file, or part of a file.
 

=== modified file 'bzrlib/transport/http/_urllib2_wrappers.py'
--- a/bzrlib/transport/http/_urllib2_wrappers.py	2007-12-06 22:46:16 +0000
+++ b/bzrlib/transport/http/_urllib2_wrappers.py	2007-12-13 15:44:44 +0000
@@ -78,9 +78,6 @@
     # Some responses have bodies in which we have no interest
     _body_ignored_responses = [301,302, 303, 307, 401, 403, 404]
 
-    def __init__(self, *args, **kwargs):
-        httplib.HTTPResponse.__init__(self, *args, **kwargs)
-
     def begin(self):
         """Begin to read the response from the server.
 
@@ -123,6 +120,12 @@
             # below we keep the socket with the server opened.
             self.will_close = False
 
+    # in finish() below, we may have to discard several MB in the worst
+    # case. To avoid buffering that much, we read and discard by chunks
+    # instead. The underlying file is either a socket or a StringIO, so reading
+    # 8k chunks should be fine.
+    _discarded_buf_size = 8192
+
     def finish(self):
         """Finish reading the body.
 
@@ -131,11 +134,26 @@
         persistent connection. If we don't use a persistent connection, well,
         nothing will block the next request since a new connection will be
         issued anyway.
+
+        :return: the number of bytes left on the socket (may be None)
         """
+        pending = None
         if not self.isclosed():
             # Make sure nothing was left to be read on the socket
-            data = self.read(self.length)
+            pending = 0
+            while self.length and self.length > self._discarded_buf_size:
+                data = self.read(self._discarded_buf_size)
+                pending += len(data)
+            if self.length:
+                data = self.read(self.length)
+                pending += len(data)
+            if pending:
+                trace.mutter(
+                    "bogus http server didn't give body length,"
+                    "%s bytes left on the socket",
+                    pending)
             self.close()
+        return pending
 
 
 # Not inheriting from 'object' because httplib.HTTPConnection doesn't.
@@ -145,13 +163,16 @@
     response_class = Response
     strict = 1 # We don't support HTTP/0.9
 
+    # When we detect a server responding with the whole file to range requests,
+    # we want to warn. But not below a given thresold.
+    _range_warning_thresold = 1024 * 1024
+
     def __init__(self):
         self._response = None
+        self._ranges_received_whole_file = None
 
     def _mutter_connect(self):
-        netloc = self.host
-        if self.port is not None:
-            netloc += ':%d' % self.port
+        netloc = '%s:%s' % (self.host, self.port)
         if self.proxied_host is not None:
             netloc += '(proxy for %s)' % self.proxied_host
         trace.mutter('* About to connect() to %s' % netloc)
@@ -162,12 +183,19 @@
         return self._response
 
     def cleanup_pipe(self):
-        """Make the connection believes the response have been fully handled.
-
-        That makes the httplib.HTTPConnection happy
-        """
+        """Make the connection believe the response has been fully processed."""
         if self._response is not None:
-            self._response.finish()
+            pending = self._response.finish()
+            # Warn the user (once)
+            if (self._ranges_received_whole_file is None
+                and self._response.status == 200
+                and pending and pending > self._range_warning_thresold
+                ):
+                self._ranges_received_whole_file = True
+                trace.warning(
+                    'Got a 200 response when asking for multiple ranges,'
+                    ' does your server at %s:%s support range requests?',
+                    self.host, self.port)
             self._response = None
         # Preserve our preciousss
         sock = self.sock




More information about the bazaar-commits mailing list