Rev 3060: Fix bug #172701 by catching the pycurl ShortReadvError and allowing readv to issue several get requests in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Fri Nov 30 18:21:11 GMT 2007


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 3060
revision-id:pqm at pqm.ubuntu.com-20071130182102-i0t564k01anm7uk2
parent: pqm at pqm.ubuntu.com-20071130083301-5zq7705t6xa7yikn
parent: v.ladeuil+lp at free.fr-20071130174841-6yitg1imn504sttr
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Fri 2007-11-30 18:21:02 +0000
message:
  Fix bug #172701 by catching the pycurl ShortReadvError and allowing readv to issue several get requests
modified:
  NEWS                           NEWS-20050323055033-4e00b5db738777ff
  bzrlib/debug.py                debug.py-20061102062349-vdhrw9qdpck8cl35-1
  bzrlib/help_topics.py          help_topics.py-20060920210027-rnim90q9e0bwxvy4-1
  bzrlib/tests/HTTPTestUtil.py   HTTPTestUtil.py-20050914180604-247d3aafb7a43343
  bzrlib/tests/HttpServer.py     httpserver.py-20061012142527-m1yxdj1xazsf8d7s-1
  bzrlib/tests/test_http.py      testhttp.py-20051018020158-b2eef6e867c514d9
  bzrlib/transport/http/__init__.py http_transport.py-20050711212304-506c5fd1059ace96
  bzrlib/transport/http/_pycurl.py pycurlhttp.py-20060110060940-4e2a705911af77a6
  bzrlib/transport/http/_urllib.py _urlgrabber.py-20060113083826-0bbf7d992fbf090c
  bzrlib/transport/http/_urllib2_wrappers.py _urllib2_wrappers.py-20060913231729-ha9ugi48ktx481ao-1
  bzrlib/transport/http/response.py _response.py-20060613154423-a2ci7hd4iw5c7fnt-1
    ------------------------------------------------------------
    revno: 3059.1.1
    revision-id:v.ladeuil+lp at free.fr-20071130174841-6yitg1imn504sttr
    parent: pqm at pqm.ubuntu.com-20071130083301-5zq7705t6xa7yikn
    parent: v.ladeuil+lp at free.fr-20071130164824-4hhlwfudheafsf9e
    committer: Vincent Ladeuil <v.ladeuil+lp at free.fr>
    branch nick: trunk
    timestamp: Fri 2007-11-30 18:48:41 +0100
    message:
      Fix bug #172701 by catching the pycurl ShortReadvError and allowing readv to issue several get requests
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzrlib/debug.py                debug.py-20061102062349-vdhrw9qdpck8cl35-1
      bzrlib/help_topics.py          help_topics.py-20060920210027-rnim90q9e0bwxvy4-1
      bzrlib/tests/HTTPTestUtil.py   HTTPTestUtil.py-20050914180604-247d3aafb7a43343
      bzrlib/tests/HttpServer.py     httpserver.py-20061012142527-m1yxdj1xazsf8d7s-1
      bzrlib/tests/test_http.py      testhttp.py-20051018020158-b2eef6e867c514d9
      bzrlib/transport/http/__init__.py http_transport.py-20050711212304-506c5fd1059ace96
      bzrlib/transport/http/_pycurl.py pycurlhttp.py-20060110060940-4e2a705911af77a6
      bzrlib/transport/http/_urllib.py _urlgrabber.py-20060113083826-0bbf7d992fbf090c
      bzrlib/transport/http/_urllib2_wrappers.py _urllib2_wrappers.py-20060913231729-ha9ugi48ktx481ao-1
      bzrlib/transport/http/response.py _response.py-20060613154423-a2ci7hd4iw5c7fnt-1
    ------------------------------------------------------------
    revno: 3052.3.4
    revision-id:v.ladeuil+lp at free.fr-20071130164824-4hhlwfudheafsf9e
    parent: v.ladeuil+lp at free.fr-20071130095122-6xz845lluzjp7tvs
    committer: Vincent Ladeuil <v.ladeuil+lp at free.fr>
    branch nick: 172701
    timestamp: Fri 2007-11-30 17:48:24 +0100
    message:
      Jam's review feedback.
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzrlib/debug.py                debug.py-20061102062349-vdhrw9qdpck8cl35-1
      bzrlib/tests/test_http.py      testhttp.py-20051018020158-b2eef6e867c514d9
    ------------------------------------------------------------
    revno: 3052.3.3
    revision-id:v.ladeuil+lp at free.fr-20071130095122-6xz845lluzjp7tvs
    parent: v.ladeuil+lp at free.fr-20071129232201-thjmmlgo5ucbrwfn
    committer: Vincent Ladeuil <v.ladeuil+lp at free.fr>
    branch nick: 172701
    timestamp: Fri 2007-11-30 10:51:22 +0100
    message:
      Add -Dhttp support.
      
      * bzrlib/transport/http/_urllib2_wrappers.py:
      (HTTPConnection.__init__): Report the host we are about to connect
      to if -Dhttp is used.
      (AbstractHTTPHandler.do_open): Report requests and
      responses (including headers) if -Dhttp is used.
      
      * bzrlib/transport/http/_urllib.py: Fix some imports.
      (HttpTransport_urllib._perform): Delete one mutter call since
      -Dhttp provides better information.
      
      * bzrlib/transport/http/_pycurl.py:
      Fix some imports.
      (PyCurlTransport._set_curl_options): Activate verbose output if
      -Dhttp is used. Unfortunately this goes straight to stderr instead
      of .bzr.log (libcurl provides an option but pycurl does not
      implement it), but since we are debugging, I think it's
      acceptable.
      
      * bzrlib/transport/http/__init__.py:
      (HttpTransportBase._coalesce_readv): Add a comment about the
      servers that return the whole file ignoring the Ranges header.
      
      * bzrlib/help_topics.py:
      (_global_options): Add http.
      
      * bzrlib/debug.py: 
      Add 'http'.
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzrlib/debug.py                debug.py-20061102062349-vdhrw9qdpck8cl35-1
      bzrlib/help_topics.py          help_topics.py-20060920210027-rnim90q9e0bwxvy4-1
      bzrlib/transport/http/__init__.py http_transport.py-20050711212304-506c5fd1059ace96
      bzrlib/transport/http/_pycurl.py pycurlhttp.py-20060110060940-4e2a705911af77a6
      bzrlib/transport/http/_urllib.py _urlgrabber.py-20060113083826-0bbf7d992fbf090c
      bzrlib/transport/http/_urllib2_wrappers.py _urllib2_wrappers.py-20060913231729-ha9ugi48ktx481ao-1
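The -Dhttp option added in this submission follows bzr's existing -D debug-flag pattern: each flag is a string in a global set that tracing code consults before emitting output. A minimal standalone sketch of that pattern (the names `debug_flags`, `set_debug_flags_from_argv` and `mutter_http` are illustrative stand-ins, not bzrlib's actual API):

```python
# Sketch of the -D debug-flag pattern behind -Dhttp; all names here are
# illustrative, not bzrlib's real functions.
debug_flags = set()
http_trace = []

def set_debug_flags_from_argv(argv):
    """Collect -Dxxx command line options into the global flag set."""
    for arg in argv:
        if arg.startswith('-D'):
            debug_flags.add(arg[2:])

def mutter_http(message):
    """Record an http trace line only when the 'http' flag is active."""
    if 'http' in debug_flags:
        http_trace.append(message)
```

With `-Dhttp` on the command line the trace calls record connection, request and response details; without it they are cheap no-ops, which is why the flag can stay in production code paths.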
    ------------------------------------------------------------
    revno: 3052.3.2
    revision-id:v.ladeuil+lp at free.fr-20071129232201-thjmmlgo5ucbrwfn
    parent: v.ladeuil+lp at free.fr-20071129215945-qwzjfzz6kygljh4r
    committer: Vincent Ladeuil <v.ladeuil+lp at free.fr>
    branch nick: 172701
    timestamp: Fri 2007-11-30 00:22:01 +0100
    message:
      Add tests and fix trivial bugs and other typos.
      
      * bzrlib/transport/http/response.py:
      (handle_response): Delete obsolete TODO.
      
      * bzrlib/transport/http/__init__.py:
      (HttpTransportBase._readv): Fix some trivial bugs revealed by the
      test suite.
      (HttpTransportBase._coalesce_readv): Fix list splice access, first
      seen by jam, but found by the test suite too.
      
      * bzrlib/tests/test_http.py:
      Fix some imports.
      (TestRangeRequestServer.test_readv_multiple_get_requests):
      Exercise the rewritten readv.
      
      * bzrlib/tests/HttpServer.py:
      (TestingHTTPRequestHandler.do_GET): Count the requests (more tests
      use that info).
      
      * bzrlib/tests/HTTPTestUtil.py:
      (LimitedRangeRequestHandler.do_GET): The statistics are collected
      in the base request handler now.
      (NoRangeRequestHandler.do_GET): But since we bypass the base
      handler, re-enable the statistics collection here.
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzrlib/tests/HTTPTestUtil.py   HTTPTestUtil.py-20050914180604-247d3aafb7a43343
      bzrlib/tests/HttpServer.py     httpserver.py-20061012142527-m1yxdj1xazsf8d7s-1
      bzrlib/tests/test_http.py      testhttp.py-20051018020158-b2eef6e867c514d9
      bzrlib/transport/http/__init__.py http_transport.py-20050711212304-506c5fd1059ace96
      bzrlib/transport/http/response.py _response.py-20060613154423-a2ci7hd4iw5c7fnt-1
    ------------------------------------------------------------
    revno: 3052.3.1
    revision-id:v.ladeuil+lp at free.fr-20071129215945-qwzjfzz6kygljh4r
    parent: pqm at pqm.ubuntu.com-20071129184101-u9506rihe4zbzyyz
    parent: v.ladeuil+lp at free.fr-20071129154333-ddra1mbf487rlofw
    committer: Vincent Ladeuil <v.ladeuil+lp at free.fr>
    branch nick: 172701
    timestamp: Thu 2007-11-29 22:59:45 +0100
    message:
      Merge http readv issuing multiple requests since that fixes bug #172701
    modified:
      bzrlib/transport/http/__init__.py http_transport.py-20050711212304-506c5fd1059ace96
      bzrlib/transport/http/_urllib2_wrappers.py _urllib2_wrappers.py-20060913231729-ha9ugi48ktx481ao-1
    ------------------------------------------------------------
    revno: 3024.2.3
    revision-id:v.ladeuil+lp at free.fr-20071129154333-ddra1mbf487rlofw
    parent: v.ladeuil+lp at free.fr-20071127093537-gruuxzmso2r6c5pg
    committer: Vincent Ladeuil <v.ladeuil+lp at free.fr>
    branch nick: 165061
    timestamp: Thu 2007-11-29 16:43:33 +0100
    message:
      Rewrite http_readv to allow several GET requests. Smoke tested against the branch reported in the bug.
      
      * bzrlib/transport/http/__init__.py:
      (HttpTransportBase._readv): Issue several GET requests if too many
      ranges are requested.
    modified:
      bzrlib/transport/http/__init__.py http_transport.py-20050711212304-506c5fd1059ace96
      bzrlib/transport/http/_urllib2_wrappers.py _urllib2_wrappers.py-20060913231729-ha9ugi48ktx481ao-1
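The rewrite described in the message above boils down to slicing the coalesced range list into fixed-size groups, one GET request per group, so no single request exceeds what the server tolerates. A standalone sketch of that slicing (the name `split_ranges` is illustrative; bzrlib does this inline in `_coalesce_readv`):

```python
def split_ranges(coalesced, max_ranges):
    """Yield successive groups of at most max_ranges (start, size) ranges,
    one group per GET request."""
    for group_start in range(0, len(coalesced), max_ranges):
        yield coalesced[group_start:group_start + max_ranges]
```

With a cap of 200 ranges per request (safely under Apache's ~400-range limit mentioned in the diff below), a 450-range readv becomes three GET requests instead of one oversized, rejected one.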
=== modified file 'NEWS'
--- a/NEWS	2007-11-30 08:33:01 +0000
+++ b/NEWS	2007-11-30 17:48:41 +0000
@@ -101,6 +101,11 @@
      installer write to the registry (HKLM\SOFTWARE\Bazaar) useful info 
      about paths and bzr version. (Alexander Belchenko, #129298)
 
+  INTERNALS:
+
+    * New -Dhttp debug option reports http connections, requests and responses.
+      (Vincent Ladeuil)
+
   DOCUMENTATION:
 
   BUG FIXES:
@@ -207,6 +212,11 @@
    * Revert doesn't crash when restoring a single file from a deleted
      directory. (Aaron Bentley)
 
+   * Catch ShortReadvErrors while using pycurl. Also make readv more robust by
+     allowing multiple GET requests to be issued if too many ranges are
+     required.
+     (Vincent Ladeuil, #172701)
+
    * Stderr output via logging mechanism now goes through encoded wrapper
      and no more uses utf-8, but terminal encoding instead. So all unicode
      strings now should be readable in non-utf-8 terminal.

=== modified file 'bzrlib/debug.py'
--- a/bzrlib/debug.py	2007-11-04 15:29:17 +0000
+++ b/bzrlib/debug.py	2007-11-30 16:48:24 +0000
@@ -31,6 +31,7 @@
  * fetch - trace history copying between repositories
  * hooks - trace hook execution
  * hpss - trace smart protocol requests and responses
+ * http - trace http connections, requests and responses
  * index - trace major index operations
  * lock - trace when lockdir locks are taken or released
 

=== modified file 'bzrlib/help_topics.py'
--- a/bzrlib/help_topics.py	2007-11-06 07:01:07 +0000
+++ b/bzrlib/help_topics.py	2007-11-30 09:51:22 +0000
@@ -268,6 +268,7 @@
                operations.
 -Dhashcache    Log every time a working file is read to determine its hash.
 -Dhooks        Trace hook execution.
+-Dhttp         Trace http connections, requests and responses
 -Dhpss         Trace smart protocol requests and responses.
 -Dindex        Trace major index operations.
 -Dlock         Trace when lockdir locks are taken or released.

=== modified file 'bzrlib/tests/HTTPTestUtil.py'
--- a/bzrlib/tests/HTTPTestUtil.py	2007-10-17 15:36:20 +0000
+++ b/bzrlib/tests/HTTPTestUtil.py	2007-11-29 23:22:01 +0000
@@ -164,11 +164,6 @@
         return TestingHTTPRequestHandler.get_multiple_ranges(self, file,
                                                              file_size, ranges)
 
-    def do_GET(self):
-        tcs = self.server.test_case_server
-        tcs.GET_request_nb += 1
-        return TestingHTTPRequestHandler.do_GET(self)
-
 
 class LimitedRangeHTTPServer(HttpServer):
     """An HttpServer erroring out on requests with too much range specifiers"""
@@ -177,7 +172,6 @@
                  range_limit=None):
         HttpServer.__init__(self, request_handler)
         self.range_limit = range_limit
-        self.GET_request_nb = 0
 
 
 class SingleRangeRequestHandler(TestingHTTPRequestHandler):
@@ -208,8 +202,11 @@
 class NoRangeRequestHandler(TestingHTTPRequestHandler):
     """Ignore range requests without notice"""
 
-    # Just bypass the range handling done by TestingHTTPRequestHandler
-    do_GET = SimpleHTTPRequestHandler.do_GET
+    def do_GET(self):
+        # Update the statistics
+        self.server.test_case_server.GET_request_nb += 1
+        # Just bypass the range handling done by TestingHTTPRequestHandler
+        return SimpleHTTPRequestHandler.do_GET(self)
 
 
 class TestCaseWithWebserver(TestCaseWithTransport):

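The statistics change above moves GET counting into the base request handler so every test server reports how many requests it served. The pattern can be sketched standalone with stand-in classes (the `server.test_case_server.GET_request_nb` attribute chain mirrors the diff; the `Fake*` classes are purely illustrative, not bzrlib test fixtures):

```python
class FakeTestCaseServer:
    """Stand-in for the test-case-level server object that owns the counter."""
    def __init__(self):
        self.GET_request_nb = 0

class FakeServer:
    """Stand-in for the wsgi/socket server that links back to the test server."""
    def __init__(self, test_case_server):
        self.test_case_server = test_case_server

class FakeBaseHandler:
    """Stand-in for TestingHTTPRequestHandler's real range handling."""
    def do_GET(self):
        return 'response body'

class CountingHandler(FakeBaseHandler):
    """Counts GET requests before delegating, like the patched do_GET."""
    def __init__(self, server):
        self.server = server

    def do_GET(self):
        # Update statistics so tests can assert on the number of requests.
        self.server.test_case_server.GET_request_nb += 1
        return FakeBaseHandler.do_GET(self)
```

Handlers that bypass the base class (like `NoRangeRequestHandler` above) must re-do the increment themselves, which is exactly what the second hunk of this diff restores.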
=== modified file 'bzrlib/tests/HttpServer.py'
--- a/bzrlib/tests/HttpServer.py	2007-10-31 12:42:42 +0000
+++ b/bzrlib/tests/HttpServer.py	2007-11-29 23:22:01 +0000
@@ -149,6 +149,8 @@
 
         Handles the Range header.
         """
+        # Update statistics
+        self.server.test_case_server.GET_request_nb += 1
 
         path = self.translate_path(self.path)
         ranges_header_value = self.headers.get('Range')
@@ -304,6 +306,8 @@
         self.host = 'localhost'
         self.port = 0
         self._httpd = None
+        # Allows tests to verify number of GET requests issued
+        self.GET_request_nb = 0
 
     def _get_httpd(self):
         if self._httpd is None:

=== modified file 'bzrlib/tests/test_http.py'
--- a/bzrlib/tests/test_http.py	2007-11-04 15:29:17 +0000
+++ b/bzrlib/tests/test_http.py	2007-11-30 16:48:24 +0000
@@ -32,15 +32,10 @@
     config,
     errors,
     osutils,
+    tests,
     ui,
     urlutils,
     )
-from bzrlib.tests import (
-    TestCase,
-    TestUIFactory,
-    TestSkipped,
-    StringIOWrapper,
-    )
 from bzrlib.tests.HttpServer import (
     HttpServer,
     HttpServer_PyCurl,
@@ -158,12 +153,12 @@
             from bzrlib.transport.http._pycurl import PyCurlTransport
             return PyCurlTransport
         except errors.DependencyNotPresent:
-            raise TestSkipped('pycurl not present')
+            raise tests.TestSkipped('pycurl not present')
 
     _transport = property(_get_pycurl_maybe)
 
 
-class TestHttpUrls(TestCase):
+class TestHttpUrls(tests.TestCase):
 
     # TODO: This should be moved to authorization tests once they
     # are written.
@@ -226,7 +221,7 @@
             server.tearDown()
 
 
-class TestHttpUrls_urllib(TestHttpTransportUrls, TestCase):
+class TestHttpUrls_urllib(TestHttpTransportUrls, tests.TestCase):
     """Test http urls with urllib"""
 
     _transport = HttpTransport_urllib
@@ -235,7 +230,7 @@
 
 
 class TestHttpUrls_pycurl(TestWithTransport_pycurl, TestHttpTransportUrls,
-                          TestCase):
+                          tests.TestCase):
     """Test http urls with pycurl"""
 
     _server = HttpServer_PyCurl
@@ -254,7 +249,7 @@
         try:
             import pycurl
         except ImportError:
-            raise TestSkipped('pycurl not present')
+            raise tests.TestSkipped('pycurl not present')
         # Now that we have pycurl imported, we can fake its version_info
         # This was taken from a windows pycurl without SSL
         # (thanks to bialix)
@@ -352,7 +347,7 @@
     """Test http connections with pycurl"""
 
 
-class TestHttpTransportRegistration(TestCase):
+class TestHttpTransportRegistration(tests.TestCase):
     """Test registrations of various http implementations"""
 
     def test_http_registered(self):
@@ -372,7 +367,7 @@
         try:
             http_transport = get_transport(url)
         except errors.UnsupportedProtocol:
-            raise TestSkipped('%s not available' % scheme)
+            raise tests.TestSkipped('%s not available' % scheme)
         code, response = http_transport._post('abc def end-of-body')
         self.assertTrue(
             server.received_bytes.startswith('POST /.bzr/smart HTTP/1.'))
@@ -384,7 +379,7 @@
             server.received_bytes.endswith('\r\n\r\nabc def end-of-body'))
 
 
-class TestPost_urllib(TestCase, TestPost):
+class TestPost_urllib(tests.TestCase, TestPost):
     """TestPost for urllib implementation"""
 
     _transport = HttpTransport_urllib
@@ -393,14 +388,14 @@
         self._test_post_body_is_received('http+urllib')
 
 
-class TestPost_pycurl(TestWithTransport_pycurl, TestCase, TestPost):
+class TestPost_pycurl(TestWithTransport_pycurl, tests.TestCase, TestPost):
     """TestPost for pycurl implementation"""
 
     def test_post_body_is_received_pycurl(self):
         self._test_post_body_is_received('http+pycurl')
 
 
-class TestRangeHeader(TestCase):
+class TestRangeHeader(tests.TestCase):
     """Test range_header method"""
 
     def check_header(self, value, ranges=[], tail=0):
@@ -575,7 +570,7 @@
     """Tests forbidden server for pycurl implementation"""
 
 
-class TestRecordingServer(TestCase):
+class TestRecordingServer(tests.TestCase):
 
     def test_create(self):
         server = RecordingServer(expect_body_tail=None)
@@ -653,6 +648,20 @@
         self.assertListRaises((errors.InvalidRange, errors.ShortReadvError,),
                               t.readv, 'a', [(12,2)])
 
+    def test_readv_multiple_get_requests(self):
+        server = self.get_readonly_server()
+        t = self._transport(server.get_url())
+        # force transport to issue multiple requests
+        t._max_readv_combine = 1
+        t._max_get_ranges = 1
+        l = list(t.readv('a', ((0, 1), (1, 1), (3, 2), (9, 1))))
+        self.assertEqual(l[0], (0, '0'))
+        self.assertEqual(l[1], (1, '1'))
+        self.assertEqual(l[2], (3, '34'))
+        self.assertEqual(l[3], (9, '9'))
+        # The server should have issued 4 requests
+        self.assertEqual(4, self.get_readonly_server().GET_request_nb)
+
 
 class TestSingleRangeRequestServer(TestRangeRequestServer):
     """Test readv against a server which accept only single range requests"""
@@ -709,8 +718,8 @@
 
 
 class TestNoRangeRequestServer_pycurl(TestWithTransport_pycurl,
-                               TestNoRangeRequestServer,
-                               TestCaseWithWebserver):
+                                      TestNoRangeRequestServer,
+                                      TestCaseWithWebserver):
     """Tests range requests refusing server for pycurl implementation"""
 
 
@@ -738,7 +747,7 @@
         TestCaseWithWebserver.setUp(self)
         # We need to manipulate ranges that correspond to real chunks in the
         # response, so we build a content appropriately.
-        filler = ''.join(['abcdefghij' for _ in range(102)])
+        filler = ''.join(['abcdefghij' for x in range(102)])
         content = ''.join(['%04d' % v + filler for v in range(16)])
         self.build_tree_contents([('a', content)],)
 
@@ -749,7 +758,7 @@
         self.assertEqual(l[1], (1024, '0001'))
         self.assertEqual(1, self.get_readonly_server().GET_request_nb)
 
-    def test_a_lot_of_ranges(self):
+    def test_more_ranges(self):
         t = self.get_transport()
         l = list(t.readv('a', ((0, 4), (1024, 4), (4096, 4), (8192, 4))))
         self.assertEqual(l[0], (0, '0000'))
@@ -775,14 +784,14 @@
 
 
 
-class TestHttpProxyWhiteBox(TestCase):
+class TestHttpProxyWhiteBox(tests.TestCase):
     """Whitebox test proxy http authorization.
 
     Only the urllib implementation is tested here.
     """
 
     def setUp(self):
-        TestCase.setUp(self)
+        tests.TestCase.setUp(self)
         self._old_env = {}
 
     def tearDown(self):
@@ -933,12 +942,12 @@
         # pycurl does not check HTTP_PROXY for security reasons
         # (for use in a CGI context that we do not care
         # about. Should we ?)
-        raise TestSkipped('pycurl does not check HTTP_PROXY '
-            'for security reasons')
+        raise tests.TestNotApplicable(
+            'pycurl does not check HTTP_PROXY for security reasons')
 
     def test_HTTP_PROXY_with_NO_PROXY(self):
-        raise TestSkipped('pycurl does not check HTTP_PROXY '
-            'for security reasons')
+        raise tests.TestNotApplicable(
+            'pycurl does not check HTTP_PROXY for security reasons')
 
     def test_http_proxy_without_scheme(self):
         # pycurl *ignores* invalid proxy env variables. If that
@@ -1272,8 +1281,8 @@
     def test_prompt_for_password(self):
         self.server.add_user('joe', 'foo')
         t = self.get_user_transport('joe', None)
-        stdout = StringIOWrapper()
-        ui.ui_factory = TestUIFactory(stdin='foo\n', stdout=stdout)
+        stdout = tests.StringIOWrapper()
+        ui.ui_factory = tests.TestUIFactory(stdin='foo\n', stdout=stdout)
         self.assertEqual('contents of a\n',t.get('a').read())
         # stdin should be empty
         self.assertEqual('', ui.ui_factory.stdin.readline())
@@ -1302,8 +1311,8 @@
         stdin_content = 'bar\n'  # Not the right password
         self.server.add_user(user, password)
         t = self.get_user_transport(user, None)
-        ui.ui_factory = TestUIFactory(stdin=stdin_content,
-                                      stdout=StringIOWrapper())
+        ui.ui_factory = tests.TestUIFactory(stdin=stdin_content,
+                                            stdout=tests.StringIOWrapper())
         # Create a minimal config file with the right password
         conf = config.AuthenticationConfig()
         conf._get_config().update(

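The new `test_readv_multiple_get_requests` above checks that results still come back in the caller's original order even though the transport now issues several GET requests. The bookkeeping that makes this possible in the rewritten `_readv` can be sketched standalone (the name `serve_in_order` and the pre-computed `arrivals` list are illustrative; the real code reads chunks from HTTP responses):

```python
def serve_in_order(offsets, arrivals):
    """Yield (start, data) in the caller's requested order, buffering
    chunks that arrive early in a dict keyed by (start, size)."""
    data_map = {}
    iter_offsets = iter(offsets)
    cur = next(iter_offsets)
    exhausted = False
    for key, data in arrivals:
        data_map[key] = data
        # Yield everything we can in the original order.
        while not exhausted and cur in data_map:
            yield cur[0], data_map.pop(cur)
            try:
                cur = next(iter_offsets)
            except StopIteration:
                exhausted = True
```

Popping served entries keeps the buffer small, at the cost (noted in the diff's XXX comment) of breaking if `offsets` contains duplicates.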
=== modified file 'bzrlib/transport/http/__init__.py'
--- a/bzrlib/transport/http/__init__.py	2007-11-27 08:26:00 +0000
+++ b/bzrlib/transport/http/__init__.py	2007-11-30 09:51:22 +0000
@@ -208,9 +208,9 @@
             self._range_hint = None
             mutter('Retry "%s" without ranges' % relpath)
         else:
-            # We tried all the tricks, but nothing worked. We re-raise original
-            # exception; the 'mutter' calls above will indicate that further
-            # tries were unsuccessful
+            # We tried all the tricks, but nothing worked. We re-raise the
+            # original exception; the 'mutter' calls above will indicate that
+            # further tries were unsuccessful
             raise exc_info[0], exc_info[1], exc_info[2]
 
     def _get_ranges_hinted(self, relpath, ranges):
@@ -251,6 +251,9 @@
     # No limit on the offset number that get combined into one, we are trying
     # to avoid downloading the whole file.
     _max_readv_combine = 0
+    # By default Apache has a limit of ~400 ranges before replying with a 400
+    # Bad Request. So we go underneath that amount to be safe.
+    _max_get_ranges = 200
 
     def _readv(self, relpath, offsets):
         """Get parts of the file at the given relative path.
@@ -258,41 +261,84 @@
         :param offsets: A list of (offset, size) tuples.
         :param return: A list or generator of (offset, data) tuples
         """
-        sorted_offsets = sorted(list(offsets))
-        fudge = self._bytes_to_read_before_seek
-        coalesced = self._coalesce_offsets(sorted_offsets,
-                                           limit=self._max_readv_combine,
-                                           fudge_factor=fudge)
-        coalesced = list(coalesced)
-        mutter('http readv of %s  offsets => %s collapsed %s',
-                relpath, len(offsets), len(coalesced))
-
-        f = self._get_ranges_hinted(relpath, coalesced)
-        for start, size in offsets:
-            try_again = True
-            while try_again:
-                try_again = False
-                f.seek(start, ((start < 0) and 2) or 0)
-                start = f.tell()
-                try:
-                    data = f.read(size)
-                    if len(data) != size:
-                        raise errors.ShortReadvError(relpath, start, size,
-                                                     actual=len(data))
-                except errors.ShortReadvError, e:
-                    self._degrade_range_hint(relpath, coalesced, sys.exc_info())
-
-                    # Since the offsets and the ranges may not be in the same
-                    # order, we don't try to calculate a restricted single
-                    # range encompassing unprocessed offsets.
-
-                    # Note: we replace 'f' here, it may need cleaning one day
-                    # before being thrown that way.
-                    f = self._get_ranges_hinted(relpath, coalesced)
-                    try_again = True
-
-            # After one or more tries, we get the data.
-            yield start, data
+
+        # offsets may be a generator, we will iterate it several times, so
+        # build a list
+        offsets = list(offsets)
+
+        try_again = True
+        while try_again:
+            try_again = False
+
+            # Coalesce the offsets to minimize the GET requests issued
+            sorted_offsets = sorted(offsets)
+            coalesced = self._coalesce_offsets(
+                sorted_offsets, limit=self._max_readv_combine,
+                fudge_factor=self._bytes_to_read_before_seek)
+
+            # Turn it into a list, we will iterate it several times
+            coalesced = list(coalesced)
+            mutter('http readv of %s  offsets => %s collapsed %s',
+                    relpath, len(offsets), len(coalesced))
+
+            # Cache the data read, but only until it's been used
+            data_map = {}
+            # We will iterate on the data received from the GET requests and
+        # serve the corresponding offsets respecting the initial order. We
+            # need an offset iterator for that.
+            iter_offsets = iter(offsets)
+            cur_offset_and_size = iter_offsets.next()
+
+            try:
+                for cur_coal, file in self._coalesce_readv(relpath, coalesced):
+                    # Split the received chunk
+                    for offset, size in cur_coal.ranges:
+                        start = cur_coal.start + offset
+                        file.seek(start, 0)
+                        data = file.read(size)
+                        data_len = len(data)
+                        if data_len != size:
+                            raise errors.ShortReadvError(relpath, start, size,
+                                                         actual=data_len)
+                        data_map[(start, size)] = data
+
+                    # Yield everything we can
+                    while cur_offset_and_size in data_map:
+                        # Clean the cached data since we use it
+                        # XXX: will break if offsets contains duplicates --
+                        # vila20071129
+                        this_data = data_map.pop(cur_offset_and_size)
+                        yield cur_offset_and_size[0], this_data
+                        cur_offset_and_size = iter_offsets.next()
+
+            except (errors.ShortReadvError, errors.InvalidRange), e:
+                self._degrade_range_hint(relpath, coalesced, sys.exc_info())
+                # Some offsets may already have been processed, so we retry
+                # only the unsuccessful ones.
+                offsets = [cur_offset_and_size] + [o for o in iter_offsets]
+                try_again = True
+
+    def _coalesce_readv(self, relpath, coalesced):
+        """Issue several GET requests to satisfy the coalesced offsets"""
+        total = len(coalesced)
+        if self._range_hint == 'multi':
+            max_ranges = self._max_get_ranges
+        elif self._range_hint == 'single':
+            max_ranges = total
+        else:
+            # The whole file will be downloaded anyway
+            max_ranges = total
+        # TODO: Some web servers may ignore the range requests and return the
+        # whole file; we may want to detect that and avoid further requests.
+        # Hint: test_readv_multiple_get_requests will fail in that case.
+        for group in xrange(0, len(coalesced), max_ranges):
+            ranges = coalesced[group:group+max_ranges]
+            # Note that the following may raise errors.InvalidRange. It's the
+            # caller's responsibility to decide how to retry since it may provide
+            # different coalesced offsets.
+            code, file = self._get(relpath, ranges)
+            for range in ranges:
+                yield range, file
 
     def recommended_page_size(self):
         """See Transport.recommended_page_size().
@@ -447,7 +493,7 @@
         """Prepare a HTTP Range header at a level the server should accept"""
 
         if self._range_hint == 'multi':
-            # Nothing to do here
+            # Generate the header describing all offsets
             return self._range_header(offsets, tail_amount)
         elif self._range_hint == 'single':
             # Combine all the requested ranges into a single

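The `_max_get_ranges = 200` cap above exists because each group of coalesced offsets is serialized into a single HTTP Range header, and Apache rejects requests carrying roughly 400 or more ranges. A minimal sketch of building such a header from (start, size) tuples (bzrlib's real `_range_header` also handles a tail_amount for end-of-file reads, omitted here):

```python
def range_header(ranges):
    """Build an HTTP Range header value from (start, size) tuples.
    HTTP byte-range ends are inclusive, hence the -1."""
    parts = ['%d-%d' % (start, start + size - 1) for start, size in ranges]
    return 'bytes=' + ','.join(parts)
```

Capping the list before building the header keeps each request short enough that a multi-range-capable server answers with a 206 Partial Content instead of a 400 Bad Request.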
=== modified file 'bzrlib/transport/http/_pycurl.py'
--- a/bzrlib/transport/http/_pycurl.py	2007-10-31 12:38:11 +0000
+++ b/bzrlib/transport/http/_pycurl.py	2007-11-30 09:51:22 +0000
@@ -36,13 +36,12 @@
 import sys
 
 from bzrlib import (
+    debug,
     errors,
+    trace,
     __version__ as bzrlib_version,
     )
 import bzrlib
-from bzrlib.errors import (NoSuchFile,
-                           ConnectionError,
-                           DependencyNotPresent)
 from bzrlib.trace import mutter
 from bzrlib.transport.http import (
     ca_bundle,
@@ -55,7 +54,7 @@
     import pycurl
 except ImportError, e:
     mutter("failed to import pycurl: %s", e)
-    raise DependencyNotPresent('pycurl', e)
+    raise errors.DependencyNotPresent('pycurl', e)
 
 try:
     # see if we can actually initialize PyCurl - sometimes it will load but
@@ -70,7 +69,7 @@
     pycurl.Curl()
 except pycurl.error, e:
     mutter("failed to initialize pycurl: %s", e)
-    raise DependencyNotPresent('pycurl', e)
+    raise errors.DependencyNotPresent('pycurl', e)
 
 
 
@@ -112,7 +111,7 @@
             # protocols
             supported = pycurl.version_info()[8]
             if 'https' not in supported:
-                raise DependencyNotPresent('pycurl', 'no https support')
+                raise errors.DependencyNotPresent('pycurl', 'no https support')
         self.cabundle = ca_bundle.get_ca_path()
 
     def _get_curl(self):
@@ -202,7 +201,7 @@
         data.seek(0)
 
         if code == 404:
-            raise NoSuchFile(abspath)
+            raise errors.NoSuchFile(abspath)
         if code != 200:
             self._raise_curl_http_error(
                 curl, 'expected 200 or 404 for full response.')
@@ -264,9 +263,12 @@
 
     def _set_curl_options(self, curl):
         """Set options for all requests"""
-        ## curl.setopt(pycurl.VERBOSE, 1)
-        # TODO: maybe include a summary of the pycurl version
-        ua_str = 'bzr/%s (pycurl)' % (bzrlib.__version__,)
+        if 'http' in debug.debug_flags:
+            curl.setopt(pycurl.VERBOSE, 1)
+            # pycurl doesn't implement the CURLOPT_STDERR option, so we can't
+            # do: curl.setopt(pycurl.STDERR, trace._trace_file)
+
+        ua_str = 'bzr/%s (pycurl: %s)' % (bzrlib.__version__, pycurl.version)
         curl.setopt(pycurl.USERAGENT, ua_str)
         if self.cabundle:
             curl.setopt(pycurl.CAINFO, self.cabundle)
@@ -292,8 +294,8 @@
                         CURLE_COULDNT_CONNECT,
                         CURLE_GOT_NOTHING,
                         CURLE_COULDNT_RESOLVE_PROXY,):
-                raise ConnectionError('curl connection error (%s)\non %s'
-                              % (e[1], url))
+                raise errors.ConnectionError(
+                    'curl connection error (%s)\non %s' % (e[1], url))
             elif e[0] == CURLE_PARTIAL_FILE:
                 # Pycurl itself has detected a short read.  We do
                 # not have all the information for the

=== modified file 'bzrlib/transport/http/_urllib.py'
--- a/bzrlib/transport/http/_urllib.py	2007-11-04 15:44:32 +0000
+++ b/bzrlib/transport/http/_urllib.py	2007-11-30 09:51:22 +0000
@@ -20,10 +20,10 @@
 
 from bzrlib import (
     errors,
+    trace,
     urlutils,
     )
-from bzrlib.trace import mutter
-from bzrlib.transport.http import HttpTransportBase
+from bzrlib.transport import http
 # TODO: handle_response should be integrated into the _urllib2_wrappers
 from bzrlib.transport.http.response import handle_response
 from bzrlib.transport.http._urllib2_wrappers import (
@@ -32,7 +32,7 @@
     )
 
 
-class HttpTransport_urllib(HttpTransportBase):
+class HttpTransport_urllib(http.HttpTransportBase):
     """Python urllib transport for http and https."""
 
     # In order to debug we have to issue our traces in sync with
@@ -85,7 +85,6 @@
         request.auth = auth
         request.proxy_auth = proxy_auth
 
-        mutter('%s: [%s]' % (request.method, request.get_full_url()))
         if self._debuglevel > 0:
             print 'perform: %s base: %s, url: %s' % (request.method, self.base,
                                                      request.get_full_url())
@@ -108,8 +107,8 @@
                                            qual_proto=self._scheme)
 
         if request.redirected_to is not None:
-            mutter('redirected from: %s to: %s' % (request.get_full_url(),
-                                                   request.redirected_to))
+            trace.mutter('redirected from: %s to: %s' % (request.get_full_url(),
+                                                         request.redirected_to))
 
         return response
 

=== modified file 'bzrlib/transport/http/_urllib2_wrappers.py'
--- a/bzrlib/transport/http/_urllib2_wrappers.py	2007-11-05 13:21:30 +0000
+++ b/bzrlib/transport/http/_urllib2_wrappers.py	2007-11-30 09:51:22 +0000
@@ -32,6 +32,9 @@
 handle authentication schemes.
 """
 
+# TODO: now that we have -Dhttp most of the needs should be covered in a more
+# accessible way (i.e. no need to edit the source), if experience confirms
+# that, delete all DEBUG uses -- vila20071130 (happy birthday).
 DEBUG = 0
 
 # FIXME: Oversimplifying, two kind of exceptions should be
@@ -56,7 +59,9 @@
 from bzrlib import __version__ as bzrlib_version
 from bzrlib import (
     config,
+    debug,
     errors,
+    trace,
     transport,
     ui,
     )
@@ -101,7 +106,8 @@
                 # having issued the response headers (even if the
                 # headers indicate a Content-Type...)
                 body = self.fp.read(self.length)
-                if self.debuglevel > 0:
+                if self.debuglevel > 3:
+                    # This one can be huge and is generally not interesting
                     print "Consumed body: [%s]" % body
             self.close()
         elif self.status == 200:
@@ -143,6 +149,13 @@
 
     # XXX: Needs refactoring at the caller level.
     def __init__(self, host, port=None, strict=None, proxied_host=None):
+        if 'http' in debug.debug_flags:
+            netloc = host
+            if port is not None:
+                netloc += ':%d' % port
+            if proxied_host is not None:
+                netloc += '(proxy for %s)' % proxied_host
+            trace.mutter('* About to connect() to %s' % netloc)
         httplib.HTTPConnection.__init__(self, host, port, strict)
         self.proxied_host = proxied_host
 
@@ -425,12 +438,17 @@
         headers.update(request.unredirected_hdrs)
 
         try:
-            connection._send_request(request.get_method(),
-                                     request.get_selector(),
+            method = request.get_method()
+            url = request.get_selector()
+            connection._send_request(method, url,
                                      # FIXME: implements 100-continue
                                      #None, # We don't send the body yet
                                      request.get_data(),
                                      headers)
+            if 'http' in debug.debug_flags:
+                trace.mutter('> %s %s' % (method, url))
+                hdrs = ['%s: %s' % (k, v) for k, v in headers.items()]
+                trace.mutter('> ' + '\n> '.join(hdrs) + '\n')
             if self._debuglevel > 0:
                 print 'Request sent: [%r]' % request
             response = connection.getresponse()
@@ -453,6 +471,17 @@
 #            connection.send(body)
 #            response = connection.getresponse()
 
+        if 'http' in debug.debug_flags:
+            version = 'HTTP/%d.%d'
+            try:
+                version = version % (response.version / 10,
+                                     response.version % 10)
+            except:
+                version = 'HTTP/%r' % response.version
+            trace.mutter('< %s %s %s' % (version, response.status,
+                                         response.reason))
+            hdrs = [h.rstrip('\r\n') for h in response.msg.headers]
+            trace.mutter('< ' + '\n< '.join(hdrs) + '\n')
         if self._debuglevel > 0:
             print 'Receives response: %r' % response
             print '  For: %r(%r)' % (request.get_method(),
@@ -1285,7 +1314,7 @@
             )
 
         self.open = self._opener.open
-        if DEBUG >= 2:
+        if DEBUG >= 3:
             # When dealing with handler order, it's easy to mess
             # things up, the following will help understand which
             # handler is used, when and for what.

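The response-side trace in the hunk above formats `httplib`'s numeric `response.version` (11 for HTTP/1.1, 10 for HTTP/1.0) into a `< HTTP/x.y status reason` line followed by `< `-prefixed headers. A self-contained sketch of that formatting (the function name and the plain-list `headers` argument are simplifications of the real `response.msg.headers` plumbing):

```python
def format_response_trace(version, status, reason, headers):
    """Return the '< ...' trace lines for a response, -Dhttp style."""
    try:
        # httplib encodes HTTP/1.1 as 11, HTTP/1.0 as 10.
        ver = 'HTTP/%d.%d' % (version // 10, version % 10)
    except TypeError:
        # Fall back to a repr when version is not the expected integer.
        ver = 'HTTP/%r' % (version,)
    lines = ['< %s %s %s' % (ver, status, reason)]
    lines.extend('< ' + h.rstrip('\r\n') for h in headers)
    return lines
```

Note the fallback formats the raw `version` value, matching the intent of the `except` branch in the diff.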
=== modified file 'bzrlib/transport/http/response.py'
--- a/bzrlib/transport/http/response.py	2007-06-20 13:56:21 +0000
+++ b/bzrlib/transport/http/response.py	2007-11-29 23:22:01 +0000
@@ -287,9 +287,6 @@
         # magic value.
         raise errors.InvalidRange(url,0)
 
-    # TODO: jam 20060713 Properly handle redirects (302 Found, etc)
-    #       The '_get' code says to follow redirects, we probably 
-    #       should actually handle the return values
     else:
         raise errors.InvalidHttpResponse(url, "Unknown response code %s" 
                                               % (code,))
