Rev 3060: Fix bug #172701 by catching the pycurl ShortReadvError and allowing readv to issue several get requests in file:///home/pqm/archives/thelove/bzr/%2Btrunk/
Canonical.com Patch Queue Manager
pqm at pqm.ubuntu.com
Fri Nov 30 18:21:11 GMT 2007
At file:///home/pqm/archives/thelove/bzr/%2Btrunk/
------------------------------------------------------------
revno: 3060
revision-id:pqm at pqm.ubuntu.com-20071130182102-i0t564k01anm7uk2
parent: pqm at pqm.ubuntu.com-20071130083301-5zq7705t6xa7yikn
parent: v.ladeuil+lp at free.fr-20071130174841-6yitg1imn504sttr
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Fri 2007-11-30 18:21:02 +0000
message:
Fix bug #172701 by catching the pycurl ShortReadvError and allowing readv to issue several get requests
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/debug.py debug.py-20061102062349-vdhrw9qdpck8cl35-1
bzrlib/help_topics.py help_topics.py-20060920210027-rnim90q9e0bwxvy4-1
bzrlib/tests/HTTPTestUtil.py HTTPTestUtil.py-20050914180604-247d3aafb7a43343
bzrlib/tests/HttpServer.py httpserver.py-20061012142527-m1yxdj1xazsf8d7s-1
bzrlib/tests/test_http.py testhttp.py-20051018020158-b2eef6e867c514d9
bzrlib/transport/http/__init__.py http_transport.py-20050711212304-506c5fd1059ace96
bzrlib/transport/http/_pycurl.py pycurlhttp.py-20060110060940-4e2a705911af77a6
bzrlib/transport/http/_urllib.py _urlgrabber.py-20060113083826-0bbf7d992fbf090c
bzrlib/transport/http/_urllib2_wrappers.py _urllib2_wrappers.py-20060913231729-ha9ugi48ktx481ao-1
bzrlib/transport/http/response.py _response.py-20060613154423-a2ci7hd4iw5c7fnt-1
------------------------------------------------------------
revno: 3059.1.1
revision-id:v.ladeuil+lp at free.fr-20071130174841-6yitg1imn504sttr
parent: pqm at pqm.ubuntu.com-20071130083301-5zq7705t6xa7yikn
parent: v.ladeuil+lp at free.fr-20071130164824-4hhlwfudheafsf9e
committer: Vincent Ladeuil <v.ladeuil+lp at free.fr>
branch nick: trunk
timestamp: Fri 2007-11-30 18:48:41 +0100
message:
Fix bug #172701 by catching the pycurl ShortReadvError and allowing readv to issue several get requests
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/debug.py debug.py-20061102062349-vdhrw9qdpck8cl35-1
bzrlib/help_topics.py help_topics.py-20060920210027-rnim90q9e0bwxvy4-1
bzrlib/tests/HTTPTestUtil.py HTTPTestUtil.py-20050914180604-247d3aafb7a43343
bzrlib/tests/HttpServer.py httpserver.py-20061012142527-m1yxdj1xazsf8d7s-1
bzrlib/tests/test_http.py testhttp.py-20051018020158-b2eef6e867c514d9
bzrlib/transport/http/__init__.py http_transport.py-20050711212304-506c5fd1059ace96
bzrlib/transport/http/_pycurl.py pycurlhttp.py-20060110060940-4e2a705911af77a6
bzrlib/transport/http/_urllib.py _urlgrabber.py-20060113083826-0bbf7d992fbf090c
bzrlib/transport/http/_urllib2_wrappers.py _urllib2_wrappers.py-20060913231729-ha9ugi48ktx481ao-1
bzrlib/transport/http/response.py _response.py-20060613154423-a2ci7hd4iw5c7fnt-1
------------------------------------------------------------
revno: 3052.3.4
revision-id:v.ladeuil+lp at free.fr-20071130164824-4hhlwfudheafsf9e
parent: v.ladeuil+lp at free.fr-20071130095122-6xz845lluzjp7tvs
committer: Vincent Ladeuil <v.ladeuil+lp at free.fr>
branch nick: 172701
timestamp: Fri 2007-11-30 17:48:24 +0100
message:
Jam's review feedback.
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/debug.py debug.py-20061102062349-vdhrw9qdpck8cl35-1
bzrlib/tests/test_http.py testhttp.py-20051018020158-b2eef6e867c514d9
------------------------------------------------------------
revno: 3052.3.3
revision-id:v.ladeuil+lp at free.fr-20071130095122-6xz845lluzjp7tvs
parent: v.ladeuil+lp at free.fr-20071129232201-thjmmlgo5ucbrwfn
committer: Vincent Ladeuil <v.ladeuil+lp at free.fr>
branch nick: 172701
timestamp: Fri 2007-11-30 10:51:22 +0100
message:
Add -Dhttp support.
* bzrlib/transport/http/_urllib2_wrappers.py:
(HTTPConnection.__init__): Report the host we are about to connect
to if -Dhttp is used.
(AbstractHTTPHandler.do_open): Report requests and
responses (including headers) if -Dhttp is used.
* bzrlib/transport/http/_urllib.py: Fix some imports.
(HttpTransport_urllib._perform): Delete one mutter call since
-Dhttp provides better information.
* bzrlib/transport/http/_pycurl.py:
Fix some imports.
(PyCurlTransport._set_curl_options): Activate verbose output if
-Dhttp is used. Unfortunately this goes straight to stderr instead
of .bzr.log (libcurl provides an option but pycurl does not
implement it), but since we are debugging, I think it's
acceptable.
* bzrlib/transport/http/__init__.py:
(HttpTransportBase._coalesce_readv): Add a comment about the
servers that return the whole file ignoring the Ranges header.
* bzrlib/help_topics.py:
(_global_options): Add http.
* bzrlib/debug.py:
Add 'http'.
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/debug.py debug.py-20061102062349-vdhrw9qdpck8cl35-1
bzrlib/help_topics.py help_topics.py-20060920210027-rnim90q9e0bwxvy4-1
bzrlib/transport/http/__init__.py http_transport.py-20050711212304-506c5fd1059ace96
bzrlib/transport/http/_pycurl.py pycurlhttp.py-20060110060940-4e2a705911af77a6
bzrlib/transport/http/_urllib.py _urlgrabber.py-20060113083826-0bbf7d992fbf090c
bzrlib/transport/http/_urllib2_wrappers.py _urllib2_wrappers.py-20060913231729-ha9ugi48ktx481ao-1
------------------------------------------------------------
revno: 3052.3.2
revision-id:v.ladeuil+lp at free.fr-20071129232201-thjmmlgo5ucbrwfn
parent: v.ladeuil+lp at free.fr-20071129215945-qwzjfzz6kygljh4r
committer: Vincent Ladeuil <v.ladeuil+lp at free.fr>
branch nick: 172701
timestamp: Fri 2007-11-30 00:22:01 +0100
message:
Add tests and fix trivial bugs and other typos.
* bzrlib/transport/http/response.py:
(handle_response): Delete obsolete TODO.
* bzrlib/transport/http/__init__.py:
(HttpTransportBase._readv): Fix some trivial bugs revealed by the
test suite.
(HttpTransportBase._coalesce_readv): Fix list splice access, first
seen by jam, but found by the test suite too.
* bzrlib/tests/test_http.py:
Fix some imports.
(TestRangeRequestServer.test_readv_multiple_get_requests):
Exercise the rewritten readv.
* bzrlib/tests/HttpServer.py:
(TestingHTTPRequestHandler.do_GET): Count the requests (more tests
use that info).
* bzrlib/tests/HTTPTestUtil.py:
(LimitedRangeRequestHandler.do_GET): The statistics are collected
in the base request handler now.
(NoRangeRequestHandler.do_GET): But since we bypass the base
handler, re-enable the statistics collection here.
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/tests/HTTPTestUtil.py HTTPTestUtil.py-20050914180604-247d3aafb7a43343
bzrlib/tests/HttpServer.py httpserver.py-20061012142527-m1yxdj1xazsf8d7s-1
bzrlib/tests/test_http.py testhttp.py-20051018020158-b2eef6e867c514d9
bzrlib/transport/http/__init__.py http_transport.py-20050711212304-506c5fd1059ace96
bzrlib/transport/http/response.py _response.py-20060613154423-a2ci7hd4iw5c7fnt-1
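The statistics change described above moves request counting into the base handler: every do_GET bumps a counter on the test server object, so tests can assert exactly how many GET requests a readv issued. A rough sketch of the idea using Python's stdlib HTTP server (written against the modern `http.server` module; the class names are illustrative, not bzrlib's `TestingHTTPRequestHandler`/`HttpServer`):

```python
# Illustrative sketch: count GET requests on the server so tests can
# assert how many requests a client issued (mirrors GET_request_nb).
from http.server import BaseHTTPRequestHandler, HTTPServer
import threading
import urllib.request

class CountingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Update statistics before serving, as the patch does.
        self.server.GET_request_nb += 1
        body = b'hello'
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep test output quiet

server = HTTPServer(('localhost', 0), CountingHandler)
server.GET_request_nb = 0  # the statistic the tests check
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

url = 'http://localhost:%d/' % server.server_address[1]
for _ in range(3):
    urllib.request.urlopen(url).read()
server.shutdown()
```

After the three fetches, `server.GET_request_nb` is 3 — the same assertion style `test_readv_multiple_get_requests` uses.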
------------------------------------------------------------
revno: 3052.3.1
revision-id:v.ladeuil+lp at free.fr-20071129215945-qwzjfzz6kygljh4r
parent: pqm at pqm.ubuntu.com-20071129184101-u9506rihe4zbzyyz
parent: v.ladeuil+lp at free.fr-20071129154333-ddra1mbf487rlofw
committer: Vincent Ladeuil <v.ladeuil+lp at free.fr>
branch nick: 172701
timestamp: Thu 2007-11-29 22:59:45 +0100
message:
Merge http readv issuing multiple requests since that fixes bug #172701
modified:
bzrlib/transport/http/__init__.py http_transport.py-20050711212304-506c5fd1059ace96
bzrlib/transport/http/_urllib2_wrappers.py _urllib2_wrappers.py-20060913231729-ha9ugi48ktx481ao-1
------------------------------------------------------------
revno: 3024.2.3
revision-id:v.ladeuil+lp at free.fr-20071129154333-ddra1mbf487rlofw
parent: v.ladeuil+lp at free.fr-20071127093537-gruuxzmso2r6c5pg
committer: Vincent Ladeuil <v.ladeuil+lp at free.fr>
branch nick: 165061
timestamp: Thu 2007-11-29 16:43:33 +0100
message:
Rewrite http_readv to allow several GET requests. Smoke tested against the branch reported in the bug.
* bzrlib/transport/http/__init__.py:
(HttpTransportBase._readv): Issue several GET requests if too many
ranges are requested.
modified:
bzrlib/transport/http/__init__.py http_transport.py-20050711212304-506c5fd1059ace96
bzrlib/transport/http/_urllib2_wrappers.py _urllib2_wrappers.py-20060913231729-ha9ugi48ktx481ao-1
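The rewrite summarized above splits the coalesced offsets into groups and issues one GET request per group, staying under the server's limit on the number of ranges per request. A simplified sketch of just the batching step (the 200-range default mirrors the patch's `_max_get_ranges`; the function name is hypothetical):

```python
def batched_range_requests(coalesced, max_ranges=200):
    """Yield lists of ranges, one list per GET request to issue.

    Apache replies with 400 Bad Request at around 400 ranges, so the
    default stays well below that (mirrors _max_get_ranges in the patch).
    """
    for group in range(0, len(coalesced), max_ranges):
        yield coalesced[group:group + max_ranges]

# With 5 coalesced (start, length) ranges and at most 2 per request,
# three GET requests are needed.
ranges = [(0, 10), (20, 5), (40, 8), (60, 4), (100, 16)]
batches = list(batched_range_requests(ranges, max_ranges=2))
```

Each yielded batch would then be turned into a single Range header, which is what `_coalesce_readv` in the diff below the fold does via `self._get(relpath, ranges)`.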
=== modified file 'NEWS'
--- a/NEWS 2007-11-30 08:33:01 +0000
+++ b/NEWS 2007-11-30 17:48:41 +0000
@@ -101,6 +101,11 @@
installer write to the registry (HKLM\SOFTWARE\Bazaar) useful info
about paths and bzr version. (Alexander Belchenko, #129298)
+ INTERNALS:
+
+ * New -Dhttp debug option reports http connections, requests and responses.
+ (Vincent Ladeuil)
+
DOCUMENTATION:
BUG FIXES:
@@ -207,6 +212,11 @@
* Revert doesn't crash when restoring a single file from a deleted
directory. (Aaron Bentley)
+ * Catch ShortReadvErrors while using pycurl. Also make readv more robust by
+ allowing multiple GET requests to be issued if too many ranges are
+ required.
+ (Vincent Ladeuil, #172701)
+
* Stderr output via logging mechanism now goes through encoded wrapper
and no more uses utf-8, but terminal encoding instead. So all unicode
strings now should be readable in non-utf-8 terminal.
=== modified file 'bzrlib/debug.py'
--- a/bzrlib/debug.py 2007-11-04 15:29:17 +0000
+++ b/bzrlib/debug.py 2007-11-30 16:48:24 +0000
@@ -31,6 +31,7 @@
* fetch - trace history copying between repositories
* hooks - trace hook execution
* hpss - trace smart protocol requests and responses
+ * http - trace http connections, requests and responses
* index - trace major index operations
* lock - trace when lockdir locks are taken or released
=== modified file 'bzrlib/help_topics.py'
--- a/bzrlib/help_topics.py 2007-11-06 07:01:07 +0000
+++ b/bzrlib/help_topics.py 2007-11-30 09:51:22 +0000
@@ -268,6 +268,7 @@
operations.
-Dhashcache Log every time a working file is read to determine its hash.
-Dhooks Trace hook execution.
+-Dhttp Trace http connections, requests and responses
-Dhpss Trace smart protocol requests and responses.
-Dindex Trace major index operations.
-Dlock Trace when lockdir locks are taken or released.
=== modified file 'bzrlib/tests/HTTPTestUtil.py'
--- a/bzrlib/tests/HTTPTestUtil.py 2007-10-17 15:36:20 +0000
+++ b/bzrlib/tests/HTTPTestUtil.py 2007-11-29 23:22:01 +0000
@@ -164,11 +164,6 @@
return TestingHTTPRequestHandler.get_multiple_ranges(self, file,
file_size, ranges)
- def do_GET(self):
- tcs = self.server.test_case_server
- tcs.GET_request_nb += 1
- return TestingHTTPRequestHandler.do_GET(self)
-
class LimitedRangeHTTPServer(HttpServer):
"""An HttpServer erroring out on requests with too many range specifiers"""
@@ -177,7 +172,6 @@
range_limit=None):
HttpServer.__init__(self, request_handler)
self.range_limit = range_limit
- self.GET_request_nb = 0
class SingleRangeRequestHandler(TestingHTTPRequestHandler):
@@ -208,8 +202,11 @@
class NoRangeRequestHandler(TestingHTTPRequestHandler):
"""Ignore range requests without notice"""
- # Just bypass the range handling done by TestingHTTPRequestHandler
- do_GET = SimpleHTTPRequestHandler.do_GET
+ def do_GET(self):
+ # Update the statistics
+ self.server.test_case_server.GET_request_nb += 1
+ # Just bypass the range handling done by TestingHTTPRequestHandler
+ return SimpleHTTPRequestHandler.do_GET(self)
class TestCaseWithWebserver(TestCaseWithTransport):
=== modified file 'bzrlib/tests/HttpServer.py'
--- a/bzrlib/tests/HttpServer.py 2007-10-31 12:42:42 +0000
+++ b/bzrlib/tests/HttpServer.py 2007-11-29 23:22:01 +0000
@@ -149,6 +149,8 @@
Handles the Range header.
"""
+ # Update statistics
+ self.server.test_case_server.GET_request_nb += 1
path = self.translate_path(self.path)
ranges_header_value = self.headers.get('Range')
@@ -304,6 +306,8 @@
self.host = 'localhost'
self.port = 0
self._httpd = None
+ # Allows tests to verify number of GET requests issued
+ self.GET_request_nb = 0
def _get_httpd(self):
if self._httpd is None:
=== modified file 'bzrlib/tests/test_http.py'
--- a/bzrlib/tests/test_http.py 2007-11-04 15:29:17 +0000
+++ b/bzrlib/tests/test_http.py 2007-11-30 16:48:24 +0000
@@ -32,15 +32,10 @@
config,
errors,
osutils,
+ tests,
ui,
urlutils,
)
-from bzrlib.tests import (
- TestCase,
- TestUIFactory,
- TestSkipped,
- StringIOWrapper,
- )
from bzrlib.tests.HttpServer import (
HttpServer,
HttpServer_PyCurl,
@@ -158,12 +153,12 @@
from bzrlib.transport.http._pycurl import PyCurlTransport
return PyCurlTransport
except errors.DependencyNotPresent:
- raise TestSkipped('pycurl not present')
+ raise tests.TestSkipped('pycurl not present')
_transport = property(_get_pycurl_maybe)
-class TestHttpUrls(TestCase):
+class TestHttpUrls(tests.TestCase):
# TODO: This should be moved to authorization tests once they
# are written.
@@ -226,7 +221,7 @@
server.tearDown()
-class TestHttpUrls_urllib(TestHttpTransportUrls, TestCase):
+class TestHttpUrls_urllib(TestHttpTransportUrls, tests.TestCase):
"""Test http urls with urllib"""
_transport = HttpTransport_urllib
@@ -235,7 +230,7 @@
class TestHttpUrls_pycurl(TestWithTransport_pycurl, TestHttpTransportUrls,
- TestCase):
+ tests.TestCase):
"""Test http urls with pycurl"""
_server = HttpServer_PyCurl
@@ -254,7 +249,7 @@
try:
import pycurl
except ImportError:
- raise TestSkipped('pycurl not present')
+ raise tests.TestSkipped('pycurl not present')
# Now that we have pycurl imported, we can fake its version_info
# This was taken from a windows pycurl without SSL
# (thanks to bialix)
@@ -352,7 +347,7 @@
"""Test http connections with pycurl"""
-class TestHttpTransportRegistration(TestCase):
+class TestHttpTransportRegistration(tests.TestCase):
"""Test registrations of various http implementations"""
def test_http_registered(self):
@@ -372,7 +367,7 @@
try:
http_transport = get_transport(url)
except errors.UnsupportedProtocol:
- raise TestSkipped('%s not available' % scheme)
+ raise tests.TestSkipped('%s not available' % scheme)
code, response = http_transport._post('abc def end-of-body')
self.assertTrue(
server.received_bytes.startswith('POST /.bzr/smart HTTP/1.'))
@@ -384,7 +379,7 @@
server.received_bytes.endswith('\r\n\r\nabc def end-of-body'))
-class TestPost_urllib(TestCase, TestPost):
+class TestPost_urllib(tests.TestCase, TestPost):
"""TestPost for urllib implementation"""
_transport = HttpTransport_urllib
@@ -393,14 +388,14 @@
self._test_post_body_is_received('http+urllib')
-class TestPost_pycurl(TestWithTransport_pycurl, TestCase, TestPost):
+class TestPost_pycurl(TestWithTransport_pycurl, tests.TestCase, TestPost):
"""TestPost for pycurl implementation"""
def test_post_body_is_received_pycurl(self):
self._test_post_body_is_received('http+pycurl')
-class TestRangeHeader(TestCase):
+class TestRangeHeader(tests.TestCase):
"""Test range_header method"""
def check_header(self, value, ranges=[], tail=0):
@@ -575,7 +570,7 @@
"""Tests forbidden server for pycurl implementation"""
-class TestRecordingServer(TestCase):
+class TestRecordingServer(tests.TestCase):
def test_create(self):
server = RecordingServer(expect_body_tail=None)
@@ -653,6 +648,20 @@
self.assertListRaises((errors.InvalidRange, errors.ShortReadvError,),
t.readv, 'a', [(12,2)])
+ def test_readv_multiple_get_requests(self):
+ server = self.get_readonly_server()
+ t = self._transport(server.get_url())
+ # force transport to issue multiple requests
+ t._max_readv_combine = 1
+ t._max_get_ranges = 1
+ l = list(t.readv('a', ((0, 1), (1, 1), (3, 2), (9, 1))))
+ self.assertEqual(l[0], (0, '0'))
+ self.assertEqual(l[1], (1, '1'))
+ self.assertEqual(l[2], (3, '34'))
+ self.assertEqual(l[3], (9, '9'))
+ # The server should have issued 4 requests
+ self.assertEqual(4, self.get_readonly_server().GET_request_nb)
+
class TestSingleRangeRequestServer(TestRangeRequestServer):
"""Test readv against a server which accept only single range requests"""
@@ -709,8 +718,8 @@
class TestNoRangeRequestServer_pycurl(TestWithTransport_pycurl,
- TestNoRangeRequestServer,
- TestCaseWithWebserver):
+ TestNoRangeRequestServer,
+ TestCaseWithWebserver):
"""Tests range requests refusing server for pycurl implementation"""
@@ -738,7 +747,7 @@
TestCaseWithWebserver.setUp(self)
# We need to manipulate ranges that correspond to real chunks in the
# response, so we build a content appropriately.
- filler = ''.join(['abcdefghij' for _ in range(102)])
+ filler = ''.join(['abcdefghij' for x in range(102)])
content = ''.join(['%04d' % v + filler for v in range(16)])
self.build_tree_contents([('a', content)],)
@@ -749,7 +758,7 @@
self.assertEqual(l[1], (1024, '0001'))
self.assertEqual(1, self.get_readonly_server().GET_request_nb)
- def test_a_lot_of_ranges(self):
+ def test_more_ranges(self):
t = self.get_transport()
l = list(t.readv('a', ((0, 4), (1024, 4), (4096, 4), (8192, 4))))
self.assertEqual(l[0], (0, '0000'))
@@ -775,14 +784,14 @@
-class TestHttpProxyWhiteBox(TestCase):
+class TestHttpProxyWhiteBox(tests.TestCase):
"""Whitebox test proxy http authorization.
Only the urllib implementation is tested here.
"""
def setUp(self):
- TestCase.setUp(self)
+ tests.TestCase.setUp(self)
self._old_env = {}
def tearDown(self):
@@ -933,12 +942,12 @@
# pycurl does not check HTTP_PROXY for security reasons
# (for use in a CGI context that we do not care
# about. Should we ?)
- raise TestSkipped('pycurl does not check HTTP_PROXY '
- 'for security reasons')
+ raise tests.TestNotApplicable(
+ 'pycurl does not check HTTP_PROXY for security reasons')
def test_HTTP_PROXY_with_NO_PROXY(self):
- raise TestSkipped('pycurl does not check HTTP_PROXY '
- 'for security reasons')
+ raise tests.TestNotApplicable(
+ 'pycurl does not check HTTP_PROXY for security reasons')
def test_http_proxy_without_scheme(self):
# pycurl *ignores* invalid proxy env variables. If that
@@ -1272,8 +1281,8 @@
def test_prompt_for_password(self):
self.server.add_user('joe', 'foo')
t = self.get_user_transport('joe', None)
- stdout = StringIOWrapper()
- ui.ui_factory = TestUIFactory(stdin='foo\n', stdout=stdout)
+ stdout = tests.StringIOWrapper()
+ ui.ui_factory = tests.TestUIFactory(stdin='foo\n', stdout=stdout)
self.assertEqual('contents of a\n',t.get('a').read())
# stdin should be empty
self.assertEqual('', ui.ui_factory.stdin.readline())
@@ -1302,8 +1311,8 @@
stdin_content = 'bar\n' # Not the right password
self.server.add_user(user, password)
t = self.get_user_transport(user, None)
- ui.ui_factory = TestUIFactory(stdin=stdin_content,
- stdout=StringIOWrapper())
+ ui.ui_factory = tests.TestUIFactory(stdin=stdin_content,
+ stdout=tests.StringIOWrapper())
# Create a minimal config file with the right password
conf = config.AuthenticationConfig()
conf._get_config().update(
=== modified file 'bzrlib/transport/http/__init__.py'
--- a/bzrlib/transport/http/__init__.py 2007-11-27 08:26:00 +0000
+++ b/bzrlib/transport/http/__init__.py 2007-11-30 09:51:22 +0000
@@ -208,9 +208,9 @@
self._range_hint = None
mutter('Retry "%s" without ranges' % relpath)
else:
- # We tried all the tricks, but nothing worked. We re-raise original
- # exception; the 'mutter' calls above will indicate that further
- # tries were unsuccessful
+ # We tried all the tricks, but nothing worked. We re-raise the
+ # original exception; the 'mutter' calls above will indicate that
+ # further tries were unsuccessful
raise exc_info[0], exc_info[1], exc_info[2]
def _get_ranges_hinted(self, relpath, ranges):
@@ -251,6 +251,9 @@
# No limit on the offset number that get combined into one, we are trying
# to avoid downloading the whole file.
_max_readv_combine = 0
+ # By default Apache has a limit of ~400 ranges before replying with a 400
+ # Bad Request. So we go underneath that amount to be safe.
+ _max_get_ranges = 200
def _readv(self, relpath, offsets):
"""Get parts of the file at the given relative path.
@@ -258,41 +261,84 @@
:param offsets: A list of (offset, size) tuples.
:param return: A list or generator of (offset, data) tuples
"""
- sorted_offsets = sorted(list(offsets))
- fudge = self._bytes_to_read_before_seek
- coalesced = self._coalesce_offsets(sorted_offsets,
- limit=self._max_readv_combine,
- fudge_factor=fudge)
- coalesced = list(coalesced)
- mutter('http readv of %s offsets => %s collapsed %s',
- relpath, len(offsets), len(coalesced))
-
- f = self._get_ranges_hinted(relpath, coalesced)
- for start, size in offsets:
- try_again = True
- while try_again:
- try_again = False
- f.seek(start, ((start < 0) and 2) or 0)
- start = f.tell()
- try:
- data = f.read(size)
- if len(data) != size:
- raise errors.ShortReadvError(relpath, start, size,
- actual=len(data))
- except errors.ShortReadvError, e:
- self._degrade_range_hint(relpath, coalesced, sys.exc_info())
-
- # Since the offsets and the ranges may not be in the same
- # order, we don't try to calculate a restricted single
- # range encompassing unprocessed offsets.
-
- # Note: we replace 'f' here, it may need cleaning one day
- # before being thrown that way.
- f = self._get_ranges_hinted(relpath, coalesced)
- try_again = True
-
- # After one or more tries, we get the data.
- yield start, data
+
+ # offsets may be a generator, we will iterate it several times, so
+ # build a list
+ offsets = list(offsets)
+
+ try_again = True
+ while try_again:
+ try_again = False
+
+ # Coalesce the offsets to minimize the GET requests issued
+ sorted_offsets = sorted(offsets)
+ coalesced = self._coalesce_offsets(
+ sorted_offsets, limit=self._max_readv_combine,
+ fudge_factor=self._bytes_to_read_before_seek)
+
+ # Turn it into a list, we will iterate it several times
+ coalesced = list(coalesced)
+ mutter('http readv of %s offsets => %s collapsed %s',
+ relpath, len(offsets), len(coalesced))
+
+ # Cache the data read, but only until it's been used
+ data_map = {}
+ # We will iterate on the data received from the GET requests and
+ # serve the corresponding offsets respecting the initial order. We
+ # need an offset iterator for that.
+ iter_offsets = iter(offsets)
+ cur_offset_and_size = iter_offsets.next()
+
+ try:
+ for cur_coal, file in self._coalesce_readv(relpath, coalesced):
+ # Split the received chunk
+ for offset, size in cur_coal.ranges:
+ start = cur_coal.start + offset
+ file.seek(start, 0)
+ data = file.read(size)
+ data_len = len(data)
+ if data_len != size:
+ raise errors.ShortReadvError(relpath, start, size,
+ actual=data_len)
+ data_map[(start, size)] = data
+
+ # Yield everything we can
+ while cur_offset_and_size in data_map:
+ # Clean the cached data since we use it
+ # XXX: will break if offsets contains duplicates --
+ # vila20071129
+ this_data = data_map.pop(cur_offset_and_size)
+ yield cur_offset_and_size[0], this_data
+ cur_offset_and_size = iter_offsets.next()
+
+ except (errors.ShortReadvError, errors.InvalidRange), e:
+ self._degrade_range_hint(relpath, coalesced, sys.exc_info())
+ # Some offsets may have been already processed, so we retry
+ # only the unsuccessful ones.
+ offsets = [cur_offset_and_size] + [o for o in iter_offsets]
+ try_again = True
+
+ def _coalesce_readv(self, relpath, coalesced):
+ """Issue several GET requests to satisfy the coalesced offsets"""
+ total = len(coalesced)
+ if self._range_hint == 'multi':
+ max_ranges = self._max_get_ranges
+ elif self._range_hint == 'single':
+ max_ranges = total
+ else:
+ # The whole file will be downloaded anyway
+ max_ranges = total
+ # TODO: Some web servers may ignore the range requests and return the
+ # whole file, we may want to detect that and avoid further requests.
+ # Hint: test_readv_multiple_get_requests will fail in that case.
+ for group in xrange(0, len(coalesced), max_ranges):
+ ranges = coalesced[group:group+max_ranges]
+ # Note that the following may raise errors.InvalidRange. It's the
+ # caller's responsibility to decide how to retry since it may provide
+ # different coalesced offsets.
+ code, file = self._get(relpath, ranges)
+ for range in ranges:
+ yield range, file
def recommended_page_size(self):
"""See Transport.recommended_page_size().
@@ -447,7 +493,7 @@
"""Prepare a HTTP Range header at a level the server should accept"""
if self._range_hint == 'multi':
- # Nothing to do here
+ # Generate the header describing all offsets
return self._range_header(offsets, tail_amount)
elif self._range_hint == 'single':
# Combine all the requested ranges into a single
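Because the coalesced ranges are fetched in sorted order while the caller's offsets may arrive in any order, the new `_readv` caches each chunk in a dictionary (`data_map`) and yields results only once the next offset in the caller's original order is available. A stripped-down sketch of that ordering trick, with no HTTP involved (`read_chunk` is a hypothetical stand-in for reading from the GET responses; like the patch's XXX notes, it breaks if offsets contains duplicates):

```python
def readv_in_order(offsets, read_chunk):
    """Yield (offset, data) in the caller's original request order.

    read_chunk(offset, size) stands in for data arriving from sorted,
    coalesced GET requests; chunks are cached in data_map until the
    next offset the caller asked for has been read.
    """
    offsets = list(offsets)
    data_map = {}
    iter_offsets = iter(offsets)
    cur = next(iter_offsets)
    for offset, size in sorted(offsets):
        data_map[(offset, size)] = read_chunk(offset, size)
        # Yield everything we can, cleaning the cache as we go.
        while cur in data_map:
            yield cur[0], data_map.pop(cur)
            cur = next(iter_offsets, None)

content = '0123456789'
result = list(readv_in_order([(9, 1), (0, 2), (4, 3)],
                             lambda off, size: content[off:off + size]))
```

Here the data for offset 9 is read last in sorted order but requested first, so nothing is yielded until the final chunk arrives, after which all three results come out in the caller's order: (9, '9'), (0, '01'), (4, '456').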
=== modified file 'bzrlib/transport/http/_pycurl.py'
--- a/bzrlib/transport/http/_pycurl.py 2007-10-31 12:38:11 +0000
+++ b/bzrlib/transport/http/_pycurl.py 2007-11-30 09:51:22 +0000
@@ -36,13 +36,12 @@
import sys
from bzrlib import (
+ debug,
errors,
+ trace,
__version__ as bzrlib_version,
)
import bzrlib
-from bzrlib.errors import (NoSuchFile,
- ConnectionError,
- DependencyNotPresent)
from bzrlib.trace import mutter
from bzrlib.transport.http import (
ca_bundle,
@@ -55,7 +54,7 @@
import pycurl
except ImportError, e:
mutter("failed to import pycurl: %s", e)
- raise DependencyNotPresent('pycurl', e)
+ raise errors.DependencyNotPresent('pycurl', e)
try:
# see if we can actually initialize PyCurl - sometimes it will load but
@@ -70,7 +69,7 @@
pycurl.Curl()
except pycurl.error, e:
mutter("failed to initialize pycurl: %s", e)
- raise DependencyNotPresent('pycurl', e)
+ raise errors.DependencyNotPresent('pycurl', e)
@@ -112,7 +111,7 @@
# protocols
supported = pycurl.version_info()[8]
if 'https' not in supported:
- raise DependencyNotPresent('pycurl', 'no https support')
+ raise errors.DependencyNotPresent('pycurl', 'no https support')
self.cabundle = ca_bundle.get_ca_path()
def _get_curl(self):
@@ -202,7 +201,7 @@
data.seek(0)
if code == 404:
- raise NoSuchFile(abspath)
+ raise errors.NoSuchFile(abspath)
if code != 200:
self._raise_curl_http_error(
curl, 'expected 200 or 404 for full response.')
@@ -264,9 +263,12 @@
def _set_curl_options(self, curl):
"""Set options for all requests"""
- ## curl.setopt(pycurl.VERBOSE, 1)
- # TODO: maybe include a summary of the pycurl version
- ua_str = 'bzr/%s (pycurl)' % (bzrlib.__version__,)
+ if 'http' in debug.debug_flags:
+ curl.setopt(pycurl.VERBOSE, 1)
+ # pycurl doesn't implement the CURLOPT_STDERR option, so we can't
+ # do: curl.setopt(pycurl.STDERR, trace._trace_file)
+
+ ua_str = 'bzr/%s (pycurl: %s)' % (bzrlib.__version__, pycurl.version)
curl.setopt(pycurl.USERAGENT, ua_str)
if self.cabundle:
curl.setopt(pycurl.CAINFO, self.cabundle)
@@ -292,8 +294,8 @@
CURLE_COULDNT_CONNECT,
CURLE_GOT_NOTHING,
CURLE_COULDNT_RESOLVE_PROXY,):
- raise ConnectionError('curl connection error (%s)\non %s'
- % (e[1], url))
+ raise errors.ConnectionError(
+ 'curl connection error (%s)\non %s' % (e[1], url))
elif e[0] == CURLE_PARTIAL_FILE:
# Pycurl itself has detected a short read. We do
# not have all the information for the
=== modified file 'bzrlib/transport/http/_urllib.py'
--- a/bzrlib/transport/http/_urllib.py 2007-11-04 15:44:32 +0000
+++ b/bzrlib/transport/http/_urllib.py 2007-11-30 09:51:22 +0000
@@ -20,10 +20,10 @@
from bzrlib import (
errors,
+ trace,
urlutils,
)
-from bzrlib.trace import mutter
-from bzrlib.transport.http import HttpTransportBase
+from bzrlib.transport import http
# TODO: handle_response should be integrated into the _urllib2_wrappers
from bzrlib.transport.http.response import handle_response
from bzrlib.transport.http._urllib2_wrappers import (
@@ -32,7 +32,7 @@
)
-class HttpTransport_urllib(HttpTransportBase):
+class HttpTransport_urllib(http.HttpTransportBase):
"""Python urllib transport for http and https."""
# In order to debug we have to issue our traces in sync with
@@ -85,7 +85,6 @@
request.auth = auth
request.proxy_auth = proxy_auth
- mutter('%s: [%s]' % (request.method, request.get_full_url()))
if self._debuglevel > 0:
print 'perform: %s base: %s, url: %s' % (request.method, self.base,
request.get_full_url())
@@ -108,8 +107,8 @@
qual_proto=self._scheme)
if request.redirected_to is not None:
- mutter('redirected from: %s to: %s' % (request.get_full_url(),
- request.redirected_to))
+ trace.mutter('redirected from: %s to: %s' % (request.get_full_url(),
+ request.redirected_to))
return response
=== modified file 'bzrlib/transport/http/_urllib2_wrappers.py'
--- a/bzrlib/transport/http/_urllib2_wrappers.py 2007-11-05 13:21:30 +0000
+++ b/bzrlib/transport/http/_urllib2_wrappers.py 2007-11-30 09:51:22 +0000
@@ -32,6 +32,9 @@
handle authentication schemes.
"""
+# TODO: now that we have -Dhttp most of the needs should be covered in a more
+# accessible way (i.e. no need to edit the source), if experience confirms
+# that, delete all DEBUG uses -- vila20071130 (happy birthday).
DEBUG = 0
# FIXME: Oversimplifying, two kind of exceptions should be
@@ -56,7 +59,9 @@
from bzrlib import __version__ as bzrlib_version
from bzrlib import (
config,
+ debug,
errors,
+ trace,
transport,
ui,
)
@@ -101,7 +106,8 @@
# having issued the response headers (even if the
# headers indicate a Content-Type...)
body = self.fp.read(self.length)
- if self.debuglevel > 0:
+ if self.debuglevel > 3:
+ # This one can be huge and is generally not interesting
print "Consumed body: [%s]" % body
self.close()
elif self.status == 200:
@@ -143,6 +149,13 @@
# XXX: Needs refactoring at the caller level.
def __init__(self, host, port=None, strict=None, proxied_host=None):
+ if 'http' in debug.debug_flags:
+ netloc = host
+ if port is not None:
netloc += ':%d' % port
+ if proxied_host is not None:
+ netloc += '(proxy for %s)' % proxied_host
+ trace.mutter('* About to connect() to %s' % netloc)
httplib.HTTPConnection.__init__(self, host, port, strict)
self.proxied_host = proxied_host
@@ -425,12 +438,17 @@
headers.update(request.unredirected_hdrs)
try:
- connection._send_request(request.get_method(),
- request.get_selector(),
+ method = request.get_method()
+ url = request.get_selector()
+ connection._send_request(method, url,
# FIXME: implements 100-continue
#None, # We don't send the body yet
request.get_data(),
headers)
+ if 'http' in debug.debug_flags:
+ trace.mutter('> %s %s' % (method, url))
+ hdrs = ['%s: %s' % (k, v) for k,v in headers.items()]
+ trace.mutter('> ' + '\n> '.join(hdrs) + '\n')
if self._debuglevel > 0:
print 'Request sent: [%r]' % request
response = connection.getresponse()
@@ -453,6 +471,17 @@
# connection.send(body)
# response = connection.getresponse()
+ if 'http' in debug.debug_flags:
+ version = 'HTTP/%d.%d'
+ try:
+ version = version % (response.version / 10,
+ response.version % 10)
+ except:
+ version = 'HTTP/%r' % version
+ trace.mutter('< %s %s %s' % (version, response.status,
+ response.reason))
+ hdrs = [h.rstrip('\r\n') for h in response.msg.headers]
+ trace.mutter('< ' + '\n< '.join(hdrs) + '\n')
if self._debuglevel > 0:
print 'Receives response: %r' % response
print ' For: %r(%r)' % (request.get_method(),
@@ -1285,7 +1314,7 @@
)
self.open = self._opener.open
- if DEBUG >= 2:
+ if DEBUG >= 3:
# When dealing with handler order, it's easy to mess
# things up, the following will help understand which
# handler is used, when and for what.
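The -Dhttp traces added in the hunks above prefix outgoing lines with '> ' and incoming ones with '< ', curl-style, and decode httplib's numeric `response.version` (e.g. 11) into "HTTP/1.1". A small sketch of the formatting, with assumed helper names (the real code passes these strings to `trace.mutter`):

```python
def format_request(method, url, headers):
    # '> ' marks data sent to the server, one line per header.
    lines = ['> %s %s' % (method, url)]
    lines.extend('> %s: %s' % (k, v) for k, v in headers)
    return '\n'.join(lines)

def format_response(version, status, reason, headers):
    # httplib reports response.version as e.g. 11 for HTTP/1.1;
    # split it into major.minor for display.
    version_str = 'HTTP/%d.%d' % (version // 10, version % 10)
    lines = ['< %s %s %s' % (version_str, status, reason)]
    lines.extend('< %s' % h for h in headers)
    return '\n'.join(lines)

req = format_request('GET', '/.bzr/branch-format',
                     [('Host', 'example.com')])
resp = format_response(11, 200, 'OK', ['Content-Length: 26'])
```

This keeps the .bzr.log transcript visually close to `curl -v` output, which makes it easy to compare against a raw curl session when debugging.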
=== modified file 'bzrlib/transport/http/response.py'
--- a/bzrlib/transport/http/response.py 2007-06-20 13:56:21 +0000
+++ b/bzrlib/transport/http/response.py 2007-11-29 23:22:01 +0000
@@ -287,9 +287,6 @@
# magic value.
raise errors.InvalidRange(url,0)
- # TODO: jam 20060713 Properly handle redirects (302 Found, etc)
- # The '_get' code says to follow redirects, we probably
- # should actually handle the return values
else:
raise errors.InvalidHttpResponse(url, "Unknown response code %s"
% (code,))