Rev 3291: (andrew) Greatly reduce the number of recv calls made when using urllib, in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Mon Mar 17 21:48:27 GMT 2008


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 3291
revision-id:pqm at pqm.ubuntu.com-20080317214818-thbkv3yfh00rj2g2
parent: pqm at pqm.ubuntu.com-20080317054003-mzukdvwi1d2icd4c
parent: andrew.bennetts at canonical.com-20080317195319-jbcy9qhifps49smo
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Mon 2008-03-17 21:48:18 +0000
message:
  (andrew) Greatly reduce the number of recv calls made when using
  urllib by forcing it to use a buffered file object.
modified:
  NEWS                           NEWS-20050323055033-4e00b5db738777ff
  bzrlib/transport/http/_urllib2_wrappers.py _urllib2_wrappers.py-20060913231729-ha9ugi48ktx481ao-1
    ------------------------------------------------------------
    revno: 3287.3.3
    revision-id:andrew.bennetts at canonical.com-20080317195319-jbcy9qhifps49smo
    parent: andrew.bennetts at canonical.com-20080317171649-n5kszez1vltj8rr1
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: urllib-recv-1-hack
    timestamp: Mon 2008-03-17 14:53:19 -0500
    message:
      A slightly neater hack for forcing buffering, thanks to John.
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzrlib/transport/http/_urllib2_wrappers.py _urllib2_wrappers.py-20060913231729-ha9ugi48ktx481ao-1
    ------------------------------------------------------------
    revno: 3287.3.2
    revision-id:andrew.bennetts at canonical.com-20080317171649-n5kszez1vltj8rr1
    parent: andrew.bennetts at canonical.com-20080316195309-jw6xcslc0goezghu
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: urllib-recv-1-hack
    timestamp: Mon 2008-03-17 12:16:49 -0500
    message:
      Buffer 64k, rather than just 8k.
    modified:
      bzrlib/transport/http/_urllib2_wrappers.py _urllib2_wrappers.py-20060913231729-ha9ugi48ktx481ao-1
    ------------------------------------------------------------
    revno: 3287.3.1
    revision-id:andrew.bennetts at canonical.com-20080316195309-jw6xcslc0goezghu
    parent: pqm at pqm.ubuntu.com-20080316165803-tisoc9mpob9z544o
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: urllib-recv-1-hack
    timestamp: Sun 2008-03-16 14:53:09 -0500
    message:
      A hack to make urllib not call recv(1) lots and lots.
    modified:
      bzrlib/transport/http/_urllib2_wrappers.py _urllib2_wrappers.py-20060913231729-ha9ugi48ktx481ao-1
=== modified file 'NEWS'
--- a/NEWS	2008-03-17 03:21:18 +0000
+++ b/NEWS	2008-03-17 21:48:18 +0000
@@ -16,6 +16,10 @@
 
   IMPROVEMENTS:
 
+    * Fetching data over HTTP is a bit faster when urllib is used.  This is done
+      by forcing it to recv 64k at a time when reading lines in HTTP headers,
+      rather than just 1 byte at a time.  (Andrew Bennetts)
+
   BUGFIXES:
 
   DOCUMENTATION:

=== modified file 'bzrlib/transport/http/_urllib2_wrappers.py'
--- a/bzrlib/transport/http/_urllib2_wrappers.py	2008-01-03 16:26:32 +0000
+++ b/bzrlib/transport/http/_urllib2_wrappers.py	2008-03-17 19:53:19 +0000
@@ -64,6 +64,18 @@
     )
 
 
+class _BufferedMakefileSocket(object):
+
+    def __init__(self, sock):
+        self.sock = sock
+
+    def makefile(self, mode='r', bufsize=-1):
+        return self.sock.makefile(mode, 65536)
+
+    def __getattr__(self, name):
+        return getattr(self.sock, name)
+
+
 # We define our own Response class to keep our httplib pipe clean
 class Response(httplib.HTTPResponse):
     """Custom HTTPResponse, to avoid the need to decorate.
@@ -81,6 +93,14 @@
     # 8k chunks should be fine.
     _discarded_buf_size = 8192
 
+    def __init__(self, sock, *args, **kwargs):
+        # httplib creates a fileobject that doesn't do buffering, which
+        # makes fp.readline() very expensive because it only reads one byte
+        # at a time.  So we wrap the socket in an object that forces
+        # sock.makefile to make a buffered file.
+        sock = _BufferedMakefileSocket(sock)
+        httplib.HTTPResponse.__init__(self, sock, *args, **kwargs)
+
     def begin(self):
         """Begin to read the response from the server.
 
@@ -534,7 +554,7 @@
             req = request
             r = response
             r.recv = r.read
-            fp = socket._fileobject(r)
+            fp = socket._fileobject(r, bufsize=65536)
             resp = urllib2.addinfourl(fp, r.msg, req.get_full_url())
             resp.code = r.status
             resp.msg = r.reason
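
For readers who want to try the proxy trick in isolation, here is a rough Python 3 sketch of the same idea as `_BufferedMakefileSocket` above. It is not the bzrlib code: the names `BufferedMakefileSocket` and `read_status_line` are made up for illustration, the canned response is invented, and it assumes `socket.socketpair()` is available on the platform. The caller asks for an unbuffered file (as the comment in the patch says httplib does), and the proxy quietly substitutes a 64k buffer.

```python
import socket


class BufferedMakefileSocket:
    """Hypothetical proxy that forces makefile() to return a buffered file."""

    def __init__(self, sock, bufsize=65536):
        self._sock = sock
        self._bufsize = bufsize

    def makefile(self, mode="rb", buffering=None, **kwargs):
        # Ignore the caller's buffering request and use our own size instead.
        return self._sock.makefile(mode, self._bufsize, **kwargs)

    def __getattr__(self, name):
        # Only reached for attributes the proxy itself lacks; delegate them.
        return getattr(self._sock, name)


def read_status_line(use_wrapper):
    """Return (file object type name, first line) for a canned response."""
    sender, receiver = socket.socketpair()
    with sender, receiver:
        sender.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
        sender.shutdown(socket.SHUT_WR)
        sock = BufferedMakefileSocket(receiver) if use_wrapper else receiver
        # Request an unbuffered file object, as an HTTP client library might.
        with sock.makefile("rb", 0) as fp:
            return type(fp).__name__, fp.readline()


if __name__ == "__main__":
    print(read_status_line(use_wrapper=False))
    print(read_status_line(use_wrapper=True))
```

Both calls should return the same status line; the difference is the type of file object doing the reading. Without the proxy the caller gets the raw, unbuffered socket file, and with it the caller gets a buffered reader that can pull many bytes from the socket per `recv`.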



