Rev 3652: Don't downsize the data buffer while reading. in http://bzr.arbash-meinel.com/branches/bzr/1.7-dev/hpss_readv
John Arbash Meinel
john at arbash-meinel.com
Thu Aug 28 22:27:48 BST 2008
At http://bzr.arbash-meinel.com/branches/bzr/1.7-dev/hpss_readv
------------------------------------------------------------
revno: 3652
revision-id: john at arbash-meinel.com-20080828212748-fplqyspastui6wq8
parent: john at arbash-meinel.com-20080828210435-h30020sylefc8750
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: hpss_readv
timestamp: Thu 2008-08-28 16:27:48 -0500
message:
Don't downsize the data buffer while reading.
Instead just use offsets into the data buffer.
It turns out that the string copying was massively dominating performance.
With this and the earlier patch, the time for
bzr branch bzr+ssh://localhost/bzr.dev
drops from 1m26s to 39.3s.
(For comparison, the same branch over the local transport takes 32s.)
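
The effect is easy to reproduce outside of bzrlib. Here is a minimal,
hypothetical sketch (none of these names appear in the patch) contrasting
the old strategy of trimming the front of the buffer with the new strategy
of tracking an integer offset into it:

    # Illustrative only -- not bzrlib code.  Repeatedly trimming the
    # front of a string copies everything that remains each time, so the
    # total work is quadratic in the buffer size.  Tracking an offset
    # into an untouched buffer copies each chunk exactly once.
    import time

    def read_chunks_trimming(data, chunk_size):
        chunks = []
        while data:
            chunks.append(data[:chunk_size])
            data = data[chunk_size:]  # copies the whole remainder
        return chunks

    def read_chunks_offset(data, chunk_size):
        chunks = []
        offset = 0
        while offset < len(data):
            chunks.append(data[offset:offset + chunk_size])
            offset += chunk_size      # no copy of the remainder
        return chunks

    data = b'x' * (8 * 1024 * 1024)  # an 8MB buffer
    for fn in (read_chunks_trimming, read_chunks_offset):
        start = time.time()
        fn(data, 4096)
        print('%s: %.3fs' % (fn.__name__, time.time() - start))

The gap between the two grows with the buffer size, which is consistent
with string copying dominating the readv time on a large branch.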
-------------- next part --------------
=== modified file 'bzrlib/transport/remote.py'
--- a/bzrlib/transport/remote.py 2008-08-28 20:47:56 +0000
+++ b/bzrlib/transport/remote.py 2008-08-28 21:27:48 +0000
@@ -341,14 +341,28 @@
data = response_handler.read_body_bytes()
# Cache the results, but only until they have been fulfilled
data_map = {}
+ data_offset = 0
for c_offset in coalesced:
if len(data) < c_offset.length:
raise errors.ShortReadvError(relpath, c_offset.start,
c_offset.length, actual=len(data))
for suboffset, subsize in c_offset.ranges:
key = (c_offset.start+suboffset, subsize)
- data_map[key] = data[suboffset:suboffset+subsize]
- data = data[c_offset.length:]
+ this_data = data[data_offset+suboffset:
+ data_offset+suboffset+subsize]
+ # Special case when the data is in-order, rather than packing
+ # into a map and then back out again. Benchmarking shows that
+ # this has 100% hit rate, but leave in the data_map work just
+ # in case.
+ # TODO: Could we get away with using buffer() to avoid the
+ # memory copy? Callers would need to realize they may
+ # not have a real string.
+ if key == cur_offset_and_size:
+ yield cur_offset_and_size[0], this_data
+ cur_offset_and_size = offset_stack.next()
+ else:
+ data_map[key] = this_data
+ data_offset += c_offset.length
# Now that we've read some data, see if we can yield anything back
while cur_offset_and_size in data_map:
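
The TODO in the hunk above asks whether buffer() could hand slices back
without copying at all. A hedged sketch of that idea (not part of this
patch), written with memoryview, the modern equivalent of the old
buffer() builtin; the caveat from the comment applies, since callers
would no longer receive a real string:

    # Illustrative only -- a memoryview slice references the underlying
    # bytes without copying them.
    data = b'abcdefghij'

    view = memoryview(data)
    sub = view[2:5]                 # no copy; still backed by `data`
    assert sub.tobytes() == b'cde'  # an explicit copy only on request

    # The caveat from the TODO: this is not a real string, so callers
    # doing string operations on the result would need to adapt.
    assert not isinstance(sub, bytes)

The trade-off is exactly the one the comment notes: the copy disappears,
but every caller of readv would have to cope with a buffer-like object,
and each slice keeps the whole underlying buffer alive.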