Rev 4200: (andrew) Buffer writes when pushing to a pack repository on a pre-1.12 smart server, in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Wed Mar 25 02:03:47 GMT 2009


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 4200
revision-id: pqm at pqm.ubuntu.com-20090325020341-dmq0yek061gtungf
parent: pqm at pqm.ubuntu.com-20090324231912-rb0kgktzkvge8aea
parent: andrew.bennetts at canonical.com-20090324222046-mhx6gqyu7qm4ngkt
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Wed 2009-03-25 02:03:41 +0000
message:
  (andrew) Buffer writes when pushing to a pack repository on a
  	pre-1.12 smart server.
modified:
  NEWS                           NEWS-20050323055033-4e00b5db738777ff
  bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
  bzrlib/repofmt/pack_repo.py    pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
  bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
    ------------------------------------------------------------
    revno: 4187.3.6
    revision-id: andrew.bennetts at canonical.com-20090324222046-mhx6gqyu7qm4ngkt
    parent: andrew.bennetts at canonical.com-20090324024839-oiwvj7rkob17on3x
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: remote-pack-hack
    timestamp: Wed 2009-03-25 09:20:46 +1100
    message:
      Move the flush in KnitVersionedFiles.insert_record_stream so that it covers the add_lines call of the fallback case, not just the adapter.get_bytes.
    modified:
      bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
    ------------------------------------------------------------
    revno: 4187.3.5
    revision-id: andrew.bennetts at canonical.com-20090324024839-oiwvj7rkob17on3x
    parent: andrew.bennetts at canonical.com-20090324024609-fo3q9opym5srqudy
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: remote-pack-hack
    timestamp: Tue 2009-03-24 13:48:39 +1100
    message:
      Add NEWS entry.
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
    ------------------------------------------------------------
    revno: 4187.3.4
    revision-id: andrew.bennetts at canonical.com-20090324024609-fo3q9opym5srqudy
    parent: andrew.bennetts at canonical.com-20090324023447-5m39wlirz4kzj16a
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: remote-pack-hack
    timestamp: Tue 2009-03-24 13:46:09 +1100
    message:
      Better docstrings and comments.
    modified:
      bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
    ------------------------------------------------------------
    revno: 4187.3.3
    revision-id: andrew.bennetts at canonical.com-20090324023447-5m39wlirz4kzj16a
    parent: andrew.bennetts at canonical.com-20090324020946-h3vkfs75tq0ghqul
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: remote-pack-hack
    timestamp: Tue 2009-03-24 13:34:47 +1100
    message:
      In KnitVersionedFiles.insert_record_stream, flush the access object before expanding a delta into a fulltext.
    modified:
      bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
      bzrlib/repofmt/pack_repo.py    pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
    ------------------------------------------------------------
    revno: 4187.3.2
    revision-id: andrew.bennetts at canonical.com-20090324020946-h3vkfs75tq0ghqul
    parent: andrew.bennetts at canonical.com-20090324020232-ndz0vognhdqsbj2o
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: remote-pack-hack
    timestamp: Tue 2009-03-24 13:09:46 +1100
    message:
      Only enable the hack when the serializers match, otherwise we cause ShortReadvErrors.
    modified:
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
    ------------------------------------------------------------
    revno: 4187.3.1
    revision-id: andrew.bennetts at canonical.com-20090324020232-ndz0vognhdqsbj2o
    parent: pqm at pqm.ubuntu.com-20090323202515-uwlqu9w037ndukz4
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: remote-pack-hack
    timestamp: Tue 2009-03-24 13:02:32 +1100
    message:
      Add set_write_cache_size hack in StreamSink to avoid too many round trips with old HPSS servers.
    modified:
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
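The revisions above wire a flush callback from the pack collection, through each AggregateIndex, down to the access object. A minimal sketch of that wiring, using hypothetical simplified classes (not bzrlib's real API):

```python
# Sketch of the flush_func plumbing introduced on the remote-pack-hack
# branch: the pack collection owns the pack being written, and hands a
# bound flush method down to the access layer so knit code can force
# buffered pack writes out before reading back.

class DirectPackAccess(object):
    """Simplified stand-in for bzrlib's _DirectPackAccess."""
    def __init__(self, flush_func=None):
        self._flush_func = flush_func

    def flush(self):
        # A no-op when nothing buffers (e.g. plain .knit file access).
        if self._flush_func is not None:
            self._flush_func()

class AggregateIndex(object):
    """Simplified stand-in: forwards flush_func to its data access."""
    def __init__(self, flush_func=None):
        self.data_access = DirectPackAccess(flush_func=flush_func)

class PackCollection(object):
    """Simplified stand-in for RepositoryPackCollection."""
    def __init__(self):
        self._new_pack = None  # the pack currently being created, if any
        self.text_index = AggregateIndex(flush_func=self._flush_new_pack)

    def _flush_new_pack(self):
        if self._new_pack is not None:
            self._new_pack.flush()

class FakePack(object):
    def __init__(self):
        self.flushed = False
    def flush(self):
        self.flushed = True

collection = PackCollection()
collection._new_pack = FakePack()
collection.text_index.data_access.flush()
assert collection._new_pack.flushed
```

When no pack is being written, `_flush_new_pack` does nothing, so calling `flush()` is always safe.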
=== modified file 'NEWS'
--- a/NEWS	2009-03-24 23:19:12 +0000
+++ b/NEWS	2009-03-25 02:03:41 +0000
@@ -67,6 +67,10 @@
 * Progress bars now show the rate of network activity for
   ``bzr+ssh://`` and ``bzr://`` connections.  (Andrew Bennetts)
 
+* Pushing to a stacked pack-format branch on a 1.12 or older smart server
+  now takes many fewer round trips.  (Andrew Bennetts, Robert Collins,
+  #294479)
+  
 * Streaming push can be done to older repository formats.  This is
   implemented using a new ``Repository.insert_stream_locked`` RPC.
   (Andrew Bennetts, Robert Collins)

=== modified file 'bzrlib/knit.py'
--- a/bzrlib/knit.py	2009-03-23 14:59:43 +0000
+++ b/bzrlib/knit.py	2009-03-25 02:03:41 +0000
@@ -1609,6 +1609,7 @@
                 # KnitVersionedFiles doesn't permit deltas (_max_delta_chain ==
                 # 0) or because it depends on a base only present in the
                 # fallback kvfs.
+                self._access.flush()
                 try:
                     # Try getting a fulltext directly from the record.
                     bytes = record.get_bytes_as('fulltext')
@@ -3049,6 +3050,13 @@
             result.append((key, base, size))
         return result
 
+    def flush(self):
+        """Flush pending writes on this access object.
+        
+        For .knit files this is a no-op.
+        """
+        pass
+
     def get_raw_records(self, memos_for_retrieval):
         """Get the raw bytes for a records.
 
@@ -3079,7 +3087,7 @@
 class _DirectPackAccess(object):
     """Access to data in one or more packs with less translation."""
 
-    def __init__(self, index_to_packs, reload_func=None):
+    def __init__(self, index_to_packs, reload_func=None, flush_func=None):
         """Create a _DirectPackAccess object.
 
         :param index_to_packs: A dict mapping index objects to the transport
@@ -3092,6 +3100,7 @@
         self._write_index = None
         self._indices = index_to_packs
         self._reload_func = reload_func
+        self._flush_func = flush_func
 
     def add_raw_records(self, key_sizes, raw_data):
         """Add raw knit bytes to a storage area.
@@ -3119,6 +3128,14 @@
             result.append((self._write_index, p_offset, p_length))
         return result
 
+    def flush(self):
+        """Flush pending writes on this access object.
+
+        This will flush any buffered writes to a NewPack.
+        """
+        if self._flush_func is not None:
+            self._flush_func()
+            
     def get_raw_records(self, memos_for_retrieval):
         """Get the raw bytes for a records.
 

=== modified file 'bzrlib/repofmt/pack_repo.py'
--- a/bzrlib/repofmt/pack_repo.py	2009-03-24 01:53:42 +0000
+++ b/bzrlib/repofmt/pack_repo.py	2009-03-25 02:03:41 +0000
@@ -532,7 +532,7 @@
     # XXX: Probably 'can be written to' could/should be separated from 'acts
     # like a knit index' -- mbp 20071024
 
-    def __init__(self, reload_func=None):
+    def __init__(self, reload_func=None, flush_func=None):
         """Create an AggregateIndex.
 
         :param reload_func: A function to call if we find we are missing an
@@ -543,7 +543,8 @@
         self.index_to_pack = {}
         self.combined_index = CombinedGraphIndex([], reload_func=reload_func)
         self.data_access = _DirectPackAccess(self.index_to_pack,
-                                             reload_func=reload_func)
+                                             reload_func=reload_func,
+                                             flush_func=flush_func)
         self.add_callback = None
 
     def replace_indices(self, index_to_pack, indices):
@@ -1322,10 +1323,11 @@
         # when a pack is being created by this object, the state of that pack.
         self._new_pack = None
         # aggregated revision index data
-        self.revision_index = AggregateIndex(self.reload_pack_names)
-        self.inventory_index = AggregateIndex(self.reload_pack_names)
-        self.text_index = AggregateIndex(self.reload_pack_names)
-        self.signature_index = AggregateIndex(self.reload_pack_names)
+        flush = self._flush_new_pack
+        self.revision_index = AggregateIndex(self.reload_pack_names, flush)
+        self.inventory_index = AggregateIndex(self.reload_pack_names, flush)
+        self.text_index = AggregateIndex(self.reload_pack_names, flush)
+        self.signature_index = AggregateIndex(self.reload_pack_names, flush)
         # resumed packs
         self._resumed_packs = []
 
@@ -1452,6 +1454,10 @@
         for revision_count, packs in pack_operations:
             self._obsolete_packs(packs)
 
+    def _flush_new_pack(self):
+        if self._new_pack is not None:
+            self._new_pack.flush()
+
     def lock_names(self):
         """Acquire the mutex around the pack-names index.
 

=== modified file 'bzrlib/repository.py'
--- a/bzrlib/repository.py	2009-03-24 01:53:42 +0000
+++ b/bzrlib/repository.py	2009-03-25 02:03:41 +0000
@@ -3750,6 +3750,24 @@
     def _locked_insert_stream(self, stream, src_format):
         to_serializer = self.target_repo._format._serializer
         src_serializer = src_format._serializer
+        if to_serializer == src_serializer:
+            # If serializers match and the target is a pack repository, set the
+            # write cache size on the new pack.  This avoids poor performance
+            # on transports where append is unbuffered (such as
+            # RemoteTransport).  This is safe to do because nothing should read
+            # back from the target repository while a stream with matching
+            # serialization is being inserted.
+            # The exception is that a delta record from the source that should
+            # be a fulltext may need to be expanded by the target (see
+            # test_fetch_revisions_with_deltas_into_pack); but we take care to
+            # explicitly flush any buffered writes first in that rare case.
+            try:
+                new_pack = self.target_repo._pack_collection._new_pack
+            except AttributeError:
+                # Not a pack repository
+                pass
+            else:
+                new_pack.set_write_cache_size(1024*1024)
         for substream_type, substream in stream:
             if substream_type == 'texts':
                 self.target_repo.texts.insert_record_stream(substream)
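The rationale for `set_write_cache_size(1024*1024)` can be shown with a toy model (hypothetical classes, not bzrlib's transport API): on an unbuffered transport such as RemoteTransport, every append is a network round trip, so coalescing many small record writes into large chunks collapses the trip count.

```python
# Hypothetical illustration of why buffering appends helps against an
# old HPSS server: 1000 small unbuffered writes cost 1000 round trips,
# while a 1 MB write cache coalesces them into one.

class CountingTransport(object):
    """Toy transport where each append costs one round trip."""
    def __init__(self):
        self.round_trips = 0
    def append(self, data):
        self.round_trips += 1

class CachedWriter(object):
    """Writer with an optional write cache, flushed when full."""
    def __init__(self, transport, cache_size=0):
        self._transport = transport
        self._cache_size = cache_size
        self._pending = bytearray()

    def write(self, data):
        if self._cache_size == 0:
            self._transport.append(data)  # unbuffered: 1 trip per record
            return
        self._pending += data
        if len(self._pending) >= self._cache_size:
            self.flush()

    def flush(self):
        if self._pending:
            self._transport.append(bytes(self._pending))
            self._pending = bytearray()

records = [b"x" * 100] * 1000  # 1000 small records, ~100 KB total

unbuffered = CountingTransport()
w = CachedWriter(unbuffered, cache_size=0)
for r in records:
    w.write(r)
assert unbuffered.round_trips == 1000

buffered = CountingTransport()
w = CachedWriter(buffered, cache_size=1024 * 1024)
for r in records:
    w.write(r)
w.flush()
assert buffered.round_trips == 1  # everything fit in one cached chunk
```

This also makes the serializer check above concrete: buffering is only safe while nothing reads back from the target mid-stream, which holds when the serializers match (with the delta-expansion case handled by the explicit flush in knit.py).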




More information about the bazaar-commits mailing list