Rev 2792: (robertc) Reduce overhead in pack generation by 25 percent. (Robert Collins) in file:///home/pqm/archives/thelove/bzr/%2Btrunk/
Patch Queue Manager pqm at
Tue Sep 4 01:46:21 BST 2007

At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

revno: 2792
revision-id: pqm at
parent: pqm at
parent: robertc at
committer: Patch Queue Manager <pqm at>
branch nick: +trunk
timestamp: Tue 2007-09-04 01:46:17 +0100
  (robertc) Reduce overhead in pack generation by 25 percent. (Robert Collins)
  NEWS                           NEWS-20050323055033-4e00b5db738777ff
    revno: 2776.2.2
    merged: robertc at
    parent: robertc at
    parent: pqm at
    committer: Robert Collins <robertc at>
    branch nick: integration
    timestamp: Tue 2007-09-04 09:19:00 +1000
      Fix some inconsistent NEWS indents.
    revno: 2776.2.1
    merged: robertc at
    parent: pqm at
    committer: Robert Collins <robertc at>
    branch nick: pack
    timestamp: Mon 2007-09-03 14:31:34 +1000
      25 percent time reduction in pack write logic.
=== added file 'bzrlib/benchmarks/'
--- a/bzrlib/benchmarks/	1970-01-01 00:00:00 +0000
+++ b/bzrlib/benchmarks/	2007-09-03 04:31:34 +0000
@@ -0,0 +1,54 @@
+# Copyright (C) 2007 Canonical Ltd
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+"""Benchmarks for pack performance"""
+import os
+from bzrlib import (
+    pack,
+    )
+from bzrlib.benchmarks import Benchmark
+class BenchPack(Benchmark):
+    """Benchmark pack performance."""
+    def test_insert_one_gig_1k_chunks_no_names_disk(self):
+        # test real disk writing of many small chunks. 
+        # useful for testing whether buffer sizes are right 
+        transport = self.get_transport()
+        stream = transport.open_write_stream('pack.pack')
+        writer = pack.ContainerWriter(stream.write)
+        self.write_1_gig(writer)
+        stream.close()
+    def test_insert_one_gig_1k_chunks_no_names_null(self):
+        # write to /dev/null so we test only the pack processing.
+        transport = self.get_transport()
+        dev_null = open('/dev/null', 'wb')
+        writer = pack.ContainerWriter(dev_null.write)
+        self.write_1_gig(writer)
+        dev_null.close()
+    def write_1_gig(self, writer):
+        one_k = "A" * 1024
+        writer.begin()
+        def write_1g():
+            for hunk in xrange(1024 * 1024):
+                writer.add_bytes_record(one_k, [])
+        self.time(write_1g)
+        writer.end()

=== modified file 'NEWS'
--- a/NEWS	2007-09-03 21:19:07 +0000
+++ b/NEWS	2007-09-03 23:19:00 +0000
@@ -106,18 +106,16 @@
    * Avoid trouble when Windows ssh calls itself 'plink' but no plink
      binary is present.  (Martin Albisetti, #107155)
-    * ``bzr remove`` should remove clean subtrees.
-      Now it will remove (without needing ``--force``) subtrees that contain no
-      files with text changes or modified files.
-      With ``--force`` it removes the subtree regardless of text changes or
-      unknown files.
-      Directories with renames in or out (but not changed otherwise)
-      will now be removed without needing ``--force``.
-      Unknown ignored files will be deleted without needing ``--force``.
-      (Marius Kruger, #111665)
+   * ``bzr remove`` should remove clean subtrees.  Now it will remove (without
+     needing ``--force``) subtrees that contain no files with text changes or
+     modified files.  With ``--force`` it removes the subtree regardless of
+     text changes or unknown files. Directories with renames in or out (but
+     not changed otherwise) will now be removed without needing ``--force``.
+     Unknown ignored files will be deleted without needing ``--force``.
+     (Marius Kruger, #111665)
-    * When two plugins conflict, the source of both the losing and now the
-      winning definition is shown.  (Konstantin Mikhaylov, #5454)
+   * When two plugins conflict, the source of both the losing and now the
+     winning definition is shown.  (Konstantin Mikhaylov, #5454)

=== modified file 'bzrlib/benchmarks/'
--- a/bzrlib/benchmarks/	2007-08-29 04:43:31 +0000
+++ b/bzrlib/benchmarks/	2007-09-03 04:31:34 +0000
@@ -185,6 +185,7 @@
+                   'bzrlib.benchmarks.bench_pack',

=== modified file 'bzrlib/'
--- a/bzrlib/	2007-08-15 01:12:57 +0000
+++ b/bzrlib/	2007-09-03 04:31:34 +0000
@@ -104,20 +104,29 @@
         current_offset = self.current_offset
         # Kind marker
-        self.write_func("B")
+        byte_sections = ["B"]
         # Length
-        self.write_func(str(len(bytes)) + "\n")
+        byte_sections.append(str(len(bytes)) + "\n")
         # Names
         for name_tuple in names:
             # Make sure we're writing valid names.  Note that we will leave a
             # half-written record if a name is bad!
             for name in name_tuple:
-            self.write_func('\x00'.join(name_tuple) + "\n")
+            byte_sections.append('\x00'.join(name_tuple) + "\n")
         # End of headers
-        self.write_func("\n")
+        byte_sections.append("\n")
         # Finally, the contents.
-        self.write_func(bytes)
+        byte_sections.append(bytes)
+        # XXX: This copies the record contents (len(bytes) in size) in
+        # memory, but is usually faster than two write calls (12 vs 13
+        # seconds to output a gig of 1k records). Results may differ on
+        # significantly larger records like .iso's, but those should be
+        # rare and are unlikely to be the common case. The biggest issue is
+        # extreme memory pressure in that case. One possible improvement
+        # here is to check the size of the content before deciding whether
+        # to join here or call write twice.
+        self.write_func(''.join(byte_sections))
         self.records_written += 1
         # return a memo of where we wrote data to allow random access.
         return current_offset, self.current_offset - current_offset
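
The XXX comment in the hunk above suggests checking the size of the content before choosing between one joined write and two separate writes. A minimal sketch of that idea follows; it is not part of this commit. It is written for Python 3 (the patch itself is Python 2), it omits name validation and offset tracking, and the names serialise_bytes_record and JOIN_THRESHOLD, along with the threshold value, are hypothetical.

```python
import io

# Hypothetical cut-off; the commit does not specify any value.
JOIN_THRESHOLD = 1024 * 1024


def serialise_bytes_record(write_func, body, names,
                           join_threshold=JOIN_THRESHOLD):
    """Write one record: kind marker, length, names, blank line, body.

    Returns the number of write_func calls made.
    """
    sections = [b"B", str(len(body)).encode("ascii") + b"\n"]
    for name_tuple in names:
        sections.append(b"\x00".join(name_tuple) + b"\n")
    sections.append(b"\n")
    if len(body) <= join_threshold:
        # Small record: one joined write, at the cost of copying the body.
        sections.append(body)
        write_func(b"".join(sections))
        return 1
    # Large record: skip the copy and write header and body separately.
    write_func(b"".join(sections))
    write_func(body)
    return 2


out = io.BytesIO()
small_calls = serialise_bytes_record(out.write, b"A" * 1024, [])
large_calls = serialise_bytes_record(
    out.write, b"A" * 2048, [(b"name",)], join_threshold=1024)
```

With a threshold like this, the many-small-records case measured by the benchmark keeps the single write call, while a very large record avoids the extra in-memory copy. Where the cut-off should sit would need measuring.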
