Rev 2792: (robertc) Reduce overhead in pack generation by 25 percent. (Robert Collins) in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Tue Sep 4 01:46:21 BST 2007


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 2792
revision-id: pqm at pqm.ubuntu.com-20070904004617-gu4xyzuw6mgesvt7
parent: pqm at pqm.ubuntu.com-20070903211907-igj2uj83hz1yyqs9
parent: robertc at robertcollins.net-20070903231900-j3or8vkiixxpskzm
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Tue 2007-09-04 01:46:17 +0100
message:
  (robertc) Reduce overhead in pack generation by 25 percent. (Robert Collins)
added:
  bzrlib/benchmarks/bench_pack.py bench_pack.py-20070903042947-0wphp878xr6wkw7t-1
modified:
  NEWS                           NEWS-20050323055033-4e00b5db738777ff
  bzrlib/benchmarks/__init__.py  __init__.py-20060516064526-eb0d37c78e86065d
  bzrlib/pack.py                 container.py-20070607160755-tr8zc26q18rn0jnb-1
    ------------------------------------------------------------
    revno: 2776.2.2
    merged: robertc at robertcollins.net-20070903231900-j3or8vkiixxpskzm
    parent: robertc at robertcollins.net-20070903043134-k1w3zs0se7psbuoh
    parent: pqm at pqm.ubuntu.com-20070903211907-igj2uj83hz1yyqs9
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: integration
    timestamp: Tue 2007-09-04 09:19:00 +1000
    message:
      Fix some inconsistent NEWS indents.
    ------------------------------------------------------------
    revno: 2776.2.1
    merged: robertc at robertcollins.net-20070903043134-k1w3zs0se7psbuoh
    parent: pqm at pqm.ubuntu.com-20070901160444-hcr66zejwyy0jezc
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: pack
    timestamp: Mon 2007-09-03 14:31:34 +1000
    message:
      25 percent time reduction in pack write logic.
=== added file 'bzrlib/benchmarks/bench_pack.py'
--- a/bzrlib/benchmarks/bench_pack.py	1970-01-01 00:00:00 +0000
+++ b/bzrlib/benchmarks/bench_pack.py	2007-09-03 04:31:34 +0000
@@ -0,0 +1,54 @@
+# Copyright (C) 2007 Canonical Ltd
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+
+"""Benchmarks for pack performance"""
+
+import os
+
+from bzrlib import (
+    pack,
+    )
+from bzrlib.benchmarks import Benchmark
+
+
+class BenchPack(Benchmark):
+    """Benchmark pack performance."""
+
+    def test_insert_one_gig_1k_chunks_no_names_disk(self):
+        # test real disk writing of many small chunks. 
+        # useful for testing whether buffer sizes are right 
+        transport = self.get_transport()
+        stream = transport.open_write_stream('pack.pack')
+        writer = pack.ContainerWriter(stream.write)
+        self.write_1_gig(writer)
+        stream.close()
+
+    def test_insert_one_gig_1k_chunks_no_names_null(self):
+        # write to /dev/null so we test the pack processing.
+        transport = self.get_transport()
+        dev_null = open('/dev/null', 'wb')
+        writer = pack.ContainerWriter(dev_null.write)
+        self.write_1_gig(writer)
+        dev_null.close()
+
+    def write_1_gig(self, writer):
+        one_k = "A" * 1024
+        writer.begin()
+        def write_1g():
+            for hunk in xrange(1024 * 1024):
+                writer.add_bytes_record(one_k, [])
+        self.time(write_1g)
+        writer.end()

=== modified file 'NEWS'
--- a/NEWS	2007-09-03 21:19:07 +0000
+++ b/NEWS	2007-09-03 23:19:00 +0000
@@ -106,18 +106,16 @@
    * Avoid trouble when Windows ssh calls itself 'plink' but no plink
      binary is present.  (Martin Albisetti, #107155)
 
-    * ``bzr remove`` should remove clean subtrees.
-      Now it will remove (without needing ``--force``) subtrees that contain no
-      files with text changes or modified files.
-      With ``--force`` it removes the subtree regardless of text changes or
-      unknown files.
-      Directories with renames in or out (but not changed otherwise)
-      will now be removed without needing ``--force``.
-      Unknown ignored files will be deleted without needing ``--force``.
-      (Marius Kruger, #111665)
+   * ``bzr remove`` should remove clean subtrees.  Now it will remove (without
+     needing ``--force``) subtrees that contain no files with text changes or
+     modified files.  With ``--force`` it removes the subtree regardless of
+     text changes or unknown files. Directories with renames in or out (but
+     not changed otherwise) will now be removed without needing ``--force``.
+     Unknown ignored files will be deleted without needing ``--force``.
+     (Marius Kruger, #111665)
 
-    * When two plugins conflict, the source of both the losing and now the
-      winning definition is shown.  (Konstantin Mikhaylov, #5454)
+   * When two plugins conflict, the source of both the losing and now the
+     winning definition is shown.  (Konstantin Mikhaylov, #5454)
 
   IMPROVEMENTS:
 

=== modified file 'bzrlib/benchmarks/__init__.py'
--- a/bzrlib/benchmarks/__init__.py	2007-08-29 04:43:31 +0000
+++ b/bzrlib/benchmarks/__init__.py	2007-09-03 04:31:34 +0000
@@ -185,6 +185,7 @@
                    'bzrlib.benchmarks.bench_inventory',
                    'bzrlib.benchmarks.bench_knit',
                    'bzrlib.benchmarks.bench_log',
+                   'bzrlib.benchmarks.bench_pack',
                    'bzrlib.benchmarks.bench_osutils',
                    'bzrlib.benchmarks.bench_rocks',
                    'bzrlib.benchmarks.bench_startup',

=== modified file 'bzrlib/pack.py'
--- a/bzrlib/pack.py	2007-08-15 01:12:57 +0000
+++ b/bzrlib/pack.py	2007-09-03 04:31:34 +0000
@@ -104,20 +104,29 @@
         """
         current_offset = self.current_offset
         # Kind marker
-        self.write_func("B")
+        byte_sections = ["B"]
         # Length
-        self.write_func(str(len(bytes)) + "\n")
+        byte_sections.append(str(len(bytes)) + "\n")
         # Names
         for name_tuple in names:
             # Make sure we're writing valid names.  Note that we will leave a
             # half-written record if a name is bad!
             for name in name_tuple:
                 _check_name(name)
-            self.write_func('\x00'.join(name_tuple) + "\n")
+            byte_sections.append('\x00'.join(name_tuple) + "\n")
         # End of headers
-        self.write_func("\n")
+        byte_sections.append("\n")
         # Finally, the contents.
-        self.write_func(bytes)
+        byte_sections.append(bytes)
+        # XXX: This causes a memory copy the size of bytes, but is usually
+        # faster than two write calls (12 vs 13 seconds to output a gig of
+        # 1k records). Results may differ on significantly larger records
+        # like .iso's, but those should be rare and are thus not likely to
+        # be the common case. The biggest issue is causing extreme memory
+        # pressure in that case. One possible improvement here is to check
+        # the size of the content before deciding to join here vs call
+        # write twice.
+        self.write_func(''.join(byte_sections))
         self.records_written += 1
         # return a memo of where we wrote data to allow random access.
         return current_offset, self.current_offset - current_offset

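The XXX comment in the pack.py hunk suggests checking the content size before deciding whether to join or to call write twice. A minimal sketch of that idea follows. It is not part of this revision: it is written for Python 3 with bytes rather than the Python 2 strings bzrlib uses, it omits the _check_name validation and offset bookkeeping, and the names serialise_header, write_record and JOIN_THRESHOLD (with its 1 MiB value) are hypothetical.

```python
import io

# Hypothetical cutoff, not taken from the patch: bodies larger than this are
# written with a separate call so they are not copied into a joined buffer.
JOIN_THRESHOLD = 1024 * 1024


def serialise_header(length, names):
    """Build a bytes-record header: kind marker, length, names, blank line."""
    sections = [b"B", str(length).encode("ascii") + b"\n"]
    for name_tuple in names:
        # Name parts are NUL-separated; each name tuple ends with a newline.
        sections.append(b"\x00".join(name_tuple) + b"\n")
    # An empty line marks the end of the headers.
    sections.append(b"\n")
    return b"".join(sections)


def write_record(write_func, body, names, join_threshold=JOIN_THRESHOLD):
    """Write one record and return the number of write_func calls made."""
    header = serialise_header(len(body), names)
    if len(body) <= join_threshold:
        # Small record: one write, at the cost of copying the body once.
        write_func(header + body)
        return 1
    # Large record: two writes, so the body is never copied.
    write_func(header)
    write_func(body)
    return 2


if __name__ == "__main__":
    out = io.BytesIO()
    calls = write_record(out.write, b"A" * 1024, [])
    print(calls, len(out.getvalue()))
```

Both paths emit the same byte stream, so only the number of write calls and the peak memory use differ. Where the threshold should sit would need measuring, for instance with the bench_pack benchmark added in this revision.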


