Rev 2898: (robertc) Save time when recording full text plain-knit records by using the already joined textual data in _record_to_data. (Robert Collins) in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Tue Oct 9 05:44:49 BST 2007


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 2898
revision-id: pqm at pqm.ubuntu.com-20071009044446-uliu5z9a52bzmps8
parent: pqm at pqm.ubuntu.com-20071009024619-6l2c5sd2ghsyw2hx
parent: robertc at robertcollins.net-20071009035742-sfpuvvyxewtmajk3
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Tue 2007-10-09 05:44:46 +0100
message:
  (robertc) Save time when recording full text plain-knit records by using the already joined textual data in _record_to_data. (Robert Collins)
modified:
  bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
    ------------------------------------------------------------
    revno: 2888.1.3
    merged: robertc at robertcollins.net-20071009035742-sfpuvvyxewtmajk3
    parent: robertc at robertcollins.net-20071005030427-835quoopdu6adviv
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: knits
    timestamp: Tue 2007-10-09 13:57:42 +1000
    message:
      Review feedback.
    ------------------------------------------------------------
    revno: 2888.1.2
    merged: robertc at robertcollins.net-20071005030427-835quoopdu6adviv
    parent: robertc at robertcollins.net-20071004223027-e92ucdpgeh0s67g4
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: knits
    timestamp: Fri 2007-10-05 13:04:27 +1000
    message:
      Cleanup the dense_lines parameter docstring to be more useful.
    ------------------------------------------------------------
    revno: 2888.1.1
    merged: robertc at robertcollins.net-20071004223027-e92ucdpgeh0s67g4
    parent: pqm at pqm.ubuntu.com-20071004215001-549ul8av89cwpnjp
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: knits
    timestamp: Fri 2007-10-05 08:30:27 +1000
    message:
      (robertc) Use prejoined content for knit storage when performing a full-text store of unannotated content. (Robert Collins)
=== modified file 'bzrlib/knit.py'
--- a/bzrlib/knit.py	2007-10-08 07:29:57 +0000
+++ b/bzrlib/knit.py	2007-10-09 04:44:46 +0000
@@ -886,6 +886,7 @@
                 lines = lines[:]
                 options.append('no-eol')
                 lines[-1] = lines[-1] + '\n'
+                line_bytes += '\n'
 
         if delta:
             # To speed the extract of texts the delta chain is limited
@@ -908,11 +909,18 @@
                 store_lines)
         else:
             options.append('fulltext')
-            # get mixed annotation + content and feed it into the
-            # serialiser.
-            store_lines = self.factory.lower_fulltext(content)
-            size, bytes = self._data._record_to_data(version_id, digest,
-                store_lines)
+            # isinstance is slower and we have no hierarchy.
+            if self.factory.__class__ == KnitPlainFactory:
+                # Use the already joined bytes saving iteration time in
+                # _record_to_data.
+                size, bytes = self._data._record_to_data(version_id, digest,
+                    lines, [line_bytes])
+            else:
+                # get mixed annotation + content and feed it into the
+                # serialiser.
+                store_lines = self.factory.lower_fulltext(content)
+                size, bytes = self._data._record_to_data(version_id, digest,
+                    store_lines)
 
         access_memo = self._data.add_raw_records([size], bytes)[0]
         self._index.add_versions(
@@ -1972,17 +1980,26 @@
     def _open_file(self):
         return self._access.open_file()
 
-    def _record_to_data(self, version_id, digest, lines):
+    def _record_to_data(self, version_id, digest, lines, dense_lines=None):
         """Convert version_id, digest, lines into a raw data block.
         
+        :param dense_lines: The bytes of lines but in a denser form. For
+            instance, if lines is a list of 1000 bytestrings each ending in \n,
+            dense_lines may be a list with one line in it, containing all the
+            1000's lines and their \n's. Using dense_lines if it is already
+            known is a win because the string join to create bytes in this
+            function spends less time resizing the final string.
         :return: (len, a StringIO instance with the raw data ready to read.)
         """
-        bytes = (''.join(chain(
+        # Note: using a string copy here increases memory pressure with e.g.
+        # ISO's, but it is about 3 seconds faster on a 1.2Ghz intel machine
+        # when doing the initial commit of a mozilla tree. RBC 20070921
+        bytes = ''.join(chain(
             ["version %s %d %s\n" % (version_id,
                                      len(lines),
                                      digest)],
-            lines,
-            ["end %s\n" % version_id])))
+            dense_lines or lines,
+            ["end %s\n" % version_id]))
         assert bytes.__class__ == str
         compressed_bytes = bytes_to_gzip(bytes)
         return len(compressed_bytes), compressed_bytes
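
The sketch below shows the record layout that `_record_to_data` produces and why passing `[line_bytes]` as `dense_lines` gives the same result. It is a minimal Python 3 re-creation, not bzrlib code. `record_to_data` and `add_missing_eol` are hypothetical names, and `gzip.compress` stands in for bzrlib's `bytes_to_gzip`. It also assumes that `line_bytes` is the already-joined text of `lines`, computed earlier in the caller; that assignment is outside the hunk. The no-eol condition is likewise assumed to be "last line lacks a trailing newline", since the real test in knit.py is not shown. The header still counts entries in `lines`, which is why the patch passes both arguments.

```python
import gzip
from itertools import chain


def add_missing_eol(lines, line_bytes):
    """Hypothetical mirror of the no-eol branch: keep lines and line_bytes in sync.

    Assumes no-eol means "the last line lacks a trailing newline".
    """
    options = []
    if lines and not lines[-1].endswith(b"\n"):
        lines = lines[:]  # copy so the caller's list is not mutated
        options.append("no-eol")
        lines[-1] = lines[-1] + b"\n"
        line_bytes += b"\n"  # the line this patch adds, so both forms still match
    return lines, line_bytes, options


def record_to_data(version_id, digest, lines, dense_lines=None):
    """Hypothetical re-creation of the patched _record_to_data (bytes in, gzip out)."""
    # The line count comes from `lines` even when dense_lines is supplied.
    header = b"version %s %d %s\n" % (version_id, len(lines), digest)
    footer = b"end %s\n" % version_id
    # Prefer the pre-joined form when the caller already has it.
    raw = b"".join(chain([header], dense_lines or lines, [footer]))
    compressed = gzip.compress(raw)  # stand-in for bzrlib's bytes_to_gzip
    return len(compressed), compressed


if __name__ == "__main__":
    text = b"first\nsecond"  # no trailing newline
    lines, line_bytes, options = add_missing_eol(text.splitlines(True), text)
    assert b"".join(lines) == line_bytes
    _, plain = record_to_data(b"rev-1", b"digest", lines)
    _, dense = record_to_data(b"rev-1", b"digest", lines, [line_bytes])
    # gzip headers embed a timestamp, so compare the uncompressed records.
    assert gzip.decompress(plain) == gzip.decompress(dense)
    print(options, gzip.decompress(dense))
```

Both calls yield the same uncompressed record. The dense form only reduces the number of pieces the join has to iterate over, which is the saving the commit message describes for plain (unannotated) knits.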