[MERGE] 22% Faster logs by optimizing get_texts

John Arbash Meinel john at arbash-meinel.com
Sat Jun 17 21:07:55 BST 2006

Aaron Bentley wrote:
> Hi all,
> This patch continues my log performance work, by implementing get_texts
> so that it is a single readv operation.
> Note that it was also necessary to sort the records before sending them
> to readv-- should readv do its own sorting?
> Before the patch the test ran in 626 ms.  Now it runs in 510.
> Also, I've done a little clean-up work.
> Aaron
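
For reference, the two timings quoted above can be read two ways, and the 22% in the subject line seems to match the throughput reading rather than the time saved. A quick check of the arithmetic, using only the 626 ms and 510 ms figures from the message (the variable names are mine):

```python
before_ms = 626.0
after_ms = 510.0

# Fraction of the original wall-clock time that was removed.
time_saved = (before_ms - after_ms) / before_ms

# How much more work per unit time the new code does.
speedup = before_ms / after_ms - 1.0

print("time reduced by %.1f%%" % (time_saved * 100))
print("throughput up by %.1f%%" % (speedup * 100))
```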


+    def _get_component_positions(self, version_id):
+        needed_versions, basis_versions = \
+            self._get_component_versions(version_id)
+        assert len(basis_versions) == 0
+        positions = []
+        for method, comp_id in needed_versions:
+            data_pos, data_size = self._index.get_position(comp_id)
+            positions.append((method, comp_id, data_pos, data_size))
+        return positions


+        needed_versions, basis_versions = \
+            self._get_component_versions(version_id)

         components = {}
         if basis_versions:
+            assert True, "I am broken"
+            basis = self.basis_knit

Shouldn't the above be 'assert False, "I am broken"'? As written,
'assert True' can never fire.
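
A small standalone illustration of the difference, not taken from the patch (the function names are hypothetical): an assert on a truthy condition never raises, so the message is never seen, while 'assert False' always raises unless Python runs with -O.

```python
def guarded_with_true():
    # The condition is truthy, so this assert is a no-op.
    assert True, "I am broken"
    return "reached"


def guarded_with_false():
    # Always raises AssertionError (unless asserts are disabled with -O).
    assert False, "I am broken"
    return "reached"


result = guarded_with_true()

try:
    guarded_with_false()
    message = None
except AssertionError as exc:
    message = str(exc)
```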

             records = []
             for comp_id in basis_versions:
                 data_pos, data_size =
@@ -603,7 +621,6 @@

         # digest here is the digest from the last applied component.
         if sha_strings(content.text()) != digest:
-            import pdb;pdb.set_trace()
             raise KnitCorrupt(self.filename, 'sha-1 does not match %s' % version_id)

         return content


I'm starting to wonder if we are hurting ourselves by working with
everything as lines rather than working on them as string blobs.

I know Weaves were very line-based, but Knits don't have to be. I
suppose difflib is also line-based.
And while PatienceDiff is line-based, that could be an implementation
detail rather than part of the public API.
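
To make the line/blob distinction concrete, here is a small sketch with made-up sample text. It only shows that the two representations convert back and forth without loss; the open question is what each split and join costs when done for every text.

```python
text = "first line\nsecond line\nno trailing newline"

# keepends=True so each line keeps its newline and the join is lossless.
lines = text.splitlines(True)

# Going back to a single string blob is one join over the list.
rejoined = "".join(lines)
```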

+    def get_text(self, version_id):
+        """See VersionedFile.get_text"""
+        return self.get_texts([version_id])[0]
+    def get_texts(self, version_ids):
+        return [''.join(l) for l in self.get_line_list(version_ids)]


This is something I just read in a Python tutorial. There is a
compiled module, 'operator', and you can do:

import operator
needed_records.sort(key=operator.itemgetter(1))

Since itemgetter is a compiled C function, it should be faster than a lambda.
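
A quick way to check that operator.itemgetter(1) sorts the same way as a lambda picking out element 1, using made-up records (the tuple layout here is an assumption for illustration, not the patch's actual record format):

```python
import operator

# Hypothetical records: (version_id, position, size).
needed_records = [("rev-c", 300, 40), ("rev-a", 0, 120), ("rev-b", 120, 180)]

# Sort key written as a lambda.
by_lambda = sorted(needed_records, key=lambda x: x[1])

# Same key using the C-implemented itemgetter from the operator module.
by_itemgetter = sorted(needed_records, key=operator.itemgetter(1))
```

Whether the difference is measurable for lists of this size is something I have not timed.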

         if len(needed_records):
+            needed_records.sort(key=lambda x:x[1])
             # We take it that the transport optimizes the fetching as

In general, I think it looks good. A relatively small change, just to
request groups instead of one at a time.
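
The shape of the change, as I understand it, can be sketched like this. It is not bzrlib code: read_ranges is a hypothetical stand-in for a readv-style call, and the record tuples are made up. The point is only that the records are sorted by offset once and then fetched in a single pass.

```python
import io


def read_ranges(fileobj, ranges):
    """Hypothetical readv stand-in: yield (offset, data) per (offset, size)."""
    for offset, size in ranges:
        fileobj.seek(offset)
        yield offset, fileobj.read(size)


def fetch_records(fileobj, records):
    """records is a list of (record_id, offset, size); returns {record_id: data}."""
    # Sort by offset so the reads move forward through the file.
    ordered = sorted(records, key=lambda r: r[1])
    ranges = [(offset, size) for _, offset, size in ordered]
    result = {}
    # One grouped request instead of one request per record.
    for (rec_id, _, _), (_, data) in zip(ordered, read_ranges(fileobj, ranges)):
        result[rec_id] = data
    return result


blob = io.BytesIO(b"aaaabbbbbbcc")
records = [("c", 10, 2), ("a", 0, 4), ("b", 4, 6)]
texts = fetch_records(blob, records)
```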
