[MERGE] 22% Faster logs by optimizing get_texts
John Arbash Meinel
john at arbash-meinel.com
Sat Jun 17 21:07:55 BST 2006
Aaron Bentley wrote:
> Hi all,
>
> This patch continues my log performance work, by implementing get_texts
> so that it is a single readv operation.
>
> Note that it was also necessary to sort the records before sending them
> to readv. Should readv do its own sorting?
>
> Before the patch, the test ran in 626 ms; now it runs in 510 ms.
>
> Also, I've done a little clean-up work.
>
> Aaron
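On the question of whether readv should sort: one option is for the transport to sort the requests internally and hand the results back in the caller's order, so callers like get_texts would not need to. Below is a minimal sketch of that idea over a plain file object; `sorted_readv` is a hypothetical helper, not bzrlib's Transport.readv, and it reads each range separately rather than coalescing adjacent ones.

```python
import io


def sorted_readv(fp, offsets):
    """Read (offset, length) ranges from fp in file order.

    Hypothetical helper: requests are sorted by offset so the file is
    only scanned forward, then results are returned in the order the
    caller asked for them.
    """
    # Indices of the requests, ordered by their file offset.
    order = sorted(range(len(offsets)), key=lambda i: offsets[i][0])
    results = [None] * len(offsets)
    for i in order:
        offset, length = offsets[i]
        fp.seek(offset)
        # Store at the caller's original index, not the sorted one.
        results[i] = (offset, fp.read(length))
    return results


data = io.BytesIO(b"0123456789abcdef")
print(sorted_readv(data, [(10, 3), (0, 4), (5, 2)]))
# [(10, b'abc'), (0, b'0123'), (5, b'56')]
```

Whether the transport or the caller should own the sorting is a design choice; putting it in the transport means every caller gets forward-only reads without having to remember to sort.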
...
+ def _get_component_positions(self, version_id):
+ needed_versions, basis_versions = \
+ self._get_component_versions(version_id)
+ assert len(basis_versions) == 0
+ positions = []
+ for method, comp_id in needed_versions:
+ data_pos, data_size = self._index.get_position(comp_id)
+ positions.append((method, comp_id, data_pos, data_size))
+ return positions
+
...
+ needed_versions, basis_versions = \
+ self._get_component_versions(version_id)
components = {}
if basis_versions:
+ assert True, "I am broken"
+ basis = self.basis_knit
Shouldn't the above be 'assert False, "I am broken"'? As written, 'assert True' can never fire.
records = []
for comp_id in basis_versions:
data_pos, data_size = basis._index.get_data_position(comp_id)
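To make the assert point concrete, here is a tiny sketch (the function `check` is hypothetical) showing that `assert True, msg` is a no-op while `assert False, msg` raises:

```python
def check(flag):
    # `assert True, msg` can never fail, so its message is dead code.
    assert True, "I am broken"
    if flag:
        # `assert False, msg` always raises AssertionError, unless Python
        # runs with -O, which strips assert statements entirely.
        assert False, "I am broken"
    return "reached"


print(check(False))  # reached
try:
    check(True)
except AssertionError as e:
    print(e)  # I am broken
```

Because `python -O` strips assert statements, an explicit `raise` (for example `raise NotImplementedError`) is a sturdier way to mark a code path as unfinished.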
@@ -603,7 +621,6 @@
# digest here is the digest from the last applied component.
if sha_strings(content.text()) != digest:
- - import pdb;pdb.set_trace()
raise KnitCorrupt(self.filename, 'sha-1 does not match %s'
% version_id)
return content
...
I'm starting to wonder if we are hurting ourselves by working with
everything as lines rather than as string blobs.
I know Weaves were very line-based, but Knits don't have to be.
I suppose difflib is also line-based.
And while PatienceDiff is line-based, that could be an implementation
detail rather than part of the public API.
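For what it's worth, switching between the two representations is lossless as long as each line keeps its newline. A small sketch with hypothetical helper names (not bzrlib API); it splits on "\n" only, because str.splitlines() also breaks on "\r" and other separators:

```python
def lines_to_blob(lines):
    # Each line keeps its own "\n", so joining loses nothing.
    return "".join(lines)


def blob_to_lines(blob):
    # Split on "\n" only and re-attach the terminator to every complete
    # line, so that lines_to_blob(blob_to_lines(b)) == b.
    parts = blob.split("\n")
    result = [part + "\n" for part in parts[:-1]]
    if parts[-1]:
        # Final line without a trailing newline.
        result.append(parts[-1])
    return result


lines = ["first\n", "second\n", "no trailing newline"]
assert blob_to_lines(lines_to_blob(lines)) == lines
```

The cost of keeping lines internally is the extra join/split passes like these; a blob-based Knit would only pay them where a line-based diff is actually needed.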
+
+ def get_text(self, version_id):
+ """See VersionedFile.get_text"""
+ return self.get_texts([version_id])[0]
+
+ def get_texts(self, version_ids):
+ return [''.join(l) for l in self.get_line_list(version_ids)]
+
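The get_text/get_texts arrangement in the patch makes the batch call the primitive and the single-version call a batch of one. A toy sketch of that shape; `ToyVersionedFile` and its dict-backed `get_line_list` are hypothetical stand-ins for the knit code:

```python
class ToyVersionedFile(object):
    """Hypothetical stand-in for a knit; stores each version as lines."""

    def __init__(self, versions):
        self._versions = versions  # version_id -> list of lines

    def get_line_list(self, version_ids):
        # A real knit would fetch all needed components with one readv.
        return [self._versions[v] for v in version_ids]

    def get_texts(self, version_ids):
        return ["".join(lines) for lines in self.get_line_list(version_ids)]

    def get_text(self, version_id):
        # The single-version call is just a batch of one.
        return self.get_texts([version_id])[0]


vf = ToyVersionedFile({"a": ["x\n", "y\n"], "b": ["z\n"]})
print(vf.get_texts(["b", "a"]))  # ['z\n', 'x\ny\n']
print(repr(vf.get_text("a")))    # 'x\ny\n'
```

With this layout any caller that needs several texts gets the batched path for free, while get_text stays a one-liner.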
...
This is something I just read in a Python tutorial: there is a
compiled module, 'operator', so you can do:
import operator
needed_records.sort(key=operator.itemgetter(1))
Since itemgetter is implemented in C, it should be faster than a lambda.
if len(needed_records):
+ needed_records.sort(key=lambda x:x[1])
# We take it that the transport optimizes the fetching as
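A quick sketch to check the itemgetter suggestion against the lambda in the patch: both keys sort identically, and timeit gives a rough speed comparison. The (name, position, size) records are synthetic and made up for illustration.

```python
import operator
import timeit

# Synthetic records shaped like (name, position, size); positions are shuffled.
records = [("rec%d" % i, (i * 7919) % 1000, 10) for i in range(1000)]

by_lambda = sorted(records, key=lambda x: x[1])
by_getter = sorted(records, key=operator.itemgetter(1))
# Both keys select element 1, and sorting is stable, so the results match.
assert by_lambda == by_getter

t_lambda = timeit.timeit(lambda: sorted(records, key=lambda x: x[1]), number=200)
t_getter = timeit.timeit(lambda: sorted(records, key=operator.itemgetter(1)), number=200)
print("lambda: %.4fs  itemgetter: %.4fs" % (t_lambda, t_getter))
```

The key function is called once per element, not once per comparison, so any gain is bounded by those calls; for the handful of records in a typical get_texts call it may well be negligible.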
In general, I think it looks good. It's a relatively small change: just
request the records in groups instead of one at a time.
John
=:->
More information about the bazaar mailing list