[MERGE] Pyrex RIO implementation
John Arbash Meinel
john at arbash-meinel.com
Thu May 14 18:09:20 BST 2009
Jelmer Vernooij wrote:
> This patch adds a Pyrex implementation of RIO that is faster than the
> current implementation of RIO in "plain" Python. This is my first
> Pyrex extension for Bazaar so I hope I've updated all the right places
> :-)
>
> The RIO tests pass with this patch, and with my RIOSerializer patch
> and this patch merged I now get faster deserialization using RIO than
> using XML:
>
> <bzrlib.chk_serializer.CHKRIOSerializer object at 0x2bd8790>: 4.877576
> <bzrlib.xml8.Serializer_v8 object at 0x2bcc890>: 5.504792
>
> generated using the following plugin:
>
> #!/usr/bin/python
>
> import time
> from bzrlib.xml8 import serializer_v8
> from bzrlib.chk_serializer import chk_rio_serializer
> from bzrlib.revision import Revision
> from bzrlib.commands import Command, register_command
>
> rev = Revision('foo')
> rev.committer = "Joe Committer <joe at example.com>"
> rev.properties['author'] = "BAR"
> rev.message = "fdjklhfksdjh sdjkh dksjh fsdkh sdjkfhsdkjfhsdkfhjkh fsdjkh fksdjh fdksjh fsdjkh djkh fsdkj fhsdkjfh sdiufhieuh fiweuh fieuh fuih " * 20
> rev.timestamp = 4324234234
> rev.inventory_sha1 = "32432d323e32e23e32e"
> rev.timezone = 300
>
> class cmd_bench_serializer(Command):
>
>     def _bench(self, serializer, text, times):
>         t = time.time()
>         for i in xrange(times):
>             serializer.read_revision_from_string(text)
>         self.outf.write("%r: %f\n" % (serializer, time.time() - t))
>
>     def run(self):
>         text = chk_rio_serializer.write_revision_to_string(rev)
>         self._bench(chk_rio_serializer, text, 100000)
>
>         text = serializer_v8.write_revision_to_string(rev)
>         self._bench(serializer_v8, text, 100000)
>
> register_command(cmd_bench_serializer)
>
> I'm not sure how representative this is for "real" revisions, so it
> would be nice to try it in a real-world scenario again.
Well, when I tested dev7 before, I did:

  from bzrlib import branch
  b = branch.Branch.open('target')
  b.lock_read()
  r = b.last_revision()
  ancestry = b.repository.get_ancestry(r)
  ancestry.pop(0) # First is always None, weird api

At this point, you can either do a 'get_revisions()' test, or if you
want to eliminate stuff like Index overhead you can do:

  stream = b.repository.revisions.get_record_stream(
      [(a,) for a in ancestry], 'unordered', True)
  texts = [record.get_bytes_as('fulltext') for record in stream]

At this point, you have all the texts in memory for processing, and you
can do:

  import time
  start = time.time()
  for text in texts:
      serializer.read_revision_from_string(text)
  d = time.time() - start

etc.
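
The timing loop above could be wrapped in a small helper so that the same
list of texts is fed to each serializer in turn. This is only a sketch:
`time_reads` and `FakeSerializer` are hypothetical names that are not part
of bzrlib, and the stand-in merely mimics the `read_revision_from_string`
method used above so the example runs without a repository.

```python
import time


def time_reads(serializer, texts, repeat=1):
    """Return elapsed seconds for deserializing every text `repeat` times."""
    start = time.time()
    for _ in range(repeat):
        for text in texts:
            serializer.read_revision_from_string(text)
    return time.time() - start


class FakeSerializer(object):
    """Hypothetical stand-in; a real run would pass a bzrlib serializer."""

    def __init__(self):
        self.calls = 0

    def read_revision_from_string(self, text):
        # Count calls and do a trivial "name: value" split so the loop
        # has some work to do. This is not the real RIO parser.
        self.calls += 1
        return dict(line.split(": ", 1) for line in text.splitlines())


if __name__ == "__main__":
    texts = ["committer: Joe\nmessage: hello"] * 1000
    fake = FakeSerializer()
    elapsed = time_reads(fake, texts, repeat=3)
    print("%r: %f" % (fake, elapsed))
```

In a real comparison, `texts` would be the fulltexts pulled from the record
stream, and each serializer would need texts written in its own format.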
>
> Cheers,
>
> Jelmer
>