[MERGE] Pyrex RIO implementation

Thu May 14 18:09:20 BST 2009

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jelmer Vernooij wrote:
> This patch adds a Pyrex implementation of RIO that is faster than the
> current implementation of RIO in "plain" Python. This is my first
> Pyrex extension for Bazaar so I hope I've updated all the right places
> :-)
> 
> The RIO tests pass with this patch, and with my RIOSerializer patch
> and this patch merged I now get faster deserialization using RIO than
> using XML:
> 
> <bzrlib.chk_serializer.CHKRIOSerializer object at 0x2bd8790>: 4.877576
> <bzrlib.xml8.Serializer_v8 object at 0x2bcc890>: 5.504792
> 
> generated using the following plugin:
> 
> #!/usr/bin/python
> 
> import time
> from bzrlib.xml8 import serializer_v8
> from bzrlib.chk_serializer import chk_rio_serializer
> from bzrlib.revision import Revision
> from bzrlib.commands import Command, register_command
> 
> rev = Revision('foo')
> rev.committer = "Joe Committer <joe@example.com>"
> rev.properties['author'] = "BAR"
> rev.message = "fdjklhfksdjh sdjkh dksjh fsdkh sdjkfhsdkjfhsdkfhjkh fsdjkh fksdjh fdksjh fsdjkh djkh fsdkj fhsdkjfh sdiufhieuh fiweuh fieuh fuih " * 20
> rev.timestamp = 4324234234
> rev.inventory_sha1 = "32432d323e32e23e32e"
> rev.timezone = 300
> 
> class cmd_bench_serializer(Command):
> 
>     def _bench(self, serializer, text, times):
>         t = time.time()
>         for i in xrange(times):
>             serializer.read_revision_from_string(text)
>         self.outf.write("%r: %f\n" % (serializer, time.time() - t))
> 
>     def run(self):
>         text = chk_rio_serializer.write_revision_to_string(rev)
>         self._bench(chk_rio_serializer, text, 100000)
> 
>         text = serializer_v8.write_revision_to_string(rev)
>         self._bench(serializer_v8, text, 100000)
> 
> register_command(cmd_bench_serializer)
> 
> I'm not sure how representative this is for "real" revisions, so it
> would be nice to try it in a real-world scenario again.

Well, when I tested dev7 earlier, I did:

from bzrlib import branch
b = branch.Branch.open('target')
b.lock_read()
r = b.last_revision()
ancestry = b.repository.get_ancestry(r)
ancestry.pop(0) # First is always None, weird api

At this point, you can either time a 'get_revisions()' call directly or,
if you want to eliminate stuff like Index overhead, fetch the raw texts
first:

# 'r' is only the tip revision id; the record stream comes from the repository
stream = b.repository.revisions.get_record_stream(
    [(a,) for a in ancestry], 'unordered', True)
texts = [record.get_bytes_as('fulltext') for record in stream]

At this point, you have all the texts in memory, so you can time just the
deserialization:

import time
start = time.time()  # keep the start time separate from the loop variable
for text in texts:
    serializer.read_revision_from_string(text)
d = time.time() - start

etc.
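
The timing loop above could be wrapped in a small helper so that each
serializer is measured the same way. This is only a sketch:
`time_deserialize` and `FakeSerializer` are hypothetical names used for
illustration and are not part of bzrlib. The fake stands in for any object
that exposes `read_revision_from_string`.

```python
import time


def time_deserialize(serializer, texts):
    """Return the seconds spent deserializing every text once."""
    start = time.time()
    for text in texts:
        serializer.read_revision_from_string(text)
    return time.time() - start


class FakeSerializer(object):
    """Hypothetical stand-in for a real serializer, used only for this demo."""

    def __init__(self):
        self.calls = 0

    def read_revision_from_string(self, text):
        # A real serializer would parse the text into a Revision here.
        self.calls += 1
        return text


fake = FakeSerializer()
elapsed = time_deserialize(fake, ["rev-1", "rev-2", "rev-3"])
```

In a real run, `texts` would be the list built from the record stream and
`serializer` would be one of the serializer objects being compared.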

> 
> Cheers,
> 
> Jelmer
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkoMUEAACgkQJdeBCYSNAANOCgCfbmpZfXFN54KulqF5MKfyXnRI
MvEAn1Ezflf6NDTjtey0mKukzkHmf5Cc
=2ydJ
-----END PGP SIGNATURE-----