[MERGE] BEncode Revision Serializer
John Arbash Meinel
john at arbash-meinel.com
Wed Jun 3 22:19:16 BST 2009
...
> $ time PYTHONPATH=../bzr/work TIMEIT -s "from bzrlib import branch;
> b = branch.Branch.open('bzr-dev7/bzr.dev');
> b.lock_read();
> keys = b.repository.revisions.keys();
> stream = b.repository.revisions.get_record_stream(keys, 'unordered', True)
> texts = [r.get_bytes_as('fulltext') for r in stream]
> serializer = b.repository._serializer
> read = serializer.read_revision_from_string
> b.unlock()
> " "revs = [read(t) for t in texts]"
>
> Basically, first extract all the raw texts, and then TIMEIT the actual
> string => Revision time. This focuses on the serializer. Though honestly
> the time to extract the texts is also somewhat important.
>
> With that in mind,
>
> dev6 1.51 sec per loop
> dev7 1.34 sec per loop
>
> I didn't test RIO for this, though I think I would like to. On my simple
> tests, it was actually considerably faster than XML or Bencode, which
> surprised me.
I did end up checking with the RIO code that Jelmer proposed earlier.
dev7-rio 0.93 sec per loop
So it is quite a bit faster than both bencode and XML, which surprises me
a bit, but perhaps it is all the strtol() overhead of bencode.
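
To illustrate where that overhead would come from, here is a minimal sketch of a bencode decoder. It is not bzrlib's implementation: the function name bdecode and the sample payload are hypothetical, and it is written for Python 3. The point is only that every string in the format carries a decimal length prefix, and every integer is stored as decimal digits, so each item costs at least one text-to-integer conversion (the Python analogue of strtol()).

```python
def bdecode(data, pos=0):
    """Decode one bencoded value from bytes at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        # Integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":
        # List: l<item>...e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        # Dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    if lead.isdigit():
        # String: <length>:<bytes>. The length must be parsed from decimal
        # text before the payload can be sliced out.
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError("invalid bencode data at offset %d" % pos)


# Hypothetical payload, not the actual dev7 revision layout.
sample = b"d9:committer4:John8:revisioni42ee"
value, consumed = bdecode(sample)
print(value, consumed)
```

A line-oriented format such as RIO can split on newlines and a separator instead, without converting a length for every field, which may be part of the difference measured above.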
>
> To test the 'whole stack' I then did:
>
> time bzr log --no-aliases --long -n0 >/dev/null
> which gave (best of 3):
>
> dev6 7.285s
> dev7 7.660s
dev7-rio 7.644s
I should also comment on the various sizes of texts and compressed texts:
            Raw         %    Compressed   %     Objects
dev6        13594 KiB   0%   2592 KiB     10%   24941
dev7-benc   11558 KiB   0%   2616 KiB     11%   24941
dev7-rio    10170 KiB   0%   2599 KiB     10%   24941
Interesting to see that both rio and bencode are smaller in terms of
'raw' text, but their compressed form is larger (though rio is *really*
close). Also interesting that 'time bzr log' does not seem to match the
times for pure decoding of the revision texts.
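
To put numbers on the size comparison, the short sketch below computes each format's change relative to dev6 from the KiB figures in the table. The dictionary and helper name are only for illustration. Rounded, bencode comes out about 15.0% smaller raw and 0.93% larger compressed, and rio about 25.2% smaller raw and 0.27% larger compressed.

```python
# KiB figures copied from the table above: (raw, compressed)
sizes = {
    "dev6": (13594, 2592),
    "dev7-benc": (11558, 2616),
    "dev7-rio": (10170, 2599),
}


def change_vs_baseline(name, baseline="dev6"):
    """Return (raw, compressed) size change in percent relative to baseline."""
    raw, comp = sizes[name]
    base_raw, base_comp = sizes[baseline]
    raw_pct = 100.0 * (raw - base_raw) / base_raw
    comp_pct = 100.0 * (comp - base_comp) / base_comp
    return raw_pct, comp_pct


for name in ("dev7-benc", "dev7-rio"):
    raw_pct, comp_pct = change_vs_baseline(name)
    print("%-10s raw %+.1f%%  compressed %+.2f%%" % (name, raw_pct, comp_pct))
```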
Anyway, these numbers seem to still indicate that RIO is slightly faster
than bencode, but not sufficiently so that it makes a difference in any
real-world operation.
I'll also note that both rio and bencode would probably benefit the most
from a directly tuned pyrex Revision deserializer, rather than going
through the intermediate representation. (Consider that at a minimum, we
can avoid a malloc per attribute, because we don't have to create a
PyString for the 'key'.)
John
=:->