[MERGE] BEncode Revision Serializer

Wed Jun 3 22:19:16 BST 2009

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

...

> $ time PYTHONPATH=../bzr/work TIMEIT -s "from bzrlib import branch;
> b = branch.Branch.open('bzr-dev7/bzr.dev');
> b.lock_read();
> keys = b.repository.revisions.keys();
> stream = b.repository.revisions.get_record_stream(keys, 'unordered', True)
> texts = [r.get_bytes_as('fulltext') for r in stream]
> serializer = b.repository._serializer
> read = serializer.read_revision_from_string
> b.unlock()
> " "revs = [read(t) for t in texts]"
> 
> Basically, first extract all the raw texts, and then TIMEIT the actual
> string => Revision time. This focuses on the serializer. Though honestly
> the time to extract the texts is also somewhat important.
> 
> With that in mind,
> 
> dev6	1.51 sec per loop
> dev7	1.34 sec per loop
> 
> I didn't test RIO for this, though I think I would like to. On my simple
> tests, it was actually considerably faster than XML or Bencode, which
> surprised me.
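
For anyone wanting to repeat the measurement without the TIMEIT shell
alias, the same split (build the texts once, time only the decode step)
can be sketched with the stdlib timeit module. This is only a sketch:
make_texts() and read() are hypothetical stand-ins for the extracted
revision texts and for serializer.read_revision_from_string, so the
number it prints says nothing about bzr itself.

```python
import timeit


def make_texts(n=1000):
    # Hypothetical stand-in for the raw revision texts from the repository.
    return [("revision-id: rev-%d\ncommitter: someone\n" % i).encode("ascii")
            for i in range(n)]


def read(text):
    # Hypothetical stand-in for serializer.read_revision_from_string.
    return dict(line.split(b": ", 1) for line in text.splitlines())


def time_decode(texts, repeat=3, number=5):
    # Only the list comprehension is timed; `texts` was built beforehand,
    # which mirrors putting the extraction into the -s setup string.
    timer = timeit.Timer(lambda: [read(t) for t in texts])
    return min(timer.repeat(repeat=repeat, number=number)) / number


texts = make_texts()
per_loop = time_decode(texts)
print("%.6f sec per loop" % per_loop)
```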

I did end up checking with the RIO code that Jelmer proposed earlier.

dev7-rio 0.93 sec per loop

So it is quite a bit faster than both bencode and XML, which surprises
me a bit; perhaps it is all the strtol() overhead of bencode.
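
To show where that integer parsing comes from, here is a minimal sketch
of a bencode decoder in pure Python. It is not the bzrlib implementation,
and the name bdecode() is just for illustration. Every byte string
carries a decimal length prefix that has to be converted before the
payload can be sliced out, and int() plays the role that strtol() would
play in C.

```python
def bdecode(data, pos=0):
    """Decode one bencoded value from `data` (bytes), starting at `pos`.

    Returns a (value, next_pos) tuple.
    """
    lead = data[pos:pos + 1]
    if lead == b"i":
        # Integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":
        # List: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        # Dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    # Byte string: <length>:<payload>. The length prefix is parsed as an
    # integer for every single string, keys included.
    colon = data.index(b":", pos)
    length = int(data[pos:colon])
    start = colon + 1
    return data[start:start + length], start + length


sample = b"d9:committer4:John8:timezonei3600ee"
value, end = bdecode(sample)
print(value, end == len(sample))
```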

> 
> To test the 'whole stack' I then did:
> 
> time bzr log --no-aliases --long -n0 >/dev/null
> which gave (best of 3):
> 
> dev6	7.285s
> dev7	7.660s

dev7-rio 7.644s

I should also comment on the various sizes of texts and compressed texts:

                  Raw   %   Compressed    %  Objects
dev6        13594 KiB  0%     2592 KiB  10%    24941
dev7-benc   11558 KiB  0%     2616 KiB  11%    24941
dev7-rio    10170 KiB  0%     2599 KiB  10%    24941

Interesting to see that both rio and bencode are smaller in terms of
'raw' text, but their compressed form is larger (though rio is *really*
close). Also interesting that 'time bzr log' does not seem to match the
times for pure decoding of the revision texts.
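
To put the size differences in relative terms, the arithmetic can be
checked with a few lines of Python. The figures are copied from the table
above; the helper name percent_change() is just for illustration. Against
dev6 the raw text comes out roughly 15% (bencode) and 25% (rio) smaller,
while the compressed size grows by less than 1% in both cases.

```python
# (raw KiB, compressed KiB) as reported in the table above.
sizes_kib = {
    "dev6": (13594, 2592),
    "dev7-benc": (11558, 2616),
    "dev7-rio": (10170, 2599),
}
base_raw, base_comp = sizes_kib["dev6"]


def percent_change(new, old):
    # Signed change of `new` relative to `old`, in percent.
    return 100.0 * (new - old) / old


for name, (raw, comp) in sizes_kib.items():
    print("%-10s raw %+6.1f%%  compressed %+5.2f%%" % (
        name,
        percent_change(raw, base_raw),
        percent_change(comp, base_comp),
    ))
```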

Anyway, these numbers seem to still indicate that RIO is slightly faster
than bencode, but not sufficiently so that it makes a difference in any
real-world operation.
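
As a rough check of that conclusion, the same kind of arithmetic on the
timings quoted above (speedup_percent() is an illustrative helper, not
part of any tool) puts the rio gain over bencode at about 30% for pure
decoding, but only about 0.2% for the full 'bzr log' run:

```python
# Seconds per loop for pure decoding, and best-of-3 'bzr log' wall time.
decode_sec = {"dev6": 1.51, "dev7": 1.34, "dev7-rio": 0.93}
log_sec = {"dev6": 7.285, "dev7": 7.660, "dev7-rio": 7.644}


def speedup_percent(slow, fast):
    # Time saved by `fast`, as a percentage of the `slow` time.
    return 100.0 * (slow - fast) / slow


decode_gain = speedup_percent(decode_sec["dev7"], decode_sec["dev7-rio"])
log_gain = speedup_percent(log_sec["dev7"], log_sec["dev7-rio"])
print("decode: %.1f%% faster, bzr log: %.1f%% faster" % (decode_gain, log_gain))
```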

I'll also note that both rio and bencode would probably benefit the most
from a directly tuned Pyrex Revision deserializer, rather than going
through the intermediate representation. (Consider that, at a minimum, we
could avoid a malloc per attribute, because we would not have to create a
PyString for the 'key'.)

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkom6NQACgkQJdeBCYSNAAPHdgCgnlFXifmT4F/xEn3FAXfI1kEh
YZ0AoK1uWNYh0D2lK+wF4iZnuLearW9m
=62FJ
-----END PGP SIGNATURE-----