[MERGE] BEncode Revision Serializer

John Arbash Meinel john at arbash-meinel.com
Thu Jun 4 19:01:42 BST 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:

> 
>> $ time PYTHONPATH=../bzr/work TIMEIT -s "from bzrlib import branch;
>> b = branch.Branch.open('bzr-dev7/bzr.dev');
>> b.lock_read();
>> keys = b.repository.revisions.keys();
>> stream = b.repository.revisions.get_record_stream(keys, 'unordered', True)
>> texts = [r.get_bytes_as('fulltext') for r in stream]
>> serializer = b.repository._serializer
>> read = serializer.read_revision_from_string
>> b.unlock()
>> " "revs = [read(t) for t in texts]"
> 
>> Basically, first extract all the raw texts, and then TIMEIT the actual
>> string => Revision time. This focuses on the serializer. Though honestly
>> the time to extract the texts is also somewhat important.
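
A minimal sketch of the same measurement pattern using Python's standard timeit module: build the raw texts once in untimed setup, then time only the string => object step. The `read` function and the "key: value" texts are made-up stand-ins, not the bzrlib serializer or its data.

```python
import timeit

# Setup runs once and is not timed: prepare all raw texts up front.
# These "key: value" lines are made-up stand-ins for serialized revisions.
texts = ["revision-id: rev-%d\ncommitter: someone\n" % i for i in range(1000)]


def read(text):
    """Hypothetical stand-in parser: one 'key: value' pair per line."""
    return dict(line.split(": ", 1) for line in text.splitlines())


# Only the string => object conversion is inside the timed statement.
elapsed = timeit.timeit(lambda: [read(t) for t in texts], number=10)
revs = [read(t) for t in texts]
print("%.3f sec for 10 passes" % elapsed)
```

Keeping extraction out of the timed statement is what makes the number reflect the serializer alone.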

So, I spent some time tweaking the Pyrex code a bit. The main changes
are to use PyList_Append in _decode_list, change self.decode_object into
a cdef function, and use a macro for _update_tail, though the last one
doesn't have a very large impact.

If you have a 'def foo()' function in Pyrex, calls to it have to go
through the Python GetAttr code, because a child class could override it.
With a 'cdef' function, only other Pyrex code can override it, so the
call can go through a struct pointer instead.
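
A plain-Python analogy (not Pyrex) of why a 'def' method call needs the attribute lookup: the name is resolved on the instance at call time, so an override in a child class is picked up. The class names here are invented for illustration.

```python
class Decoder(object):
    def decode_object(self):
        return "base"

    def decode(self):
        # self.decode_object is looked up by name at call time, which is
        # how the override in the subclass below gets found.
        return self.decode_object()


class TracingDecoder(Decoder):
    # Hypothetical child class that overrides the method.
    def decode_object(self):
        return "override"


result_base = Decoder().decode()
result_sub = TracingDecoder().decode()
print(result_base, result_sub)
```

A 'cdef' method avoids that per-call name lookup, which is where the saving comes from.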

It also simplifies the Python validation of properties a bit. With this
change, I now have

dev6	  1.540 sec
dev7-rio  0.969 sec
dev7-benc 1.010 sec

However, if I remove the couple of type checking steps I get
dev7-benc 0.934 sec

So at least that brings bencode to 'as fast as rio when not doing type
checking'. I didn't really bother to optimize Encoder very much, mostly
because encoding is something we do once every so often, while we call
Decoder for thousands of objects during every 'log'. At this point, only
about 300-400 ms is actually spent in bencode.bdecode(). So if we want
it any faster, we just need to implement a *revision* decoder. I might play
with that to see where it goes.
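
For reference, a minimal pure-Python sketch of what a generic bencode decoder has to do (integers as i...e, length-prefixed strings, lists as l...e, dicts as d...e). This is an illustration only, not the bzrlib Pyrex code, and the helper name `_decode` is made up. A revision-specific decoder could skip this generic dispatch because it knows the expected layout in advance.

```python
def bdecode(data):
    """Decode exactly one bencoded value from a bytes object."""
    value, pos = _decode(data, 0)
    if pos != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode(data, pos):
    """Return (value, next_position) for the value starting at pos."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        # Integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":
        # List: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = _decode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        # Dict: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = _decode(data, pos)
            value, pos = _decode(data, pos)
            result[key] = value
        return result, pos + 1
    if lead.isdigit():
        # Byte string: <length>:<bytes>
        colon = data.index(b":", pos)
        start = colon + 1
        end = start + int(data[pos:colon])
        if end > len(data):
            raise ValueError("string length runs past end of data")
        return data[start:end], end
    raise ValueError("unexpected byte at position %d" % pos)


decoded = bdecode(b"d3:foo3:bar4:listli1ei2eee")
print(decoded)
```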

And if you use "time bzr log -n0 --long" it actually ends up being
overall faster than Rio (tied with XML):

dev6	  7.379 sec
dev7-rio  7.690 sec
dev7-benc 7.379 sec

Anyway, since the BEncodeSerializer is now doing type checking of
attributes, and is pretty much 'as fast' as rio, it has my stamp of
approval as the preferred format for bzr-2.0.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkooDAYACgkQJdeBCYSNAAMlzwCdHQ7v9MRG7tddaPtc7Knw7c/7
7ikAn0MdBQE7c5vKC3C+12Eg2NpY2zsH
=Y/rb
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: jam_bencode_serializer.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20090604/f55aba4d/attachment-0001.diff 