[MERGE] BEncode Revision Serializer
Jelmer Vernooij
jelmer at samba.org
Thu Jun 4 22:04:32 BST 2009
Hi John,
John Arbash Meinel wrote:
> John Arbash Meinel wrote:
>
>>> $ time PYTHONPATH=../bzr/work TIMEIT -s "from bzrlib import branch;
>>> b = branch.Branch.open('bzr-dev7/bzr.dev');
>>> b.lock_read();
>>> keys = b.repository.revisions.keys();
>>> stream = b.repository.revisions.get_record_stream(keys, 'unordered', True)
>>> texts = [r.get_bytes_as('fulltext') for r in stream]
>>> serializer = b.repository._serializer
>>> read = serializer.read_revision_from_string
>>> b.unlock()
>>> " "revs = [read(t) for t in texts]"
>>> Basically, first extract all the raw texts, and then TIMEIT the actual
>>> string => Revision time. This focuses on the serializer. Though honestly
>>> the time to extract the texts is also somewhat important.
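The measurement pattern described above (do the expensive extraction once in setup, then time only the string => object step) can be sketched with the standard `timeit` module. This is a minimal sketch using placeholder data: the `texts` list and the `read` callable below are stand-ins, not the real repository texts or `read_revision_from_string`.

```python
import timeit

# Setup runs once per timing run and is excluded from the measurement.
# It stands in for opening the branch and extracting the raw texts.
SETUP = """
texts = ['%d' % i for i in range(1000)]  # placeholder for the raw revision texts
read = int                               # placeholder for the deserializer
"""

# Only this statement is timed: the string => object conversion.
STMT = "revs = [read(t) for t in texts]"


def time_parse(repeat=3, number=100):
    """Return the best total time over `repeat` runs of `number` loops each."""
    return min(timeit.repeat(STMT, setup=SETUP, repeat=repeat, number=number))


best = time_parse()
```

Taking the minimum over several repeats is the usual way to reduce noise from other processes; the extraction cost would have to be timed separately.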
>
> So, I spent some time tweaking the Pyrex code a bit. The main changes
> are to use PyList_Append in _decode_list, change self.decode_object into
> a cdef function, and use a macro for _update_tail. Though the last one
> doesn't have a very large impact.
>
> If you have a 'def foo()' function in pyrex, it has to go through the
> python GetAttr code, because a child class could override it. With a
> 'cdef' function, only other Pyrex code can override it, so it can use a
> struct pointer.
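The attribute lookup being avoided can be illustrated in plain Python. This is only a sketch of the dispatch behaviour, with hypothetical class names, not the Pyrex code under review: a call to `self.decode_object` has to be resolved at run time because a subclass may have replaced the method.

```python
class Decoder(object):
    def decode_object(self, data):
        return ('base', data)

    def decode_all(self, items):
        # self.decode_object is looked up on the instance for every call,
        # so a subclass override is honoured at run time.
        return [self.decode_object(item) for item in items]


class TracingDecoder(Decoder):
    # Hypothetical subclass that overrides the method from Python.
    def decode_object(self, data):
        return ('traced', data)


result = TracingDecoder().decode_all([1, 2])
```

With a `cdef` method the same call is made through a C-level table of function pointers instead of this lookup, which is where the saving comes from.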
>
> It also simplifies the Python validation of properties a bit. With this
> change, I now have
>
> dev6 1.540 sec
> dev7-rio 0.969 sec
> dev7-benc 1.010 sec
>
> However, if I remove the couple of type checking steps I get
> dev7-benc 0.934 sec
>
> So at least that brings bencode into 'as fast as rio when not doing type
> checking'. I didn't really bother to optimize Encoder very much, mostly
> because that is something we do only once every so often, while we call
> Decoder for thousands of objects during every 'log'. At this point, only
> about 300-400 ms is actually spent in bencode.bdecode(). So if we want
> to go any faster, we just need to implement a *revision* decoder. I might
> play with that to see where it goes.
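For reference, the work a generic bencode decoder does can be sketched in a few lines of pure Python. This is an illustrative sketch of the bencode format only (integers, byte strings, lists, dicts), not bzrlib's `bencode.bdecode()` or the Pyrex implementation; the helper name `_decode` is made up here.

```python
def bdecode(data):
    """Decode exactly one bencoded value from a bytes object."""
    value, pos = _decode(data, 0)
    if pos != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode(data, pos):
    """Decode the value starting at pos; return (value, next position)."""
    lead = data[pos:pos + 1]
    if lead == b'i':
        # Integer: i<digits>e
        end = data.index(b'e', pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b'l':
        # List: l<values>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b'e':
            item, pos = _decode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b'd':
        # Dict: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b'e':
            key, pos = _decode(data, pos)
            value, pos = _decode(data, pos)
            result[key] = value
        return result, pos + 1
    if lead.isdigit():
        # Byte string: <length>:<bytes>
        colon = data.index(b':', pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError("unexpected byte at offset %d" % pos)


decoded = bdecode(b'd3:agei3e4:spaml1:a1:bee')
```

A decoder written specifically for revisions could skip building these intermediate lists and dicts and fill in the revision fields directly, which is presumably where the remaining time would be saved.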
>
> And if you use "time bzr log -n0 --long" it actually ends up being
> overall faster than Rio (tied with XML):
>
> dev6 7.379 sec
> dev7-rio 7.690 sec
> dev7-benc 7.379 sec
>
> Anyway, since the BEncodeSerializer is now doing type checking of
> attributes, and is pretty much 'as fast' as rio, it has my stamp of
> approval as the preferred format for bzr-2.0.
Nice!
It would indeed be nice to have a custom serializer at some point to
save a couple more CPU cycles. There are probably better things to
spend time on before 2.0 though, I think.
I think this deserves a NEWS entry (and it's probably my fault it's not
there in the first place). Other than that, I'm really looking forward
to seeing this land!
bb:tweak
Cheers,
Jelmer