[MERGE] pyrex bencode implementation

Alexander Belchenko bialix at ukr.net
Wed Aug 15 20:01:18 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Aaron Bentley wrote:
> Alexander Belchenko wrote:
>> Here is the patch for pyrex bencode version.
> 
>> I created a simple benchmark for tag serialization/deserialization
>> (i.e. an indirect benchmark for bencode). This benchmark uses a
>> tags dictionary with 100 items.
> 
>> Here are the results on my machine (Celeron M 1.7GHz, Windows XP):
> 
>> Pure python bencode:
> 
>>      906ms bzrlib.benchmarks.bench_tags.TagsBencodeBenchmark.test_deserialize_tags
>>      656ms bzrlib.benchmarks.bench_tags.TagsBencodeBenchmark.test_serialize_tags
> 
>> Pyrex version:
> 
>>      375ms bzrlib.benchmarks.bench_tags.TagsBencodeBenchmark.test_deserialize_tags
>>      453ms bzrlib.benchmarks.bench_tags.TagsBencodeBenchmark.test_serialize_tags
> 
>> These numbers are for a 1000-iteration loop, so you need to divide the time by 1000;
>> i.e. per iteration the values are actually in µs, not ms.
> 
> BB:abstain
> 
> So that makes it .906ms for the python implementation and .375ms for the
> pyrex implementation?  That doesn't sound worth it to me.
> 
> Remember, increasing performance isn't about just optimizing anything we
> can.  Optimization always has a cost, usually in code clarity and
> increased maintenance.
> 
> So optimization should start with profiling the code, seeing what parts
> of what operation are slow, and then deciding the correct way to improve
> performance.
> 
> For all I know, your performance win comes simply from avoiding function
> call overhead, and that could be fixed without Pyrex.

I don't think that's quite true. In decode I also avoid numerous memory
allocations for intermediate values.
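For reference, here is a minimal pure-Python sketch of bencode decoding, assuming Python 3 `bytes` input. The names `bdecode` and `_decode` are hypothetical and this is not bzrlib's `bencode` module. It walks the input with an integer offset instead of repeatedly slicing off the consumed prefix. Even so, every step still creates new Python objects (byte slices, `int`s for lengths and offsets, result tuples). That per-value allocation is the kind of overhead a compiled Pyrex decoder can largely avoid.

```python
def bdecode(data: bytes):
    """Decode a complete bencoded byte string (hypothetical helper)."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode(data: bytes, pos: int):
    """Decode the value starting at data[pos]; return (value, next_pos)."""
    c = data[pos:pos + 1]
    if c == b"i":
        # Integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if c == b"l":
        # List: l<item>...e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = _decode(data, pos)
            items.append(item)
        return items, pos + 1
    if c == b"d":
        # Dictionary: d<key><value>...e, keys are byte strings
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = _decode(data, pos)
            value, pos = _decode(data, pos)
            result[key] = value
        return result, pos + 1
    if c.isdigit():
        # Byte string: <length>:<bytes>
        colon = data.index(b":", pos)
        start = colon + 1
        end = start + int(data[pos:colon])
        if end > len(data):
            raise ValueError("truncated string at offset %d" % pos)
        return data[start:end], end
    # Also reached when the input ends early (c == b"")
    raise ValueError("invalid bencode at offset %d" % pos)
```

For example, `bdecode(b"d3:fooi42ee")` returns `{b"foo": 42}`.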

> 
> HACKING says a patch should "Improves bugs, features, speed, or code
> simplicity"

(Or scratch the patch author's itch?)

> This patch reduces code simplicity, and the speed increases don't seem
> to be substantial.

I think you're right. But in that case your vote is wrong:
I'd expect bb:reject.


May I quote my previous posts about this implementation (5 and 7 August)?

"I started to dive into Pyrex and decided to write something simple and useful
for the Bazaar project. I chose bencode because it's a simple algorithm,
and in the past Martin said that we could use bencode for VersionedProperties.

I used the benchmarks from the BitTorrent-bencode-5.0.8 package. With the generic
benchmark data I get about 5x-6x faster decoding and more than 2.5x faster encoding:

version     decode     encode
python     30.78ms    18.75ms
pyrex       5.31ms     7.03ms

For VersionedProperties I think I need to rework the benchmark to use dicts with
a few (2-3) keys. For shorter bencode strings the Python decoder will probably be
fast enough, so the speed difference will be smaller.
But I think even 2-3x is better than nothing.
Of course for tags the difference will be too small to matter, but for 10K-50K files
with VersionedProperties attached to inventory entries we should get a big win."
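A rough sketch of the reworked benchmark described above: it times encoding of a 100-entry tags-like dict against a 2-3 key property dict, using a 1000-iteration `timeit` loop and dividing by 1000, as in the tags benchmark. The encoder is a minimal pure-Python stand-in written for this sketch (not bzrlib's `bencode` module). The payload contents and the helper name `per_call_us` are made up for illustration, and real VersionedProperties data may look different.

```python
import timeit


def bencode(value) -> bytes:
    """Minimal bencode encoder (stand-in for this sketch, not bzrlib's)."""
    out = []
    _encode(value, out)
    return b"".join(out)


def _encode(value, out):
    """Append the bencoded form of value to the list of chunks `out`."""
    if isinstance(value, int):
        out.append(b"i%de" % value)
    elif isinstance(value, str):
        raw = value.encode("utf-8")
        out.append(b"%d:%s" % (len(raw), raw))
    elif isinstance(value, bytes):
        out.append(b"%d:%s" % (len(value), value))
    elif isinstance(value, (list, tuple)):
        out.append(b"l")
        for item in value:
            _encode(item, out)
        out.append(b"e")
    elif isinstance(value, dict):
        # Bencode requires keys in sorted order; assumes bytes keys.
        out.append(b"d")
        for key in sorted(value):
            _encode(key, out)
            _encode(value[key], out)
        out.append(b"e")
    else:
        raise TypeError("cannot bencode %r" % type(value))


def per_call_us(func, arg, number=1000):
    """Run func(arg) `number` times; return mean time per call in microseconds."""
    total = timeit.timeit(lambda: func(arg), number=number)
    return total / number * 1e6


# Hypothetical payloads: a tags-like dict and a small property dict.
tags = {b"tag-%03d" % i: b"revision-id-%03d" % i for i in range(100)}
props = {b"executable": 1, b"mime-type": b"text/plain"}

if __name__ == "__main__":
    for name, payload in (("tags", tags), ("props", props)):
        print("%-5s %8.2f us per encode" % (name, per_call_us(bencode, payload)))
```

Running the same harness against both the Python and the Pyrex encoder would show whether the speedup still holds for short strings.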


I have nothing to add here, except that maybe my benchmark is awfully bad.
I'm not planning to speed up tags; they are already fast enough.
I also asked what kind of data is serialized with bencode by other
parts of bzrlib, but nobody answered:

"As I can see from grep output, bencode is currently used
in tag.py, multiparent.py and bundle/serializer/v4.py.
I know how it is used in tag.py.

Can someone give me a quick overview of what kind of data is bencoded in
multiparent and the v4 serializer? What is the typical amount of data,
and which types are used?

I want to write a benchmark reflecting the current usage of bencode
in bzrlib."

/me is halfway to vacation.

- --
[µ]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGw01+zYr338mxwCURAhRyAJ0TndXUmZkonZCTLcvmm5F8PLCBQACdF66N
kZi6GB+WQOHWK1KmQ0wRn/8=
=AwQb
-----END PGP SIGNATURE-----


