[MERGE] import both c-extension and python module with the same name for testing

Tue Aug 7 02:08:03 BST 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> On Mon, 2007-08-06 at 23:33 +0300, Alexander Belchenko wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Today I had a conversation on IRC with spiv and Peng.
>> spiv and I discussed my pyrex bencode and then started to discuss
>> a new strategy in bzrlib for pyrex/C extension modules.
>>
>> Because we want to test all versions (python and pyrex) separately,
>> John did the following for his _knit_load_data pyrex extension:
>>
>> 1) python version of the module, named _knit_load_data_py.py
>> 2) pyrex version of the module, named _knit_load_data_c.pyx
>> 3) Then one of them is imported (in knit.py):
>>
>> try:
>>     from bzrlib._knit_load_data_c import _load_data_c as _load_data
>> except ImportError:
>>     from bzrlib._knit_load_data_py import _load_data_py as _load_data
>>
>> If we need this zoo only for testing, then we could provide a more complex
>> import scheme for the test suite and simplify the core code.
> 
> Well, there's no need for other modules to import the fastpath code. In
> actual fact I had this basic approach in my very early pyrex
> experiments, but we moved away from it.

I'm not saying that the _load_data code in particular is slow.
But for some modules, keeping two versions under different names might be overhead.

Let's imagine next situation:

A wrapper foo.py tries to import either _foo_c (built from _foo_c.pyx) or _foo_py.
All other modules in bzrlib import only foo.

Because Python caches imported modules, this code will be executed only once:

try:
    from _foo_c import *
except ImportError:
    from _foo_py import *
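
To illustrate the caching claim, here is a minimal sketch for a modern Python. The which() function and the _run_counter module are made up for the demo, and no _foo_c is built, so the fallback branch is the one that runs:

```python
import importlib
import os
import sys
import tempfile

# Throwaway layout: a pure-Python fallback, a run counter, and the wrapper.
# No _foo_c exists here, so the except branch is taken.
tmp = tempfile.mkdtemp()
files = {
    "_foo_py.py": "def which():\n    return 'python'\n",
    "_run_counter.py": "runs = []\n",
    "foo.py": (
        "import _run_counter\n"
        "_run_counter.runs.append(1)  # record each execution of this body\n"
        "try:\n"
        "    from _foo_c import *\n"
        "except ImportError:\n"
        "    from _foo_py import *\n"
    ),
}
for name, source in files.items():
    with open(os.path.join(tmp, name), "w") as f:
        f.write(source)

sys.path.insert(0, tmp)
importlib.invalidate_caches()

import foo            # executes the wrapper body (the try/except runs here)
import foo            # served from sys.modules; the body is not executed again
import _run_counter

wrapper_runs = len(_run_counter.runs)
implementation = foo.which()
print(wrapper_runs, implementation)
```

The second `import foo` only looks the module up in sys.modules, so the try/except cost is paid once per process.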

So the time for `import foo` will be:
  time to import foo.py
+ time to try to import _foo_c
+ (probably) penalty for exception
+ (probably) time to import _foo_py

That seems to be a very small amount of time. But according to
http://www.jacobian.org/writing/2007/mar/04/hate-python/ (section 2):

"...the way Python’s import mechanism works; importing a package makes around ten different open
syscalls for each entry on sys.path; that is, import foo looks for:

        * foo.so
        * foomodule.so
        * foo.py
        * foo.pyc
        * foo.pyo
        * foo/__init__.so
        * foo/__init__module.so
        * foo/__init__.py
        * foo/__init__.pyc
        * foo/__init__.pyo
"

So in the worst case there will be 16 syscalls for 3 different files, plus the penalty for the exception.
I don't know how to measure such small values, especially on win32. Maybe it will cost
a couple of hundred microseconds? I don't know.
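
One way to get at numbers this small might be to repeat the failing import many times and average. This is a rough sketch assuming a modern Python with time.perf_counter; the helper name is made up and the module name is deliberately nonexistent. It measures only the warm case, where the import system may already have cached directory listings, so it says nothing about a cold start on win32:

```python
import time


def time_failed_import(name="_foo_c_does_not_exist", repeats=1000):
    """Return mean seconds per failed import attempt, including the except."""
    start = time.perf_counter()
    for _ in range(repeats):
        try:
            __import__(name)
        except ImportError:
            # A failed import is not stored in sys.modules,
            # so every iteration repeats the search.
            pass
    return (time.perf_counter() - start) / repeats


per_attempt = time_failed_import()
print("failed import + except: %.1f microseconds" % (per_attempt * 1e6))
```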

>> Instead of keeping the py and c versions under separate module names, we could
>> give them the same module name (_load_data.py and _load_data.pyx for the example
>> above). At runtime the Python interpreter will import the C extension first if one
>> is present. For testing we need a special approach to import the
>> python version. This patch provides a mechanism to achieve this goal.
>>
>> The main reason for this patch is faster module import: we let the Python
>> interpreter choose the appropriate version of the module and get rid of
>> the try/except ImportError construct.
> 
> I don't believe there will be any difference in performance when the C
> version is present. I'd like to see test results on the performance
> difference when the C version is not present.

I don't know how to measure such small values.
The real gain I see in the proposed approach is simpler code
and fewer unnecessary wrappers.
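
The patch itself is not reproduced in this message. As a sketch of the general idea only, this is how a test could load the Python source explicitly by file path, even when an extension with the same name would win a normal import. It uses importlib.util from modern Python (not what was available in 2007); the helper name and the "_load_data_py_for_tests" alias are made up, and the file is a stand-in written to a temp directory:

```python
import importlib.util
import os
import tempfile


def load_python_version(alias, py_path):
    """Load the .py file at py_path as a module object named alias."""
    spec = importlib.util.spec_from_file_location(alias, py_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the source; sys.modules is untouched
    return module


# Stand-in for the pure-Python module that shares its name with the extension.
tmp = tempfile.mkdtemp()
py_path = os.path.join(tmp, "_load_data.py")
with open(py_path, "w") as f:
    f.write("def _load_data():\n    return 'python version'\n")

py_module = load_python_version("_load_data_py_for_tests", py_path)
result = py_module._load_data()
print(result)
```

The core code would then keep a plain `import`, and only the test suite would need the by-path loader.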

I feel that my arguments are weak.

- --
[µ]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGt8XzzYr338mxwCURAsWyAJ9QUlpcYw2X0Izcea4e+39KQEo63ACglLO2
LVYQKQk0VJnxRLxkuXD7bcE=
=CtCy
-----END PGP SIGNATURE-----