[MERGE] Before actually using encoding need to check that Python has corresponding codec (v.3)

Tue Jan 2 18:46:03 GMT 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Alexander Belchenko wrote:
> John Arbash Meinel wrote:
>> Alexander Belchenko wrote:
>>> It seems that my code and tests are incomplete, because PQM fails when
>>> running the tests before merging. I have a problem with one test, so PQM
>>> gives me a failure on the merge request. It's Linux-specific; this test
>>> is skipped on win32.
>>
>  ...
>> +
>> +
>> +class FakeCodec(object):
>> +    """Special singleton class that helps testing
>> +    over several non-existed encodings.
>> +
>> +    Clients could add new encoding names, but cannot remove it.
>> +    """
>> +    _registered = False
>> +    _enabled_encodings = set()
>> +
>> +    def add(self, encoding_name):
>> +        if not self._registered:
>> +            codecs.register(self)
>> +            self._registered = True
>> +        self._enabled_encodings.add(encoding_name)
>> +
>> +    def __call__(self, encoding_name):
>> +        """Called indirectly by codecs module during lookup"""
>> +        if encoding_name in self._enabled_encodings:
>> +            return codecs.lookup('latin-1')
>> +
>> +
>> +fake_codec = FakeCodec()
>>
>>
>> ^- I think it is important that clients can remove the registered
>> encoding, to leave things in a pristine state. It is fine to leave
>> 'FakeCodec' registered, because it is just a wrapper. But you should
>> remove entries from FakeCodec._enabled_encodings. That way we don't
>> have transference of state between tests.
> 
> Unfortunately I cannot unregister an encoding name from the codecs
> registry. The standard codecs module caches looked-up encodings
> internally (probably to speed up operations) and I have not found a way
> to reset its cache.
> 
> So simply removing the encoding name from self._enabled_encodings is not
> enough, because the __call__ method is invoked only once for each
> non-standard encoding.
> 
> To check this, run the attached file.
> On my win32 output is:
> 
> C:\work\Bazaar\__tests>python fake_codec.py
> cp0 is unknown encoding
> 1
> __call__(cp0)
> 2
> 
> C:\work\Bazaar\__tests>
> 
> 
>> Also, we avoid using mutable class-level variables if we access them
>> through self, because unless you understand Python well, it can be
>> confusing what is happening. So it is better to use:
>>
>> def __init__(self):
>>   self._registered = False
>>   self._enabled_encodings = set()
> 
> In this case fake_codec is no longer a singleton, IIUC.
> 

Well, it is a singleton in that there is only one instance "fake_codec".
If you really wanted a singleton the common way is to do:

class _FakeCodec(object):
 ...

_fake_codec = None
def FakeCodec(...):
  global _fake_codec
  if _fake_codec is None:
    _fake_codec = _FakeCodec(...)
  return _fake_codec

Or even

_fake_codec = _FakeCodec() # Singleton

def FakeCodec():
  return _fake_codec

However, if you just want to use class variables regardless of whether
someone instantiates FakeCodec(), then you can use:

def __init__(self):
  if not FakeCodec._registered:
    ...
    FakeCodec._registered = True

And then it is clear that you are using the class level variables.

The real problem is when you do something like:

class Foo():
  _x = set()

  def add(self, x):
    self._x.add(x)

  def reset(self):
    self._x = set()

^- This looks perfectly reasonable, but it turns out that 'add()' will
reuse the class-level variable until you call reset(), and from then on
that instance uses its own instance-level attribute, probably causing
unexpected results:

f1 = Foo()
f2 = Foo()

f1.add(1) # both f1 and f2 have 1
f1.reset() # f1 now has a different set, f2 still has 1
f1.add(2) # f1 has 2, and f2 has 1

f3 = Foo() # f3 is like f2, and has 1, but not 2
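
If you want to see it for yourself, here is a small self-contained sketch of
the same thing. The names (Foo, SafeFoo) are only for illustration and are
not from the patch; SafeFoo shows the __init__ variant suggested earlier.

```python
class Foo(object):
    _x = set()  # class-level: shared by every instance until shadowed

    def add(self, x):
        # Attribute lookup falls back to Foo._x if the instance has no _x.
        self._x.add(x)

    def reset(self):
        # Assignment creates an instance attribute that shadows Foo._x.
        self._x = set()


class SafeFoo(object):
    """Hypothetical fixed variant: each instance owns its set."""

    def __init__(self):
        self._x = set()

    def add(self, x):
        self._x.add(x)

    def reset(self):
        self._x = set()


f1 = Foo()
f2 = Foo()
f1.add(1)
shared_before = f1._x is f2._x   # both names refer to Foo._x
f1.reset()
f1.add(2)
shared_after = f1._x is f2._x    # f1 now has its own set

s1 = SafeFoo()
s2 = SafeFoo()
s1.add(1)                        # s2 is unaffected
```

Checking vars(f1) and vars(f2) after the reset() makes the difference
visible: only f1 has its own '_x' entry.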

...

>> And then in the test cleanup, add fake_codec.reset()
> 
> As I explained above, that does not work.
> 
> -- 
> Alexander

I see your problem... And since lookup() is a C function, it probably
isn't possible to get easy access to whatever cache it is using.
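
For the record, a minimal sketch that reproduces the caching without the
attached file. The encoding name "fake_cache_demo" and the helper names are
made up for illustration. It only uses codecs.register() and
codecs.lookup(); much newer Python versions (3.10 and later) also provide
codecs.unregister(), which clears the lookup cache, but this sketch does
not rely on it.

```python
import codecs

calls = []                          # every name our search function is asked about
enabled = set(["fake_cache_demo"])  # hypothetical fake encoding name


def search(encoding_name):
    """Codec search function: map enabled fake names to latin-1."""
    calls.append(encoding_name)
    if encoding_name in enabled:
        return codecs.lookup("latin-1")
    return None


codecs.register(search)

first = codecs.lookup("fake_cache_demo")   # search() is consulted, result is cached
enabled.discard("fake_cache_demo")         # "remove" the fake encoding again
second = codecs.lookup("fake_cache_demo")  # still succeeds: answered from the cache
```

After this runs, search() has been asked about "fake_cache_demo" only once,
even though the name is no longer in the enabled set.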

So we can go with how it is, and just make it clear what is going on.

You could change this line:
   Clients could add new encoding names, but cannot remove it.

to be
   Clients can add new encoding names, but because of how codecs is
   implemented they cannot be removed. Be careful with naming to avoid
   collisions between tests, and with real encodings.

We might also want a check here: if someone adds an encoding name that was
already added, we should complain, so that people know to avoid collisions
between test cases. But I'm not stuck on that if it is useful to re-use
a fake encoding.
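
Roughly what I have in mind, as a sketch only. It combines the instance
variables from __init__, the reworded docstring, and the duplicate check.
Raising ValueError is just my assumption for how to "complain"; pick
whatever error fits the test suite.

```python
import codecs


class FakeCodec(object):
    """Special class that helps testing over several non-existent encodings.

    Clients can add new encoding names, but because of how codecs is
    implemented they cannot be removed. Be careful with naming to avoid
    collisions between tests, and with real encodings.
    """

    def __init__(self):
        self._registered = False
        self._enabled_encodings = set()

    def add(self, encoding_name):
        # Assumption: a repeated name is treated as an error so that
        # collisions between test cases are noticed early.
        if encoding_name in self._enabled_encodings:
            raise ValueError('fake encoding %r was already added'
                             % (encoding_name,))
        if not self._registered:
            codecs.register(self)
            self._registered = True
        self._enabled_encodings.add(encoding_name)

    def __call__(self, encoding_name):
        """Called indirectly by codecs module during lookup"""
        if encoding_name in self._enabled_encodings:
            return codecs.lookup('latin-1')
        return None


fake_codec = FakeCodec()  # the one shared instance
```

If re-using a fake encoding across tests turns out to be useful, the check
in add() can simply be dropped.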

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFmqhrJdeBCYSNAAMRAtA1AJ9fbAniU+TO+DurH+2d3K4ACjfJXQCfQ9oJ
BczuoWuC7MaUqgcKAlhTSAY=
=wTYv
-----END PGP SIGNATURE-----