Encoding woes

John Arbash Meinel john at arbash-meinel.com
Mon Jan 2 21:22:21 GMT 2006


Robert Collins wrote:

...

>>I think Robert was specifically against forbidding plain ascii strings
>>because it makes the library harder to use. And I agree with him on that
>>point. Which is where I'm saying that if we have an object which is a
>>plain string type, it should be either a text blob which we aren't
>>planning on interpreting, or it must be ascii only.
>>
>>I think we can do okay by just properly naming our variables and
>>parameters. If it ends in 'text' or 'lines', it is a text blob, in all
>>other cases (committer, message, revision_id, etc) it needs to be either
>>a valid ascii string, or unicode.
> 
> 
> Yay Hungarian. :)
> 
> So I am thinking that something like the following happens in apis:
> 
> def initialize(klass, path):
>     """blah."""
>     path = safe_unicode(path)
> 
> 
> def safe_unicode(a_string):
>     """Coerce a_string into unicode.
> 
>     If a_string is already unicode, it is returned.
>     If it is an ascii only string, it is decoded as if it were utf8.
>     If the decoding fails, the exception is wrapped as a 
>     BzrBadParameter exception.
>     """
> 
> 
> This will allow library users to use '.', u'.' and u'\uffff' and file
> system paths in unicode safely.
> 
> 
> Rob
> 
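The naming rule quoted above ('...text' and '...lines' are opaque blobs, and everything else must be unicode or plain ascii) can be sketched as a small validator. This is a minimal sketch, assuming Python 3, where the old plain str corresponds to bytes and unicode corresponds to str. The helper name check_parameter and the choice of ValueError/TypeError are hypothetical; nothing in the thread defines them.

```python
# Hypothetical sketch of the naming rule discussed in this thread.
# Assumed Python 3 mapping: Python 2 plain str -> bytes, unicode -> str.

# Parameter names ending in these suffixes are treated as opaque text blobs.
BLOB_SUFFIXES = ('text', 'lines')


def check_parameter(name, value):
    """Validate value according to the naming convention (hypothetical helper).

    Blob parameters ('...text', '...lines') are returned unchanged.
    Any other parameter must be a str, or bytes that are pure ascii;
    ascii bytes are decoded and returned as str.
    """
    if name.endswith(BLOB_SUFFIXES):
        # Text blobs are not interpreted, so any content is accepted.
        return value
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        try:
            return value.decode('ascii')
        except UnicodeDecodeError:
            raise ValueError('%s must be unicode or ascii-only bytes: %r'
                             % (name, value)) from None
    raise TypeError('%s must be str or bytes, not %s'
                    % (name, type(value).__name__))
```

For example, check_parameter('committer', b'John') returns 'John', while check_parameter('revision_id', b'caf\xc3\xa9') raises ValueError because the bytes are not ascii. Note that the rule is purely suffix-based, so a name such as 'context' would also be treated as a blob.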

Well, your comment 'if it is an ascii only string, decode it as utf8' is
a little misleading: decoding as utf-8 accepts any valid utf-8 string,
not just ascii (ascii is only a subset of utf-8).
How about: 'if it is a plain string, it is decoded as if it were utf8'.
John
=:->
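Robert's safe_unicode() is given only as a docstring. Below is a minimal sketch of how it might be filled in, assuming Python 3 (plain str becomes bytes, unicode becomes str) and following John's rewording of the docstring. BzrBadParameter is defined locally as a stand-in; the real bzrlib class and its constructor arguments are not assumed.

```python
# Python 3 sketch of the proposed safe_unicode(), using John's wording:
# a plain (byte) string is decoded as utf-8, whether or not it is ascii.


class BzrBadParameter(Exception):
    """Local stand-in for bzrlib's BzrBadParameter (assumed, not imported)."""


def safe_unicode(a_string):
    """Coerce a_string into unicode (str).

    If a_string is already unicode, it is returned unchanged.
    If it is a plain (byte) string, it is decoded as if it were utf8.
    If the decoding fails, the error is wrapped as BzrBadParameter.
    """
    if isinstance(a_string, str):
        return a_string
    try:
        return a_string.decode('utf-8')
    except UnicodeDecodeError as e:
        # Wrap the codec error so callers only need to handle one exception.
        raise BzrBadParameter(a_string) from e
```

With this sketch, safe_unicode(b'.') and safe_unicode('.') both return '.', and safe_unicode(b'caf\xc3\xa9') returns 'café' even though the input is not ascii. That last case is exactly why describing the input as 'ascii only' undersells what a utf-8 decode accepts. Invalid utf-8 such as b'\xff' raises BzrBadParameter.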


More information about the bazaar mailing list