Encoding woes
John Arbash Meinel
john at arbash-meinel.com
Mon Jan 2 21:22:21 GMT 2006
Robert Collins wrote:
...
>>I think Robert was specifically against forbidding plain ascii strings
>>because it makes the library harder to use. And I agree with him on that
>>point. Which is where I'm saying that if we have an object which is a
>>plain string type, it should be either a text blob which we aren't
>>planning on interpreting, or it must be ascii only.
>>
>>I think we can do okay by just properly naming our variables and
>>parameters. If it ends in 'text' or 'lines', it is a text blob, in all
>>other cases (committer, message, revision_id, etc) it needs to be either
>>a valid ascii string, or unicode.
>
>
> Yay Hungarian. :)
>
> So I am thinking that something like the following happens in apis:
>
> def initialize(klass, path):
>     """blah."""
>     path = safe_unicode(path)
>
>
> def safe_unicode(a_string):
>     """Coerce a_string into unicode.
>
>     If a_string is already unicode, it is returned.
>     If it is an ascii only string, it is decoded as if it were utf8.
>     If the decoding fails, the exception is wrapped as a
>     BzrBadParameter exception.
>     """
>
>
> This will allow library users to use '.', u'.' and u'\uffff' and file
> system paths in unicode safely.
>
>
> Rob
>
Well, your comment 'if it is an ascii only string, it is decoded as if it
were utf8' is a little misleading, since ascii is only a subset of utf-8.
How about: 'if it is a plain string, it is decoded as if it were utf8'?
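
To make the suggested wording concrete, here is a minimal sketch of what safe_unicode could look like. It is only an illustration under a few assumptions: it is written for Python 3, where str is the unicode type and bytes plays the role of the plain string; BzrBadParameter is defined locally as a stand-in, not taken from the real library; and rejecting anything that is neither str nor bytes is my own guess at sensible behaviour.

```python
class BzrBadParameter(Exception):
    """Hypothetical stand-in for the exception named in the docstring."""


def safe_unicode(a_string):
    """Coerce a_string into unicode.

    If a_string is already unicode, it is returned.
    If it is a plain string, it is decoded as if it were utf8.
    If the decoding fails, the exception is wrapped as a
    BzrBadParameter exception.
    """
    if isinstance(a_string, str):
        # Already unicode: hand it back untouched.
        return a_string
    if isinstance(a_string, bytes):
        try:
            # Pure ascii always decodes here, since ascii is a subset of utf-8.
            return a_string.decode("utf-8")
        except UnicodeDecodeError as e:
            raise BzrBadParameter(
                "not valid utf-8: %r" % (a_string,)) from e
    # Assumption: any other type is treated as a bad parameter.
    raise BzrBadParameter("expected a string, got %r" % (type(a_string),))
```

With this, '.' and b'.' both come back as '.', utf-8 encoded paths round-trip, and only byte strings that are not valid utf-8 raise BzrBadParameter.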
John
=:->