Encoding woes

Wed Dec 28 10:42:49 GMT 2005

On Mon, Dec 26, 2005 at 16:04:44 -0600, John A Meinel wrote:
> I think there should be 3 types of strings inside bzrlib:
> 
> 1) Plain ASCII strings. These are isinstance(x, str), and they should
> not contain characters outside the ASCII set (so x.decode() should
> always work).
> 2) Unicode strings. Anything outside of ASCII should be a unicode
> string.

Why do these need to be two types of strings? ASCII is a subset of Unicode.
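A minimal Python 3 sketch of that point, assuming Python 3 terms (`str` is the Unicode type, `bytes` holds raw bytes); the helper name `ascii_to_text` is hypothetical. Any ASCII-only byte string decodes to Unicode without loss, so the ASCII case does not need a type of its own:

```python
def ascii_to_text(raw: bytes) -> str:
    # Hypothetical helper. The 'ascii' codec raises UnicodeDecodeError on any
    # byte >= 0x80, so a successful decode means the input really was ASCII.
    return raw.decode("ascii")


name = ascii_to_text(b"bzrlib")
# The result is an ordinary Unicode string...
assert isinstance(name, str)
# ...and ASCII text is byte-for-byte identical when encoded as UTF-8.
assert name.encode("utf-8") == b"bzrlib"
```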

> 3) Text blobs. These are just arrays of bytes that we would never try
> to encode or decode, such as file contents. The only thing we might do
> with these strings is split them on newlines.

Hm, I believe there should be a special class made for them, so they can
always be told apart from case 1. Also, if all ASCII strings are made unicode
(which I think they can be), then the plain string type can be outlawed
everywhere except the external interface (only the part used by front-ends),
so forgetting to classify the input would be immediately obvious.
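One way to sketch that idea in Python 3, assuming `bytes` plays the role of the plain string type and `str` the Unicode type. The class `TextBlob` and the guard `require_text` are hypothetical names, not bzrlib APIs: the class wraps file contents so they can only be split on newlines, and the guard rejects anything that has not been classified as Unicode text.

```python
from typing import List


class TextBlob:
    """Hypothetical wrapper for file contents: opaque bytes, never decoded."""

    __slots__ = ("_data",)

    def __init__(self, data: bytes) -> None:
        if not isinstance(data, bytes):
            raise TypeError("TextBlob wraps bytes, not %s" % type(data).__name__)
        self._data = data

    def lines(self) -> List[bytes]:
        # Splitting on newlines is the only operation offered; line endings
        # are kept so the blob can be reassembled byte-for-byte.
        return self._data.splitlines(True)

    def __bytes__(self) -> bytes:
        return self._data


def require_text(value: object) -> str:
    """Hypothetical guard for internal APIs: only Unicode text gets through."""
    if not isinstance(value, str):
        # Raw bytes or a TextBlob here means someone forgot to classify input.
        raise TypeError("unclassified input: %r" % (value,))
    return value
```

With this split, `TextBlob(b"one\ntwo\n").lines()` gives `[b"one\n", b"two\n"]`, while passing `b"name"` to `require_text` raises `TypeError`, which is the "immediately obvious" failure described above.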

> Stuff that is read from stdin or from the argument list needs to be
> converted into one of those 3 types of string.
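A rough sketch of that boundary step in Python 3, assuming arguments arrive as raw bytes in the user's locale encoding. The helper names `decode_args` and `read_contents` are hypothetical: names and options are decoded to Unicode once, at the edge, while data read from stdin stays as bytes and is never decoded.

```python
import io
import locale
from typing import BinaryIO, List, Optional


def decode_args(raw_args: List[bytes], encoding: Optional[str] = None) -> List[str]:
    # Assumption: arguments use the locale's preferred encoding unless told otherwise.
    enc = encoding or locale.getpreferredencoding(False)
    # A strict decode fails loudly instead of letting undecodable bytes leak inward.
    return [arg.decode(enc) for arg in raw_args]


def read_contents(stream: BinaryIO) -> bytes:
    # File contents stay as bytes: a text blob, not a string to decode.
    return stream.read()


args = decode_args([b"commit", b"-m", "caf\u00e9".encode("utf-8")], encoding="utf-8")
data = read_contents(io.BytesIO(b"first line\nsecond line\n"))
```

In a real program the stream would be `sys.stdin.buffer`. Python 3 already decodes `sys.argv` itself; on POSIX, `os.fsencode()` recovers the original argument bytes.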

-- 
						 Jan 'Bulb' Hudec <bulb at ucw.cz>