storage branch - remaining issues?
Aaron Bentley
aaron.bentley at utoronto.ca
Thu Jan 19 02:08:45 GMT 2006
John Arbash Meinel wrote:
| Aaron Bentley wrote:
|
|>Robert Collins wrote:
|>| We should remove the unicode cast minimally though.
|>
|>I think what we need is a UI function that interprets a path or URL and
|>produces a URL.
|
| That is a possibility. But having LocalTransport translate, and none of
| the other transports translate also has that effect.
I would like a guarantee that the path separators are *either* '/'
everywhere or os.sep everywhere. I would also like a guarantee that
either the path elements are URL-encoded everywhere, or never
URL-encoded. I see great value in a uniform interface, and no serious
disadvantages.
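
To make the idea concrete, here is a rough sketch of such a UI function. The name to_url is made up for illustration and is not anything in bzrlib. It assumes modern Python 3 and POSIX-style absolute paths, and it leaves Windows drive letters unhandled. The point is only that everything past this function sees '/' separators and URL-encoded path elements.

```python
import os
import re
from urllib.parse import quote

# Anything that starts with "scheme://" is treated as already being a URL.
_SCHEME_RE = re.compile(r'^[A-Za-z][A-Za-z0-9+.-]+://')


def to_url(location):
    """Hypothetical UI helper: accept a local path or a URL, return a URL."""
    if _SCHEME_RE.match(location):
        # Already a URL: assume it is already URL-encoded and pass it through.
        return location
    # Local path: make it absolute, use '/' separators, then URL-encode it.
    abs_path = os.path.abspath(location)
    posix_path = abs_path.replace(os.sep, '/')
    return 'file://' + quote(posix_path)


print(to_url('http://example.com/a%20b'))
print(to_url('my branch'))
```

With something like this at the UI boundary, no transport would need to guess whether it was handed a path or a URL.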
|>Yes, it's messy that some files produce bytestrings and some produce
|>unicodestrings, but I don't know where you get "should".
|
|
| Without using a codecs.getreader() adapter, all files produce
| bytestreams (because all files *are* bytestreams).
Right, but with a codecs.getreader() adapter, they produce unicode. So
clearly the Python library intends for some file-like objects to emit
byte strings, and others to emit unicode. In fact, Branch.controlfile
has been producing unicode-emitting files for many months.
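
A small illustration of the two kinds of file-like object, written for Python 3, where the two types are called bytes and str rather than str and unicode. The in-memory file and its contents are invented for the example.

```python
import codecs
import io

# A plain binary file-like object: reading it yields bytes.
raw = io.BytesIO('caf\u00e9\n'.encode('utf-8'))
raw_bytes = raw.read()
print(type(raw_bytes).__name__)   # bytes

# The same object wrapped in a codecs reader: reading it yields decoded text.
raw.seek(0)
reader = codecs.getreader('utf-8')(raw)
text = reader.read()
print(type(text).__name__)        # str
```

Both objects have a read() method, but they emit different types, which is the situation described above.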
| The fact that
| StringIO.StringIO() can be unicode inside is usually thought of as a
| misfeature in our codebase.
I'm not sure I'd agree with that. You'd have to point me at the
specific cases.
|>It does mean that in order to put unicode into a file, you have to have
|>the whole thing in memory. I'd prefer accepting an iterable (which a
|>string is), but I can live with this.
|
|
| But where did you get unicode without decoding it from somewhere else?
By processing data. For example, by iterating over
Branch.revision_history().
| And if you decoded it from somewhere else, why not skip the decode and
| re-encode steps, and just pass the original file to put()?
Because it didn't come from a file, and because I think it's dirty to
write bytes to a file and then read unicode from it.
| put_utf8() is really only meant to be (I have some stuff in memory which
| is already in unicode, and I want you to encode it, and put it into this
| control file).
I think put_utf8 serves just as well for "I want to replace the contents
of this file with some text..."
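
For illustration only, here is a sketch of what I mean, including the iterable variant I mentioned. This is not the actual bzrlib put_utf8. The function below, its signature, and the revision ids are all made up, and it is written for Python 3.

```python
import codecs
import io


def put_utf8(fileobj, text_or_chunks):
    """Hypothetical helper: encode text as UTF-8 into a binary file object.

    Accepts either one string or an iterable of strings, so the caller
    does not have to build the whole text in memory first.
    Returns the number of bytes written.
    """
    if isinstance(text_or_chunks, str):
        text_or_chunks = [text_or_chunks]
    written = 0
    for chunk in text_or_chunks:
        data = chunk.encode('utf-8')
        fileobj.write(data)
        written += len(data)
    return written


# Text produced by processing data rather than by decoding a file.
revision_ids = ['rev-1', 'r\u00e9v-2']
out = io.BytesIO()
count = put_utf8(out, (rev + '\n' for rev in revision_ids))

# Reading back through a UTF-8 reader returns the same text that went in.
out.seek(0)
round_trip = codecs.getreader('utf-8')(out).read()
print(count, round_trip == 'rev-1\nr\u00e9v-2\n')
```

The caller hands over text and gets text back, and the encoding stays inside the file layer on both sides.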
|>IterableFile isn't used elsewhere, so we can nuke it now, and revert it
|>back when we need it.
| I was going to use it in my changeset branch as well (if that ever gets
| anywhere). So I don't mind if it stays around unused for a little while.
Whatever you like. You already have a copy of it there, I believe.
Aaron