storage branch - remaining issues?

Aaron Bentley aaron.bentley at utoronto.ca
Thu Jan 19 02:08:45 GMT 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:
| Aaron Bentley wrote:
|
|>Robert Collins wrote:
|>| We should remove the unicode cast minimally though.
|>
|>I think what we need is a UI function that interprets a path or URL and
|>produces a URL.
|
| That is a possibility. But having LocalTransport translate, and none of
| the other transports translate also has that effect.

I would like a guarantee that the path separators are *either* '/'
everywhere or os.sep everywhere.  I would also like a guarantee that
either the path elements are URL-encoded everywhere, or never
URL-encoded.  I see great value in a uniform interface, and no serious
disadvantages.
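One way to get the uniformity described above is a UI-level helper that turns a local path into a URL once, at the edge. Here is a minimal Python 3 sketch, assuming a POSIX system. `path_or_url_to_url()` and `url_to_local_path()` are hypothetical names, not bzrlib API. A URL is passed through unchanged. A local path becomes a file:// URL whose separators are always '/' and whose path elements are always percent-encoded as UTF-8.

```python
import os
import re
from urllib.parse import quote, unquote, urlsplit

# Matches "scheme://" at the start of a string, e.g. "http://", "sftp://", "file://".
_URL_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*://")


def path_or_url_to_url(location: str) -> str:
    """Hypothetical UI helper: leave URLs alone, convert local paths to file:// URLs."""
    if _URL_RE.match(location):
        return location
    abspath = os.path.abspath(location)
    # Normalise separators so transports only ever see '/' (a no-op on POSIX).
    posix_path = abspath.replace(os.sep, "/")
    # Percent-encode every path element (UTF-8), keeping '/' as the separator.
    return "file://" + quote(posix_path, safe="/")


def url_to_local_path(url: str) -> str:
    """Hypothetical inverse: decode a file:// URL back to a local path."""
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError("not a file:// URL: %r" % url)
    return unquote(parts.path)
```

With this split, a transport never has to guess whether it received an os.sep path or an encoded URL. Only the UI layer translates.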

|>Yes, it's messy that some files produce bytestrings and some produce
|>unicode strings, but I don't know where you get "should".
|
|
| Without using a codecs.getreader() adapter, all files produce
| bytestreams (because all files *are* bytestreams).

Right, but when you use a codecs.getreader adapter, they produce unicode.
So clearly the Python library intends for some file-like objects to emit
bytestrings, and others to emit unicode.  In fact, Branch.controlfile has
been producing unicode-emitting files for many months.
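The adapter point can be illustrated in Python 3 terms, where the 2006 str/unicode split became bytes/str. When read directly, the same byte stream yields bytes. When it is wrapped with codecs.getreader(), it yields text. The in-memory io.BytesIO stands in for a real control file. Branch.controlfile itself is not reproduced here.

```python
import codecs
import io

# A byte stream, standing in for a file on disk.
raw = io.BytesIO("caf\u00e9\n".encode("utf-8"))
direct = raw.read()  # bytes

raw.seek(0)
# codecs.getreader() returns a StreamReader class; wrapping the stream with it
# makes read() decode UTF-8 and emit text instead of bytes.
reader = codecs.getreader("utf-8")(raw)
decoded = reader.read()  # str
```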

| The fact that
| StringIO.StringIO() can be unicode inside is usually thought of as a
| misfeature in our codebase.

I'm not sure I'd agree with that.  You'd have to point me at the
specific cases.

|>It does mean that in order to put unicode into a file, you have to have
|>the whole thing in memory.  I'd prefer accepting an iterable (which a
|>string is), but I can live with this.
|
|
| But where did you get unicode without decoding it from somewhere else?

By processing data.  For example, by iterating over
Branch.revision_history().
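Text produced by processing data can be streamed rather than assembled in memory, which is the iterable-accepting alternative mentioned above. Here is a minimal sketch, assuming the data arrives as text chunks (for instance, one line per revision ID). `put_utf8_iter()` is a hypothetical name. It encodes chunk by chunk with an incremental UTF-8 encoder, so the full text never has to be held in memory at once. Because a str is itself an iterable of characters, a plain string is accepted too.

```python
import codecs
import io
from typing import BinaryIO, Iterable


def put_utf8_iter(dest: BinaryIO, chunks: Iterable[str]) -> int:
    """Hypothetical: write text chunks to a byte stream as UTF-8. Returns bytes written."""
    encoder = codecs.getincrementalencoder("utf-8")()
    written = 0
    for chunk in chunks:
        data = encoder.encode(chunk)
        dest.write(data)
        written += len(data)
    # Flush any state the encoder is still holding.
    tail = encoder.encode("", final=True)
    dest.write(tail)
    written += len(tail)
    return written


# Usage with made-up revision IDs, streamed one line at a time.
history = ["rev-a", "rev-b"]
buf = io.BytesIO()
count = put_utf8_iter(buf, (rev_id + "\n" for rev_id in history))
```

Writing each chunk as soon as it is encoded trades a few extra write() calls for bounded memory use.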

| And if you decoded it from somewhere else, why not skip the decode and
| re-encode steps, and just pass the original file to put()?

Because it didn't come from a file, and because I think it's dirty to
write bytes to a file and then read unicode from it.

| put_utf8() is really only meant for the case "I have some stuff in memory
| which is already unicode, and I want you to encode it and put it into
| this control file".

I think put_utf8 serves just as well for "I want to replace the contents
of this file with some text..."
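The put_utf8 contract as both messages describe it is: take text that is already in memory, encode it, and hand the bytes to a bytes-level put(). Here is a minimal sketch. `MemoryTransport` and this `put_utf8(transport, relpath, text)` signature are hypothetical stand-ins, not bzrlib's actual API. The dict-backed transport just lets the example run without a filesystem.

```python
import io
from typing import BinaryIO, Dict


class MemoryTransport:
    """Hypothetical stand-in for a transport: stores bytes by relative path."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def put(self, relpath: str, f: BinaryIO) -> None:
        # Bytes-level API: the caller supplies a file of already-encoded data.
        self._files[relpath] = f.read()

    def get(self, relpath: str) -> io.BytesIO:
        return io.BytesIO(self._files[relpath])


def put_utf8(transport: MemoryTransport, relpath: str, text: str) -> None:
    """Hypothetical helper: replace the contents of relpath with text, encoded as UTF-8."""
    if not isinstance(text, str):
        raise TypeError("put_utf8 expects text, got %r" % type(text).__name__)
    transport.put(relpath, io.BytesIO(text.encode("utf-8")))


t = MemoryTransport()
put_utf8(t, "revision-history", "rev-a\nrev-b\n")
```

Rejecting bytes up front keeps the two entry points distinct. Callers with bytes use put(), and callers with text use put_utf8().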

|>IterableFile isn't used elsewhere, so we can nuke it now, and revert it
|>back when we need it.

| I was going to use it in my changeset branch as well (if that ever gets
| anywhere). So I don't mind if it stays around unused for a little while.

Whatever you like.  You already have a copy of it there, I believe.
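For readers unfamiliar with the idea behind IterableFile, here is a minimal hypothetical sketch. It is not a copy of the bzrlib class discussed above, whose interface may differ. The object is a read-only file-like wrapper over an iterable of byte chunks, and it pulls chunks lazily as read() and readline() need them. This lets a generator be passed anywhere a file is expected.

```python
from typing import Iterable, Iterator


class IterableFile:
    """Hypothetical sketch: read-only file-like object over an iterable of byte chunks."""

    def __init__(self, chunks: Iterable[bytes]) -> None:
        self._iter: Iterator[bytes] = iter(chunks)
        self._buffer = b""
        self._exhausted = False

    def _pull(self) -> None:
        # Append the next chunk to the buffer, or mark the input as finished.
        try:
            self._buffer += next(self._iter)
        except StopIteration:
            self._exhausted = True

    def read(self, size: int = -1) -> bytes:
        # Pull until `size` bytes are buffered (or everything, if size < 0).
        while not self._exhausted and (size < 0 or len(self._buffer) < size):
            self._pull()
        if size < 0:
            data, self._buffer = self._buffer, b""
        else:
            data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data

    def readline(self) -> bytes:
        # Pull until a newline is buffered or the input runs out.
        while b"\n" not in self._buffer and not self._exhausted:
            self._pull()
        idx = self._buffer.find(b"\n")
        end = len(self._buffer) if idx < 0 else idx + 1
        line, self._buffer = self._buffer[:end], self._buffer[end:]
        return line


f = IterableFile([b"ab", b"c\nde", b"f\n"])
```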

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDzvSt0F+nu1YWqI0RAh3AAJwL/J1iMKlqzGXT8mDkSgwIvii8jQCfYSq+
dDgWNbYje3pEuP5lvzO9SoE=
=f1bT
-----END PGP SIGNATURE-----

More information about the bazaar mailing list