storage branch - remaining issues?
John A Meinel
john at arbash-meinel.com
Thu Jan 19 16:53:51 GMT 2006
Aaron Bentley wrote:
> John Arbash Meinel wrote:
> | Aaron Bentley wrote:
> |
> |>Robert Collins wrote:
> |>| We should remove the unicode cast minimally though.
> |>
> |>I think what we need is a UI function that interprets a path or URL and
> |>produces a URL.
> |
> | That is a possibility. But having LocalTransport translate, and none of
> | the other transports translate also has that effect.
>
> I would like a guarantee that the path separators are *either* '/'
> everywhere or os.sep everywhere. I would also like a guarantee that
> either the path elements are URL-encoded everywhere, or never
> URL-encoded. I see great value in a uniform interface, and no serious
> disadvantages.
>
The one disadvantage I see is that the logic for deciding how to decode
a string (whether it is a URL or a path) lives in another place, which
then needs to be called everywhere we might get a path.

For example, I don't think '.' is a valid URL. That means if we want
URLs everywhere, then 'Branch.open_containing()' shouldn't be passed '.'.

I just see more centralization in putting it into the transport
constructors, rather than having to catch every path on its way in.
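
To make the "normalize once, on the way in" idea concrete, here is a minimal sketch of a function that accepts either a local path or a URL and always returns a URL. It is written for modern Python rather than the bzrlib of this thread, and `normalize_location` is a hypothetical name, not an existing bzrlib function. Either a transport constructor or a UI layer could call a helper like this.

```python
import os
import re
from pathlib import Path

# Assumption: anything starting with "scheme://" is already a URL.
# Requiring at least two characters in the scheme keeps Windows drive
# letters such as "C:" from being mistaken for a URL scheme.
_URL_RE = re.compile(r'^[A-Za-z][A-Za-z0-9+.-]+://')


def normalize_location(location):
    """Return a URL for a location given as either a URL or a local path.

    Hypothetical helper for illustration; not part of bzrlib.
    """
    if _URL_RE.match(location):
        return location
    # Local path: make it absolute, then let pathlib build a file:// URL
    # with '/' separators and percent-encoded path elements.
    return Path(os.path.abspath(location)).as_uri()
```

With something like this, '.' would reach the transport layer as a file:// URL for the current directory, and the separators and encoding would be uniform from that point on.
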
Another possibility would be for 'takes_args' to take classes, just
like 'takes_options' does. Then the arg-parser could convert paths
into their internal forms. (Or we could just assume that args are always
paths, because that is probably true of everything right now. Some
options aren't paths, but I think all args are.)
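
A rough sketch of what typed args could look like, in modern Python. Everything here (`PathArg`, `TextArg`, `parse_args`, and the choice of "absolute path with '/' separators" as the internal form) is an assumption made up for illustration; it is not how bzrlib's command classes are actually defined.

```python
import os


class PathArg:
    """Hypothetical argument type that converts a path to an internal form."""

    def __init__(self, name):
        self.name = name

    def convert(self, raw):
        # Assumed internal form: absolute path with '/' separators.
        return os.path.abspath(raw).replace(os.sep, '/')


class TextArg:
    """Hypothetical argument type that passes the value through unchanged."""

    def __init__(self, name):
        self.name = name

    def convert(self, raw):
        return raw


def parse_args(takes_args, argv):
    """Convert each positional argument using its declared type."""
    if len(argv) != len(takes_args):
        raise ValueError('expected %d arguments, got %d'
                         % (len(takes_args), len(argv)))
    return {spec.name: spec.convert(raw)
            for spec, raw in zip(takes_args, argv)}
```

A command would then declare something like `[PathArg('location'), TextArg('message')]`, and the conversion would happen in one place before the command body runs.
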
> |>Yes, it's messy that some files produce bytestrings and some produce
> |>unicodestrings, but I don't know where you get "should".
> |
> |
> | Without using a codecs.getreader() adapter, all files produce
> | bytestreams (because all files *are* bytestreams).
>
> Right, but wrapped in a codecs.getreader() adapter, they emit unicode.
> So clearly the Python library intends for some file-like objects to
> emit strings, and others to emit unicode. In fact, Branch.controlfile
> has been producing unicode-emitting files for many months.
You have a point.
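
For reference, a small sketch of the two behaviours being discussed, using the standard library's `codecs.getreader`. It is written for modern Python, where `bytes` and `str` play the roles that `str` and `unicode` play in the Python 2 of this thread; the sample data is made up.

```python
import codecs
import io

# An in-memory stand-in for a file on disk: the stored content is bytes.
raw = io.BytesIO('caf\u00e9\n'.encode('utf-8'))

# Read directly: the file-like object hands back bytes.
as_bytes = raw.read()

# Rewind, then wrap the same stream in a decoding reader.
raw.seek(0)
reader = codecs.getreader('utf-8')(raw)

# The wrapped object is still file-like, but now emits decoded text.
as_text = reader.read()
```
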
>
> | The fact that
> | StringIO.StringIO() can be unicode inside is usually thought of as a
> | misfeature in our codebase.
>
> I'm not sure I'd agree with that. You'd have to point me at the
> specific cases.
>
> |>It does mean that in order to put unicode into a file, you have to have
> |>the whole thing in memory. I'd prefer accepting an iterable (which a
> |>string is), but I can live with this.
> |
> |
> | But where did you get unicode without decoding it from somewhere else?
>
> By processing data. For example, by iterating over
> Branch.revision_history()
>
> | And if you decoded it from somewhere else, why not skip the decode and
> | re-encode steps, and just pass the original file to put()?
>
> Because it didn't come from a file, and because I think it's dirty to
> write bytes to a file and then read unicode from it.
But at that point, isn't it a (group of) strings in memory?

But I suppose you might want to do:

    branch2.put_utf8(branch1.get_utf8('foo'))

where get_utf8 would return a file which returns utf8, and branch2 would
put it. (Though the above could just be branch2.put(branch1.get('foo')).)

I'm okay with iterators, I just don't like the idea of iterating over a
string.
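
A minimal sketch of a put_utf8 that accepts an iterable of text chunks, so the whole content never has to be joined into one unicode string first. This is a hypothetical stand-in written for modern Python: it stores into a plain dict instead of a real transport, and `iter_utf8` is an invented helper. A bare string is treated as a single chunk rather than being iterated character by character.

```python
import io


def iter_utf8(chunks):
    """Yield UTF-8 encoded bytes for each text chunk.

    A single string is wrapped in a list so that it is encoded in one
    piece instead of one character at a time.
    """
    if isinstance(chunks, str):
        chunks = [chunks]
    for chunk in chunks:
        yield chunk.encode('utf-8')


def put_utf8(store, name, chunks):
    """Encode text chunks and store the result under `name`.

    `store` is a dict standing in for a transport (an assumption made
    for this sketch).
    """
    buf = io.BytesIO()
    for encoded in iter_utf8(chunks):
        buf.write(encoded)
    store[name] = buf.getvalue()
```

Here the caller can pass a list, a generator over something like a revision history, or a single string, and the encoding happens in one place.
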
>
> | put_utf8() is really only meant to be (I have some stuff in memory which
> | is already in unicode, and I want you to encode it, and put it into this
> | control file).
>
> I think put_utf8 serves just as well for "I want to replace the contents
> of this file with some text..."
>
> |>IterableFile isn't used elsewhere, so we can nuke it now, and revert it
> |>back when we need it.
>
> | I was going to use it in my changeset branch as well (if that ever gets
> | anywhere). So I don't mind if it stays around unused for a little while.
>
> Whatever you like. You already have a copy of it there, I believe.
>
> Aaron
I do. So we might actually have a file-id conflict. Though I think I got
the id from the changeset plugin. Hopefully Martin used the same id.
John
=:->