storage branch - remaining issues?

John A Meinel john at arbash-meinel.com
Thu Jan 19 16:53:51 GMT 2006


Aaron Bentley wrote:
> John Arbash Meinel wrote:
> | Aaron Bentley wrote:
> |
> |>Robert Collins wrote:
> |>| We should remove the unicode cast minimally though.
> |>
> |>I think what we need is a UI function that interprets a path or URL and
> |>produces a URL.
> |
> | That is a possibility. But having LocalTransport translate, and none of
> | the other transports translate also has that effect.
> 
> I would like a guarantee that the path separators are *either* '/'
> everywhere or os.sep everywhere.  I would also like a guarantee that
> either the path elements are URL-encoded everywhere, or never
> URL-encoded.  I see great value in a uniform interface, and no serious
> disadvantages.
> 

The one disadvantage I see is that the logic for deciding how to decode
a string (whether it is a URL or a path) lives in another place, which
then needs to be called everywhere we might get a path.

For example, I don't think '.' is a valid URL, which means that if we
want URLs everywhere, then 'Branch.open_containing()' shouldn't be
passed '.'.

I just see more centralization by putting it into the transport
constructors, rather than having to catch every path on its way in.
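
To make the "interpret a path or URL and produce a URL" idea concrete,
here is a minimal sketch. The name to_url and the scheme-length test are
assumptions made for illustration; this is not an existing bzrlib
function, and a real version would have to decide how to treat odd
inputs such as a local file name that contains a colon.

```python
import os
from pathlib import Path
from urllib.parse import urlsplit


def to_url(path_or_url):
    """Hypothetical helper: return a URL for either a URL or a local path."""
    scheme = urlsplit(path_or_url).scheme
    # Assumption: a scheme longer than one letter marks a real URL; a
    # single letter is more likely a Windows drive such as "C:".
    if len(scheme) > 1:
        return path_or_url
    # Local paths become absolute, '/'-separated, URL-encoded file URLs.
    return Path(os.path.abspath(path_or_url)).as_uri()


print(to_url("."))
print(to_url("http://example.com/branch"))
```

Whether this lives in a UI function or in the LocalTransport constructor
is exactly the open question; the conversion itself is the same either
way.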

Another possibility would be to give the entries in 'takes_args'
classes, just like 'takes_options' does. Then the arg-parser could
convert paths into their internal forms. (Or we could just assume that
args are always paths, because that is probably true of everything
right now. Some options aren't paths, but I think all args are.)
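
A rough sketch of what typed 'takes_args' could look like follows. The
(name, converter) pairs, the parse_args method and the cmd_example class
are all hypothetical and only illustrate the shape of the idea; they do
not describe how bzrlib commands are declared today.

```python
import os


def as_local_path(arg):
    """Illustrative converter: normalize a user-supplied path argument."""
    return os.path.abspath(arg)


class Command:
    # Hypothetical: each entry pairs an argument name with a converter.
    takes_args = []

    def parse_args(self, argv):
        if len(argv) != len(self.takes_args):
            raise ValueError("expected %d argument(s), got %d"
                             % (len(self.takes_args), len(argv)))
        # Convert every argument on its way in, in one central place.
        return {name: convert(value)
                for (name, convert), value in zip(self.takes_args, argv)}


class cmd_example(Command):
    takes_args = [("location", as_local_path), ("revno", int)]


parsed = cmd_example().parse_args([".", "3"])
print(parsed)
```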


> |>Yes, it's messy that some files produce bytestrings and some produce
> |>unicodestrings, but I don't know where you get "should".
> |
> |
> | Without using a codecs.getreader() adapter, all files produce
> | bytestreams (because all files *are* bytestreams).
> 
> Right, but with a codecs.getreader adapter, they emit unicode.  So clearly
> the Python library intends for some file-like objects to emit strings,
> and others to emit unicode.  In fact, Branch.controlfile has been
> producing unicode-emitting files for many months.

You have a point.
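
For reference, the difference is easy to see with the standard library
alone. This is a small illustration in current Python, where the two
types are called bytes and str; the sample text is made up.

```python
import codecs
import io

# A plain binary file-like object hands back bytes.
raw = io.BytesIO("caf\u00e9\n".encode("utf-8"))
raw_data = raw.read()
print(type(raw_data))

# Wrapping the same stream with codecs.getreader yields decoded text.
raw.seek(0)
reader = codecs.getreader("utf-8")(raw)
text = reader.read()
print(type(text))
```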

> 
> | The fact that
> | StringIO.StringIO() can be unicode inside is usually thought of as a
> | misfeature in our codebase.
> 
> I'm not sure I'd agree with that.  You'd have to point me at the
> specific cases.
> 
> |>It does mean that in order to put unicode into a file, you have to have
> |>the whole thing in memory.  I'd prefer accepting an iterable (which a
> |>string is), but I can live with this.
> |
> |
> | But where did you get unicode without decoding it from somewhere else?
> 
> By processing data.  For example, by iterating over
> Branch.revision_history()
> 
> | And if you decoded it from somewhere else, why not skip the decode and
> | re-encode steps, and just pass the original file to put()?
> 
> Because it didn't come from a file, and because I think it's dirty to
> write bytes to a file and then read unicode from it.

But at that point isn't it a (group of) strings in memory?
But I suppose you might want to:

branch2.put_utf8(branch1.get_utf8('foo'))

And get_utf8 would return a file which returns utf8, and branch2 would
put it.
(Though the above could be branch2.put(branch1.get('foo')))

I'm okay with iterators, I just don't like the idea of iterating over a
string.
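
A minimal sketch of a put_utf8 that accepts an iterable without walking
a plain string character by character is shown here. The signature
(a binary file object plus the text) is an assumption for illustration
and is not the signature used on the branch or transport classes.

```python
import io


def put_utf8(out, text_or_chunks):
    """Hypothetical: write unicode text to a binary file object as UTF-8."""
    if isinstance(text_or_chunks, str):
        # Treat a plain string as one chunk instead of iterating over
        # its characters.
        text_or_chunks = [text_or_chunks]
    for chunk in text_or_chunks:
        out.write(chunk.encode("utf-8"))


# Stand-in for data produced by processing, e.g. a revision history.
history = ["rev-1", "rev-2"]
buf = io.BytesIO()
put_utf8(buf, (rev + "\n" for rev in history))
print(buf.getvalue())
```

With a generator as input, the whole text never has to sit in memory at
once, which was the original concern.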

> 
> | put_utf8() is really only meant to be (I have some stuff in memory which
> | is already in unicode, and I want you to encode it, and put it into this
> | control file).
> 
> I think put_utf8 serves just as well for "I want to replace the contents
> of this file with some text..."
> 
> |>IterableFile isn't used elsewhere, so we can nuke it now, and revert it
> |>back when we need it.
> 
> | I was going to use it in my changeset branch as well (if that ever gets
> | anywhere). So I don't mind if it stays around unused for a little while.
> 
> Whatever you like.  You already have a copy of it there, I believe.
> 
> Aaron

I do. So we might actually have a file-id conflict. Though I think I got
the id from the changeset plugin. Hopefully Martin used the same id.

John
=:->

