Filesystem paths
John Arbash Meinel
john at arbash-meinel.com
Wed Apr 26 15:02:32 BST 2006
Martin Pool wrote:
> On 26/04/2006, at 1:53 PM, Aaron Bentley wrote:
>
>> I don't know Robert's reasons, but the reason I like the transport layer
>> being all-url is because some transports *must* be url-based, and all
>> transports *can* be url-based. It keeps the layer simple, promotes code
>> reuse, and all that good stuff.
>
> Here's a patch for the developer documentation which tries to make it
> clear *why* they must be like this. Is it clear/correct? (Thanks to
> Robert for helping me get it straight.)
>
...
> +The main reason for this is that it's not possible to safely roundtrip a
> +URL into Unicode and then back into the same URL. The URL standard
> +gives a way to represent non-ASCII bytes in ASCII (as %-escapes), but
> +doesn't say how those bytes represent non-ASCII characters. (They're not
> +guaranteed to be UTF-8 -- that is common but doesn't happen everywhere.)
> +
So the non-safety of round trips is probably enough for me to accept
that we need to use URLs.
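To make the round-trip problem concrete, here is a minimal sketch in present-day Python 3. The helper names are made up for illustration and are not bzrlib API; the point is only that the result depends on which encoding you guess for the escaped bytes.

```python
from urllib.parse import quote, unquote_to_bytes


def url_to_unicode(url_path, encoding):
    """Hypothetical helper: undo the %-escapes, then guess an encoding."""
    return unquote_to_bytes(url_path).decode(encoding)


def unicode_to_url(path, encoding):
    """Hypothetical helper: encode with the guessed encoding, then %-escape."""
    return quote(path.encode(encoding))


original = "%E9.txt"  # one raw byte 0xE9; the URL does not say what it means

# Guess latin-1 on the way in and UTF-8 on the way out: we get a different URL.
as_text = url_to_unicode(original, "latin-1")
round_tripped = unicode_to_url(as_text, "utf-8")

# Guess UTF-8 on the way in: the byte cannot even be decoded.
try:
    url_to_unicode(original, "utf-8")
    utf8_decodable = True
except UnicodeDecodeError:
    utf8_decodable = False
```

The round trip only comes back to the same URL if both directions happen to agree on the encoding, which the URL itself gives us no way to know.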
I'm still concerned about a couple of other edge cases, though.
Specifically, if I manually type in a path, it is going to be a string
in the local encoding. In my mind that covers local paths and sftp
paths, and *maybe* http paths.
But more likely is that I'm going to cut & paste an http path (either
from my browser or as a link in an email). And these will be actual URLs.
The sftp case could go either way. I believe the sftp spec says that
paths are byte blobs, just like Unix, and so doesn't enforce anything
except a lack of null characters.
I also don't know how ftp works. I have the feeling that it is an old
enough spec it could be very implementation dependent.
In my one test of an ftp server, I got back:
>>> f.list_dir('jfmeinel/')
['\xd8\xac\xd9\x88\xd8\xac\xd9\x8a.txt']
Which is a utf-8 string.
I also tried the sftp transport, and it returned this:
>>> s.list_dir('jfmeinel/')
[u'\u062c\u0648\u062c\u0648.txt']
So right now, sftp returns a unicode string, while ftp returns the raw
byte stream.
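If we wanted both transports to hand back the same type, something along these lines might do. This is only a sketch in Python 3 terms: to_unicode is a hypothetical helper, and assuming UTF-8 for the ftp bytes is exactly the kind of guess that may be wrong for another server. The two sample entries are the values from the listings above.

```python
def to_unicode(name, encoding="utf-8"):
    """Hypothetical helper: decode raw bytes, pass text through unchanged.

    The default encoding is an assumption; a server is free to use another.
    """
    if isinstance(name, bytes):
        return name.decode(encoding)
    return name


ftp_entry = b"\xd8\xac\xd9\x88\xd8\xac\xd9\x8a.txt"  # raw bytes, as ftp returned
sftp_entry = "\u062c\u0648\u062c\u0648.txt"           # already text, as sftp returned

normalized = [to_unicode(ftp_entry), to_unicode(sftp_entry)]
```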
Now, ftp actually fails to get that file, because it tries to pass the
non-ascii string to mutter() with a %s format.
Anyway, my concern is that users are going to enter strings which may be
urls, or may be unicode strings. I really don't think we want to require
them to always enter urls, because it is a real pain to have to escape
them when you are just referring to a local file.
It might be okay to require them to translate the URL into a unicode
string, but as earlier stated, that is not defined to actually work.
My feeling is that we should treat everything except http as a unicode
url, and an http:// string as a real url.
Alternatively, we treat plain paths as unicode, and anything that starts
with foo:// needs to be a real url. I suppose that is the most
consistent, but it means I can't do:
ssh host
ls
copy + paste => sftp://host/path
Honestly, unicode urls aren't a big deal for *me*. I don't have any need
right now. I will probably need to encode unicode filenames, but I can
post those somewhere with only ascii characters.
Any thoughts?
...
> So LocalTransport.abspath shouldn't be calling osutils.abspath, but
> rather should be manipulating URL objects? Then we can see that
>
> file:///c|/
>
> has no "up"?
I'm not sure what Aaron meant by local semantics for non url paths. As
long as we only allow '/' as the path delimiter, I don't see how path
handling is all that different.
I know there is an issue with ':', and there are some questions about
whether 'isabs' should require a leading slash or a leading drive
letter + ':/'. The builtin 'os.path.isabs' would handle that for us,
though.
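For reference, a small sketch of how the two stdlib flavours answer the 'isabs' question. os.path is posixpath or ntpath depending on the platform, so importing both directly lets us compare them on any machine. The sample paths are made up.

```python
import ntpath
import posixpath

# POSIX rules: absolute means "starts with a slash"; a drive letter is
# just an ordinary path component.
posix_slash = posixpath.isabs("/srv/branch")
posix_drive = posixpath.isabs("C:/branch")

# Windows rules: a drive letter followed by a separator is absolute,
# and '/' is accepted as a separator.
nt_drive = ntpath.isabs("C:/branch")
nt_relative = ntpath.isabs("branch/subdir")
```

So 'os.path.isabs' only gives the local platform's answer; a transport that wants the same behaviour everywhere would have to pick one set of rules explicitly.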
>
>> Since users will rarely pass in URL for filesystem paths, we should have
>> a function that converts user paths into URLs (if they're not already).
>> Quite possibly get_transport should do that.
>
> Yes.
I would tend to agree, since I think it would be up to the transport to
decide whether the passed-in paths are URLs or unicode paths.
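A rough sketch of what such a conversion function could look like, following the "anything that starts with foo:// is a real url, everything else is a unicode path" rule. This is Python 3, user_path_to_url and its base argument are hypothetical names rather than anything in bzrlib, and escaping via UTF-8 is an assumption.

```python
import posixpath
import re
from urllib.parse import quote

# Anything of the form scheme:// is taken to be a real URL already.
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def user_path_to_url(user_input, base="/"):
    """Hypothetical helper: return a URL for whatever the user typed.

    Real URLs pass through untouched. Plain paths are treated as unicode,
    made absolute against 'base', and %-escaped as UTF-8 (an assumption).
    """
    if _SCHEME_RE.match(user_input):
        return user_input
    abs_path = posixpath.normpath(posixpath.join(base, user_input))
    return "file://" + quote(abs_path.encode("utf-8"))


already_url = user_path_to_url("sftp://host/path")
plain_path = user_path_to_url("a b.txt", base="/tmp")
unicode_path = user_path_to_url("\u062c.txt", base="/tmp")
```

Note that this only uses '/' as the delimiter and does nothing special for drive letters, so the Windows questions above are left open.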
>
>> OTOH, I don't think it's appropriate to be using transports to access
>> working trees, and since that's the bug you encountered, I suggest
>> that's what we should fix-- build_tree should either be implemented in
>> terms of POSIX, or it should translate paths to urls before using them
>> with Transport.
>
> Can you tell me more about why it's not appropriate? Is it because
> Transports should focus on supporting just what is needed for control
> file access?
>
> --Martin
I'm not sure what Aaron is defining as a "POSIX" interface. What would
make TestCase.build_tree() a POSIX interface?
John
=:->