Filesystem paths
Martin Pool
mbp at sourcefrog.net
Fri Apr 28 08:09:50 BST 2006
On 27/04/2006, at 11:22 PM, John Arbash Meinel wrote:
>> OK, so how about these rules for handling paths/urls from the user:
>>
>> If there is no URL scheme, they are filenames. Filenames are assumed
>> to be encoded in the locale encoding. They can be decoded to Unicode.
>>
>> To form the URL for a local file, we encode it into the
>> filesystemencoding and then escape that.
>
> I was encoding directly to utf-8. Does it make more sense to have the
> URL be filesystemencoded?
Actually I think you're right, utf-8 would be better -- if nothing
else it will be more intelligible if printed as a url.
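
A minimal sketch of the rule being agreed on here (encode the Unicode path as utf-8, then escape it), assuming Python 3 and absolute POSIX-style paths. The helper names local_path_to_url and url_to_local_path are hypothetical, made up for illustration; they are not bzr's actual API.

```python
from urllib.parse import quote, unquote

FILE_PREFIX = "file://"


def local_path_to_url(path: str) -> str:
    """Hypothetical helper: turn a Unicode absolute path into a file:// URL.

    The path is encoded as UTF-8 (not the filesystem encoding) and the
    resulting bytes are percent-escaped; "/" is left unescaped.
    """
    return FILE_PREFIX + quote(path.encode("utf-8"), safe="/")


def url_to_local_path(url: str) -> str:
    """Hypothetical inverse: unescape a file:// URL back to a Unicode path."""
    if not url.startswith(FILE_PREFIX):
        raise ValueError("not a file:// URL: %r" % url)
    return unquote(url[len(FILE_PREFIX):], encoding="utf-8")


url = local_path_to_url("/home/user/caf\u00e9")
print(url)                     # file:///home/user/caf%C3%A9
print(url_to_local_path(url))  # /home/user/café
```

Because the escaped bytes are always utf-8, the same URL means the same path regardless of which filesystem encoding the machine uses.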
> What characters are valid in filesystem-encoding that wouldn't be valid
> utf-8? I know there are byte-sequences, but if we have already decoded
> the path into Unicode, it seems that utf-8 is a safer internal format.
Exactly.
> I suppose there is an issue that the user would have to do the
> translation into unicode and back to utf-8 to be able to type the
> file://latin-1/with128-255chars/path
However, if they enter it as a filename rather than a URL, and if
their input locale is latin-1, they can just enter it directly.
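
To illustrate the difference, here is a rough sketch assuming Python 3 and a latin-1 input locale. The function decode_user_filename is a hypothetical name, and the locale encoding is passed in explicitly rather than looked up from the environment.

```python
from urllib.parse import quote


def decode_user_filename(raw: bytes, locale_encoding: str) -> str:
    """Hypothetical helper: decode a filename typed by the user to Unicode.

    In a real program the encoding would come from the user's locale;
    here it is an explicit argument so the example is deterministic.
    """
    return raw.decode(locale_encoding)


# Bytes as a latin-1 terminal would deliver them: 0xE9 is "é".
raw = b"/tmp/caf\xe9"
path = decode_user_filename(raw, "latin-1")

# URL built from the decoded path, re-encoded as UTF-8 and escaped.
utf8_url = "file://" + quote(path.encode("utf-8"), safe="/")

# URL built by escaping the raw latin-1 bytes directly, for comparison.
raw_url = "file://" + quote(raw, safe="/")

print(utf8_url)  # file:///tmp/caf%C3%A9
print(raw_url)   # file:///tmp/caf%E9
```

The user types the filename once in their own encoding; only someone writing the URL by hand needs to know the utf-8 escaped form.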
>>> I'm not sure what Aaron is defining as "POSIX" interface. What would
>>> make TestCase.build_tree() a POSIX interface?
>>
>> I think he meant that it would use os.mkdir, file(), etc directly,
>> rather than going through the Transport
>
> The reason for the transport is that build_tree can then actually do
> the build over sftp (which it does in a couple of instances).
>
> I think it gives our Transport stuff a decent workout.
It's a good thing to do -- I was just explaining what (I think) Aaron
meant by "posix".
> Also, because of how I had to do the URL changes, I think I made "bzr
> branch" able to create remote branches (as long as they are in a shared
> repo with no working trees).
That's good.
--
Martin