Filesystem paths

Martin Pool mbp at sourcefrog.net
Fri Apr 28 08:09:50 BST 2006


On 27/04/2006, at 11:22 PM, John Arbash Meinel wrote:

>> OK, so how about these rules for handling paths/urls from the user:
>>
>>  If there is no URL scheme, they are filenames.  Filenames are assumed
>> to be encoded in the locale encoding.  They can be decoded to Unicode.
>>
>>  To form the URL for a local file, we encode it into the
>> filesystemencoding and then escape that.
>
> I was encoding directly to utf-8. Does it make more sense to have the
> URL be filesystemencoded?

Actually I think you're right, UTF-8 would be better. If nothing
else, the URL will be more intelligible when printed.
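
A minimal sketch of the rules as they stand after this change, assuming
Python 3 and absolute POSIX-style paths. The names has_url_scheme and
user_arg_to_url are hypothetical, not taken from bzrlib, and the scheme
test is deliberately naive (it ignores relative paths and Windows drive
letters):

```python
import re
from urllib.parse import quote

# Hypothetical helpers for illustration only; not bzrlib API.
_SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def has_url_scheme(arg):
    """Return True if arg starts with something like 'sftp://'."""
    return _SCHEME_RE.match(arg) is not None


def user_arg_to_url(raw, locale_encoding):
    """Turn a command-line argument (bytes) into a URL string.

    Arguments without a scheme are treated as filenames: decoded from
    the locale encoding to Unicode, re-encoded as UTF-8, then
    percent-escaped.
    """
    text = raw.decode(locale_encoding)
    if has_url_scheme(text):
        return text
    # Assumption: text is an absolute path such as '/tmp/x'.
    return "file://" + quote(text.encode("utf-8"), safe="/")
```

With a latin-1 locale, user_arg_to_url(b"/tmp/caf\xe9", "latin-1")
gives file:///tmp/caf%C3%A9, whatever the filesystem encoding is.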

> What characters are valid in filesystem-encoding that wouldn't be valid
> utf-8? I know there are byte-sequences, but if we have already decoded
> the path into Unicode, it seems that utf-8 is a safer internal format.

Exactly.
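
To illustrate the byte-sequence point, here is a small Python 3 sketch.
The sample filename is made up for the example:

```python
# A latin-1 encoded filename: the byte 0xE9 is 'e acute' in latin-1.
raw = b"caf\xe9"

# The same bytes are not a valid UTF-8 sequence.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Once decoded to Unicode with the right encoding, the text can be
# re-encoded as UTF-8.
text = raw.decode("latin-1")
as_utf8 = text.encode("utf-8")
```

Here valid_utf8 ends up False, while as_utf8 is b"caf\xc3\xa9".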

> I suppose there is an issue that the user would have to do the
> translation into unicode and back to utf-8 to be able to type the
> file://latin-1/with128-255chars/path

However, if they enter it as a filename rather than a URL, and if  
their input locale is latin-1, they can just enter it directly.
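
As a rough illustration of the difference (Python 3; the path is
invented for the example), compare what someone typing a URL by hand
would have to write under each scheme:

```python
from urllib.parse import quote, unquote

# What a user in a latin-1 locale types as a plain filename.
typed = b"/home/user/caf\xe9"

# URL if the raw latin-1 bytes were escaped directly.
latin1_url = "file://" + quote(typed, safe="/")

# URL under the UTF-8 rule: decode with the locale, re-encode as UTF-8.
utf8_url = "file://" + quote(typed.decode("latin-1").encode("utf-8"), safe="/")

# Going back from the UTF-8 URL path to Unicode text.
round_trip = unquote(utf8_url[len("file://"):], encoding="utf-8")
```

So the hand-typed URL needs %C3%A9 rather than %E9, while the plain
filename needs no translation at all.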

>>> I'm not sure what Aaron is defining as "POSIX" interface. What would
>>> make TestCase.build_tree() a POSIX interface?
>>
>> I think he meant that it would use os.mkdir, file(), etc directly,
>> rather than going through the Transport
>
> The reason for the transport is that build_tree can then actually do
> the build over sftp (which it does in a couple of instances).
>
> I think it gives our Transport stuff a decent workout.

It's a good thing to do. I was just explaining what (I think) Aaron
meant by "posix".
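
To make the distinction concrete, a rough sketch in Python 3. The
LocalTransport class and this build_tree are hypothetical stand-ins
written for this example, not the bzrlib code. It also assumes that a
trailing slash marks a directory:

```python
import os


class LocalTransport:
    """Hypothetical transport rooted at a local directory.

    An sftp-backed class with the same mkdir/put methods could be
    swapped in without changing build_tree.
    """

    def __init__(self, base):
        self.base = base

    def _abspath(self, relpath):
        return os.path.join(self.base, relpath)

    def mkdir(self, relpath):
        os.mkdir(self._abspath(relpath))

    def put(self, relpath, data):
        with open(self._abspath(relpath), "wb") as f:
            f.write(data)


def build_tree(transport, shape):
    """Create the entries in shape through the transport.

    Assumption: names ending in '/' are directories, others are files.
    """
    for name in shape:
        if name.endswith("/"):
            transport.mkdir(name.rstrip("/"))
        else:
            transport.put(name, ("contents of %s\n" % name).encode("utf-8"))
```

The "posix" version would call os.mkdir and open() directly inside
build_tree, which is simpler but can only ever build local trees.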

> Also, because of how I had to do the URL changes, I think I made "bzr
> branch" able to create remote branches (as long as they are in a shared
> repo with no working trees).

That's good.

-- 
Martin






