Filesystem paths

Thu Apr 27 02:18:01 BST 2006

On 27/04/2006, at 2:14 AM, Aaron Bentley wrote:

> Martin Pool wrote:
>> On 26/04/2006, at 1:53 PM, Aaron Bentley wrote:
>> +Transports work in URLs.  Take note that URLs are by definition only
>> +ASCII - the decision of how to encode a Unicode string into a URL must be
>> +taken at a higher level, typically in the Store.
>
> This doesn't look accurate.  AFAIK, we do unicode-to-utf-8 conversions
> in the transport layer.  The store escaping thing is a separate issue.

OK, my text was a bit unclear.  There are two levels of escaping:
id->filename, and then filename->url.  But both are done by Store._relpath:

         fileid = self._escape_file_id(fileid)
         path = prefix + fileid
         full_path = u'.'.join([path] + suffixes)
         return urlescape(full_path)

I think we should be clear that Transports accept *urls*, not
url-like things with Unicode in them.
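
To make the two levels concrete, here is a rough, self-contained sketch in
present-day Python 3.  It is not the bzrlib code: escape_file_id() and its
"safe" character set are assumptions made up for illustration, and
relpath() only mirrors the shape of the excerpt above.

```python
from urllib.parse import quote

# Assumed set of characters left alone at level 1 (hypothetical rule).
SAFE_IN_FILENAME = "abcdefghijklmnopqrstuvwxyz0123456789-_.@,"


def escape_file_id(file_id):
    """Level 1: file id -> filename.  Escape every unsafe UTF-8 byte as %xx."""
    out = []
    for byte in file_id.encode("utf-8"):
        ch = chr(byte)
        out.append(ch if ch in SAFE_IN_FILENAME else "%%%02x" % byte)
    return "".join(out)


def relpath(file_id, prefix="", suffixes=()):
    """Level 2: filename -> URL.  The '%' signs from level 1 get escaped again."""
    filename = escape_file_id(file_id)
    full_path = ".".join([prefix + filename, *suffixes])
    return quote(full_path, safe="/")


print(relpath("Foo bar", suffixes=["weave"]))
```

The point of the sketch is that the result is pure ASCII and is already a
URL path by the time it reaches a Transport; a literal '%' in the filename
shows up as %25 in the URL.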

>> +A similar edge case is that the url ``http://foo/sweet%2Fsour`` contains
>> +one directory component whose name is "sweet/sour".  The escaped slash is
>> +not a directory separator.
>
> (although many, many pieces of software will treat it as one, including
> conformant SFTP implementations)

Right - I think the SFTP committee were confused, or at least  
conflicted, about this.
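
For what it's worth, the distinction is easy to get right if the path is
split on "/" before it is unescaped.  A small sketch using only the Python 3
standard library (path_components() is a name invented for this example):

```python
from urllib.parse import unquote, urlsplit


def path_components(url):
    """Split the still-escaped path on '/', then unescape each component."""
    raw_path = urlsplit(url).path
    return [unquote(part) for part in raw_path.split("/") if part]


def naive_components(url):
    """The common mistake: unescape first, so %2F turns into a separator."""
    return [part for part in unquote(urlsplit(url).path).split("/") if part]


print(path_components("http://foo/sweet%2Fsour"))
print(naive_components("http://foo/sweet%2Fsour"))
```

The first function yields a single component named "sweet/sour"; the second
yields two components, which is the behaviour Aaron describes in other
software.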

> To some extent, this is Arch residue-- that system has a very clean
> distinction between archive-access methods (the PFS) and
> filesystem-access methods (the virtual-unix subsystem).
>
> The idea was that PFS supported only what was available on all supported
> access methods-- read, write, list*, mkdir, rmdir, and not much else.
>
> Whereas the virtual-unix subsystem supported a richer command set that
> included, for example, stat and chmod.
>
> However, our Transports do support stat and chmod-- I guess the question
> is whether we want to require this.  Would that preclude us from
> supporting additional access methods that we want to support?
>
> If the subset of functionality supported by all our transports is such
> that we can implement TreeTransform, the hashcache, etc. on top of them,
> then perhaps I'm being too rigid.  There is desire to support remote
> access to working trees in some form, and this could be it.

Some Transports won't have that -- clearly they have different
capabilities.  But some will - I think everything we need to do can
be done over sftp, with a good server.  We already see differing
capabilities in Transports which can or can't list directories.  We could
declare which ones have the minimum set to support a working tree, etc.
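
One possible shape for such a declaration, purely as a sketch: none of these
class names, capability names, or the supports() method exist in bzrlib;
they are hypothetical, and the operation list is just the one from Aaron's
mail.

```python
# Hypothetical capability names, taken from the operations discussed above.
WORKING_TREE_MINIMUM = frozenset(
    {"read", "write", "list", "mkdir", "rmdir", "stat", "chmod"}
)


class SketchTransport:
    """Illustrative base class; each transport declares what it can do."""

    capabilities = frozenset()

    def supports(self, required):
        # True when every required operation is declared by this transport.
        return required <= self.capabilities


class ReadOnlyHttpLike(SketchTransport):
    capabilities = frozenset({"read"})


class SftpLike(SketchTransport):
    capabilities = WORKING_TREE_MINIMUM


for transport in (ReadOnlyHttpLike(), SftpLike()):
    print(type(transport).__name__, transport.supports(WORKING_TREE_MINIMUM))
```

Code that wants a working tree could then check the declared set up front
rather than failing partway through an operation.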

> Another argument is that Transports are unfamiliar to most developers,
> and so they introduce an unwarranted barrier to contribution.  On the
> other hand, it might make sense to use TreeTransform for our operations,
> which would be a new API anyhow.

All WorkingTree operations?   That sounds good; I have a feeling it  
will make the behaviour more consistent as far as handling backups,  
conflicts with the working tree, etc.

> In some ways, it might be desirable to permit non-unicode paths when
> working with working trees.  We require versioned files to have unicode
> names, but I don't think we necessarily should require that unversioned
> files have unicode pathnames.  At the moment, I expect that things will
> explode in that situation, anyhow.

This is possibly a reason to work partially in URLs for the working  
tree - because they're just byte streams, we don't have to be able to  
decode them to Unicode to manipulate them.
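
As an illustration (Python 3 standard library only; the filename and helper
names are invented for the example), a name that is not valid UTF-8 can
still be carried in a URL and recovered byte-for-byte:

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# A Latin-1 encoded name; these bytes are not valid UTF-8.
raw_name = b"caf\xe9.txt"


def bytes_to_url_segment(raw):
    """Percent-escape raw bytes without ever decoding them to Unicode."""
    return quote_from_bytes(raw, safe="")


def url_segment_to_bytes(segment):
    """Recover the original bytes from an escaped URL segment."""
    return unquote_to_bytes(segment)


segment = bytes_to_url_segment(raw_name)
print(segment)
print(url_segment_to_bytes(segment) == raw_name)
```

Decoding raw_name as UTF-8 would raise UnicodeDecodeError, but the escaped
form round-trips without any encoding being chosen.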

-- 
Martin