Filesystem paths

Wed Apr 26 08:36:28 BST 2006

On 26/04/2006, at 1:53 PM, Aaron Bentley wrote:

> I don't know Robert's reasons, but the reason I like the transport layer
> being all-url is because some transports *must* be url-based, and all
> transports *can* be url-based.  It keeps the layer simple, promotes code
> reuse, and all that good stuff.

Here's a patch for the developer documentation which tries to make it  
clear *why* they must be like this.  Is it clear/correct?  (Thanks to  
Robert for helping me get it straight.)

=== modified file 'a/HACKING'

--- a/HACKING	
+++ b/HACKING	
@@ -294,6 +294,48 @@
      indexes into the branch's revision history.
+Transport
+=========
+
+The ``Transport`` layer handles access to local or remote directories.
+Each Transport object acts like a logical connection to a particular
+directory, and it allows various operations on files within it.  You can
+*clone* a transport to get a new Transport connected to a subdirectory or
+parent directory.
+
+Transports are not used for access to the working tree.  At present
+working trees are always local and they are accessed through the regular
+Python file io mechanisms.
+
+filenames vs URLs
+-----------------
+
+Transports work in URLs.  Take note that URLs are by definition only
+ASCII - the decision of how to encode a Unicode string into a URL must be
+taken at a higher level, typically in the Store.
+
+The main reason for this is that it's not possible to safely roundtrip a
+URL into Unicode and then back into the same URL.  The URL standard
+gives a way to represent non-ASCII bytes in ASCII (as %-escapes), but
+doesn't say how those bytes represent non-ASCII characters.  (They're not
+guaranteed to be UTF-8 -- that is common but doesn't happen everywhere.)
+
+For example, if the user enters the url ``http://example/%e0`` there's no
+way to tell whether that byte represents "latin small letter a with
+grave" in iso-8859-1, or "latin small letter r with acute" in iso-8859-2,
+or malformed UTF-8.  So we can't convert their URL to Unicode reliably.
+
+A similar edge case is that the url ``http://foo/sweet%2Fsour`` contains
+one directory component whose name is "sweet/sour".  The escaped slash is
+not a directory separator.  If we try to convert URLs to regular Unicode
+paths this information will be lost.
+
+This implies that Transports must natively deal with URLs; for simplicity
+they *only* deal with URLs and conversion of other strings to URLs is done
+elsewhere.  Information they return, such as from ``list_dir``, is also in
+the form of URL components.
+
+
Merge/review process
====================
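
To make the two cases in the patch concrete, here is a minimal sketch
using only Python's standard ``urllib.parse`` (not bzrlib; the variable
names are just for illustration).  It shows that the byte behind ``%e0``
decodes to different characters depending on the assumed charset, and
that unescaping ``sweet%2Fsour`` before splitting changes the number of
path components.

```python
from urllib.parse import unquote, unquote_to_bytes

# The escape %e0 only tells us the byte value, not the character.
raw = unquote_to_bytes('%e0')
as_latin1 = raw.decode('iso-8859-1')   # a with grave
as_latin2 = raw.decode('iso-8859-2')   # r with acute
try:
    raw.decode('utf-8')
    valid_utf8 = True
except UnicodeDecodeError:
    # A lone 0xE0 byte is an incomplete UTF-8 sequence.
    valid_utf8 = False

# Splitting the escaped form keeps "sweet/sour" as one component.
escaped_components = 'sweet%2Fsour'.split('/')

# Unescaping first turns the escaped slash into a real separator,
# so the original structure can no longer be recovered.
unescaped_components = unquote('sweet%2Fsour').split('/')
```

Because neither conversion can be undone from the Unicode side alone,
keeping the escaped URL form inside the transport avoids both problems.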

> What's frustrated me thus far about the transport layer is that it doesn't
> use urls everywhere, and it's not easy to use.  For example, I was
> recently fixing a bug to do with finding root directories in Windows
> paths.  Unfortunately, that leaves us with an OS-specific test case.
> Under Unix, we'll never test whether that functionality is right, and we
> can't do so because all the path-manipulation functions require
> different assumptions.  If we were using URLs, we could have uniform
> testing, because URL manipulations are the same on every platform.

So LocalTransport.abspath shouldn't be calling osutils.abspath, but  
rather should be manipulating URL objects?  Then we can see that

   file:///c|/

has no "up"?
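
As a rough sketch of what I mean (this is not existing bzrlib code;
``url_parent`` and ``_DRIVE_ROOT`` are hypothetical names, and treating
``/c|/`` or ``/c:/`` as a root is my assumption about how drive letters
would be spelled), something like this works purely on the URL string,
so it gives the same answer on Unix and Windows:

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Matches a bare Windows drive root such as "/c|/" or "/C:/" in a file URL path.
_DRIVE_ROOT = re.compile(r'^/[A-Za-z][|:]/?$')


def url_parent(url):
    """Return the URL one level up, or the URL itself if it is already a root."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    at_root = path in ('', '/') or (
        scheme == 'file' and _DRIVE_ROOT.match(path) is not None)
    if at_root:
        return url
    # Drop the last path component, keeping a trailing slash on the parent.
    parent = path.rstrip('/').rsplit('/', 1)[0] + '/'
    return urlunsplit((scheme, netloc, parent, query, fragment))
```

A test for the drive-root case could then run on any platform, since no
call into ``os.path`` is involved.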

> Since users will rarely pass in URLs for filesystem paths, we should have
> a function that converts user paths into URLs (if they're not already).
>  Quite possibly get_transport should do that.

Yes.
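
A minimal sketch of such a conversion, under some loud assumptions: it
only handles absolute POSIX-style paths, it picks UTF-8 as the encoding
for non-ASCII characters (exactly the kind of decision that has to be
made above the transport), and ``to_url`` is a hypothetical name rather
than anything in bzrlib today.

```python
import re
from urllib.parse import quote

# Something like "http://" or "sftp://" at the start means it is already a URL.
_URL_SCHEME = re.compile(r'^[A-Za-z][A-Za-z0-9+.-]*://')


def to_url(path_or_url):
    """Return path_or_url unchanged if it is a URL, else a file:// URL for it."""
    if _URL_SCHEME.match(path_or_url):
        return path_or_url
    if not path_or_url.startswith('/'):
        raise ValueError('only absolute POSIX paths are handled in this sketch')
    # quote() escapes everything outside the safe ASCII set, encoding
    # non-ASCII characters as UTF-8 bytes; "/" is left as a separator.
    return 'file://' + quote(path_or_url, safe='/')
```

Existing URLs pass through untouched, so their %-escapes are never
reinterpreted, and everything below this point sees only ASCII URLs.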

> OTOH, I don't think it's appropriate to be using transports to access
> working trees, and since that's the bug you encountered, I suggest
> that's what we should fix-- build_tree should either be implemented in
> terms of POSIX, or it should translate paths to urls before using them
> with Transport.

Can you tell me more about why it's not appropriate?  Is it because  
Transports should focus on supporting just what is needed for control  
file access?

-- 
Martin