Filesystem paths

Martin Pool mbp at
Wed Apr 26 08:36:28 BST 2006

On 26/04/2006, at 1:53 PM, Aaron Bentley wrote:

> I don't know Robert's reasons, but the reason I like the transport layer
> being all-url is because some transports *must* be url-based, and all
> transports *can* be url-based.  It keeps the layer simple, promotes code
> reuse, and all that good stuff.

Here's a patch for the developer documentation which tries to make it  
clear *why* they must be like this.  Is it clear/correct?  (Thanks to  
Robert for helping me get it straight.)

=== modified file 'a/HACKING'
--- a/HACKING	
+++ b/HACKING	
@@ -294,6 +294,48 @@
      indexes into the branch's revision history.
+The ``Transport`` layer handles access to local or remote directories.
+Each Transport object acts like a logical connection to a particular
+directory, and it allows various operations on files within it.  You
+*clone* a transport to get a new Transport connected to a subdirectory or
+parent directory.
+Transports are not used for access to the working tree.  At present
+working trees are always local and they are accessed through the  
+Python file io mechanisms.
+filenames vs URLs
+Transports work in URLs.  Take note that URLs are by definition only
+ASCII - the decision of how to encode a Unicode string into a URL must be
+taken at a higher level, typically in the Store.
+The main reason for this is that it's not possible to safely roundtrip a
+URL into Unicode and then back into the same URL.  The URL standard
+gives a way to represent non-ASCII bytes in ASCII (as %-escapes), but
+doesn't say how those bytes represent non-ASCII characters.  (They're not
+guaranteed to be UTF-8 -- that is common but doesn't happen everywhere.)
+For example if the user enters the url ``http://example/%e0`` there's no
+way to tell whether that character represents "latin small letter a with
+grave" in iso-8859-1, or "latin small letter r with acute" in iso-8859-2,
+or malformed UTF-8.  So we can't convert their URL to Unicode reliably.
+A similar edge case is that the url ``http://foo/sweet%2Fsour`` contains
+one directory component whose name is "sweet/sour".  The escaped slash is
+not a directory separator.  If we try to convert URLs to regular
+paths this information will be lost.
+This implies that Transports must natively deal with URLs; for simplicity
+they *only* deal with URLs and conversion of other strings to URLs is done
+elsewhere.  Information they return, such as from ``list_dir``, is also in
+the form of URL components.
Merge/review process
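
To make the two edge cases in the patch concrete, here is a rough sketch in
present-day Python 3 using only the standard urllib.parse module.  It is not
bzr code; the variable names, and the choice of iso-8859-2 as the second
charset, are my own assumptions for illustration.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

# The escape in http://example/%e0 stands for one raw byte, nothing more.
raw = unquote_to_bytes("%e0")

# Two equally plausible charsets turn that byte into different characters.
as_latin1 = raw.decode("iso-8859-1")  # U+00E0, a with grave
as_latin2 = raw.decode("iso-8859-2")  # U+0155, r with acute (assumed charset)

# Read as UTF-8, the lone byte is a lead byte with no continuation.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Going back to a URL forces an encoding choice, so the original escape
# is not recovered unless we happen to guess the same charset.
reencoded = quote(as_latin1, encoding="utf-8")

# Escaped slash: split on "/" first, then unquote each component.
path = "/sweet%2Fsour"
components = [unquote(c) for c in path.split("/") if c]

# Unquoting first and splitting afterwards invents a directory level.
lossy = [c for c in unquote(path).split("/") if c]
```

Here `components` keeps the single name "sweet/sour", while `lossy` ends up
with two names, which is the information loss the patch text describes.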

> What's frustrated me thus far about the transport layer is that it doesn't
> use urls everywhere, and it's not easy to use.  For example, I was
> recently fixing a bug to do with finding root directories in Windows
> paths.  Unfortunately, that leaves us with an OS-specific test case.
> Under Unix, we'll never test whether that functionality is right, and we
> can't do so because all the path-manipulation functions require
> different assumptions.  If we were using URLs, we could have uniform
> testing, because URL manipulations are the same on every platform.

So LocalTransport.abspath shouldn't be calling osutils.abspath, but  
rather should be manipulating URL objects?  Then we can see that


has no "up"?
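
As a rough idea of what platform-independent "up" handling might look like,
here is a small Python 3 sketch.  `parent_url` is a hypothetical helper, not
an existing bzrlib function, and it assumes hierarchical URLs with no query
or fragment.  It only ever touches the URL path as text, so it behaves the
same on every OS and never unescapes anything.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def parent_url(url):
    """Return the URL of the parent directory (hypothetical helper).

    The path is treated as "/"-separated text and is never unquoted,
    so an escaped slash (%2F) stays inside its component.
    """
    parts = urlsplit(url)
    # Drop trailing slashes so "/a/b/" and "/a/b" share the same parent.
    path = parts.path.rstrip("/")
    # dirname("") is "", which means we are already at the root.
    parent = posixpath.dirname(path) or "/"
    if not parent.endswith("/"):
        parent += "/"
    return urlunsplit((parts.scheme, parts.netloc, parent, "", ""))
```

With this sketch, asking for the parent of `http://example.com/` just gives
the same URL back, so a caller can detect the root by comparing the result
with its input.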

> Since users will rarely pass in URLs for filesystem paths, we should have
> a function that converts user paths into URLs (if they're not already).
> Quite possibly get_transport should do that.
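
A conversion function along those lines could be sketched like this in
Python 3.  `to_url` and its `encoding` parameter are hypothetical names, not
part of bzrlib, and the sketch only handles absolute POSIX-style paths.  The
point is that the Unicode-to-bytes decision is made once, explicitly, before
anything reaches a Transport.

```python
import re
from urllib.parse import quote

# Anything starting with "scheme://" is assumed to be a URL already.
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def to_url(user_input, encoding="utf-8"):
    """Turn a user-supplied path into a file URL (hypothetical helper).

    Strings that already look like URLs are returned untouched, so
    existing %-escapes are never reinterpreted.
    """
    if _SCHEME.match(user_input):
        return user_input
    if not user_input.startswith("/"):
        raise ValueError("this sketch only handles absolute POSIX paths")
    # The caller picks the encoding; the result is pure ASCII.
    return "file://" + quote(user_input, safe="/", encoding=encoding)
```

The same path gives different URLs under different encodings (for example
utf-8 versus iso-8859-1), which is why that choice has to live above the
transport layer.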


> OTOH, I don't think it's appropriate to be using transports to access
> working trees, and since that's the bug you encountered, I suggest
> that's what we should fix-- build_tree should either be implemented in
> terms of POSIX, or it should translate paths to urls before using them
> with Transport.

Can you tell me more about why it's not appropriate?  Is it because  
Transports should focus on supporting just what is needed for control  
file access?

