Filesystem paths
Martin Pool
mbp at sourcefrog.net
Wed Apr 26 08:36:28 BST 2006
On 26/04/2006, at 1:53 PM, Aaron Bentley wrote:
> I don't know Robert's reasons, but the reason I like the transport layer
> being all-url is because some transports *must* be url-based, and all
> transports *can* be url-based. It keeps the layer simple, promotes code
> reuse, and all that good stuff.
Here's a patch for the developer documentation which tries to make it
clear *why* they must be like this. Is it clear/correct? (Thanks to
Robert for helping me get it straight.)
=== modified file 'a/HACKING'
--- a/HACKING
+++ b/HACKING
@@ -294,6 +294,48 @@
indexes into the branch's revision history.
+Transport
+=========
+
+The ``Transport`` layer handles access to local or remote directories.
+Each Transport object acts like a logical connection to a particular
+directory, and it allows various operations on files within it. You can
+*clone* a transport to get a new Transport connected to a subdirectory or
+parent directory.
+
+Transports are not used for access to the working tree. At present
+working trees are always local and are accessed through the regular
+Python file I/O mechanisms.
+
+filenames vs URLs
+-----------------
+
+Transports work in URLs. Take note that URLs are by definition only
+ASCII, so the decision of how to encode a Unicode string into a URL must be
+made at a higher level, typically in the Store.
+
+The main reason for this is that it's not possible to safely round-trip a
+URL into Unicode and then back into the same URL. The URL standard
+gives a way to represent non-ASCII bytes in ASCII (as %-escapes), but
+doesn't say how those bytes represent non-ASCII characters. (They're not
+guaranteed to be UTF-8; that is common but not universal.)
+
+For example, if the user enters the URL ``http://example/%e0`` there's no
+way to tell whether that byte represents "latin small letter a with
+grave" in iso-8859-1, "latin small letter r with acute" in iso-8859-2,
+or malformed UTF-8. So we can't convert their URL to Unicode reliably.
+
+A similar edge case is that the URL ``http://foo/sweet%2Fsour`` contains
+one directory component whose name is "sweet/sour". The escaped slash is
+not a directory separator. If we try to convert URLs to regular Unicode
+paths this information will be lost.
+
+This implies that Transports must natively deal with URLs; for simplicity
+they *only* deal with URLs, and conversion of other strings to URLs is done
+elsewhere. Information they return, such as from ``list_dir``, is also in
+the form of URL components.
+
+
Merge/review process
====================
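To make the two edge cases in the patch concrete, here is a minimal sketch using only Python's standard urllib.parse. The helpers decode_candidates and url_components are hypothetical illustrations, not bzrlib functions. The first shows that the single escaped byte in http://example/%e0 decodes to different characters depending on the assumed encoding; the second shows that splitting on "/" must happen before unescaping, or the %2F in sweet%2Fsour turns into a separator.

```python
# Hypothetical helpers for illustration only; not part of bzrlib.
from urllib.parse import quote, unquote, unquote_to_bytes


def decode_candidates(url_path):
    """Decode the %-escaped bytes of url_path under several encodings.

    Returns a dict mapping encoding name to the decoded text, or None
    where the bytes are not valid in that encoding.
    """
    raw = unquote_to_bytes(url_path)  # "/%e0" -> b"/\xe0"
    results = {}
    for encoding in ("iso-8859-1", "iso-8859-2", "utf-8"):
        try:
            results[encoding] = raw.decode(encoding)
        except UnicodeDecodeError:
            results[encoding] = None
    return results


def url_components(url_path):
    """Split an escaped URL path on "/" and keep each piece escaped."""
    return [part for part in url_path.split("/") if part]


if __name__ == "__main__":
    print(decode_candidates("/%e0"))
    # {'iso-8859-1': '/à', 'iso-8859-2': '/ŕ', 'utf-8': None}

    parts = url_components("/sweet%2Fsour")
    print(parts)                          # ['sweet%2Fsour']: one component
    # Decoding each isolated component as UTF-8 is itself an assumption.
    print([unquote(p) for p in parts])    # ['sweet/sour']

    naive = unquote("/sweet%2Fsour")      # '/sweet/sour'
    print(naive.split("/"))               # ['', 'sweet', 'sour']: two components
    print(quote(naive))                   # '/sweet/sour', not the original URL
```

The last two lines show the lossy round trip: once the path has been turned into a plain string, re-quoting it cannot recover the original %2F.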
> What's frustrated me thus far about the transport layer is that it doesn't
> use urls everywhere, and it's not easy to use. For example, I was
> recently fixing a bug to do with finding root directories in Windows
> paths. Unfortunately, that leaves us with an OS-specific test case.
> Under Unix, we'll never test whether that functionality is right, and we
> can't do so because all the path-manipulation functions require
> different assumptions. If we were using URLs, we could have uniform
> testing, because URL manipulations are the same on every platform.
So LocalTransport.abspath shouldn't be calling osutils.abspath, but
rather should be manipulating URL objects? Then we can see that
file:///c|/ has no "up"?
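As a rough illustration of what purely URL-based manipulation could look like, here is a minimal sketch of a hypothetical url_parent() helper (not an existing bzrlib function). It applies POSIX-style rules to the still-escaped URL path, so it gives the same answer on every platform and a Windows drive root can be tested under Unix. Treating a lone "c|" or "c:" path segment as a drive root with no parent is an assumption made for this sketch.

```python
# Hypothetical helper for illustration only; not part of bzrlib.
import posixpath
from urllib.parse import urlsplit, urlunsplit


def _is_drive_root(path):
    """True for paths like '/c|/' or '/C:/': a drive letter and nothing else.

    Assumption: such a segment is treated as a drive root on every OS.
    """
    parts = [p for p in path.split("/") if p]
    return (len(parts) == 1 and len(parts[0]) == 2
            and parts[0][0].isalpha() and parts[0][1] in "|:")


def url_parent(url):
    """Return the URL of the parent directory of url, or None at a root.

    Works on the escaped URL path with POSIX rules, so %2F is never
    mistaken for a separator and the result does not depend on the OS.
    Any query or fragment is dropped.
    """
    scheme, netloc, path, _query, _fragment = urlsplit(url)
    stripped = path.rstrip("/")
    if not stripped or _is_drive_root(path):
        return None  # no "up" from "/" or from a drive root
    parent = posixpath.dirname(stripped)
    if not parent.endswith("/"):
        parent += "/"
    return urlunsplit((scheme, netloc, parent, "", ""))


if __name__ == "__main__":
    print(url_parent("file:///c|/foo/bar/"))        # file:///c|/foo/
    print(url_parent("file:///c|/"))                # None
    print(url_parent("http://foo/sweet%2Fsour/x"))  # http://foo/sweet%2Fsour/
```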
> Since users will rarely pass in URLs for filesystem paths, we should have
> a function that converts user paths into URLs (if they're not already).
> Quite possibly get_transport should do that.
Yes.
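A minimal sketch of such a conversion function, assuming Python 3's standard library. The name user_path_to_url is hypothetical, and so is the rule that anything starting with "scheme://" is already a URL. Everything else is made absolute and turned into a file:// URL with pathlib's as_uri(), which percent-encodes characters that aren't allowed in a URL. That makes this function one of the "higher level" places where the Unicode-to-URL encoding decision gets made, which keeps it out of the Transport.

```python
# Hypothetical helper for illustration only; not part of bzrlib.
import os
import pathlib
import re

# Assumed rule: "scheme://" at the start means the string is already a URL,
# e.g. "http://" or "sftp://". A Windows path such as C:\work does not
# match because it lacks the "//".
_URL_PREFIX = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def user_path_to_url(location):
    """Return location unchanged if it already looks like a URL,
    otherwise convert it to an absolute file:// URL."""
    if _URL_PREFIX.match(location):
        return location
    # as_uri() requires an absolute path and percent-encodes characters
    # that are not allowed in a URL, so the encoding decision happens here.
    return pathlib.Path(os.path.abspath(location)).as_uri()


if __name__ == "__main__":
    print(user_path_to_url("sftp://host/srv/bzr"))  # returned unchanged
    print(user_path_to_url("/tmp/a b"))             # file:///tmp/a%20b on POSIX
```

A get_transport() that ran its argument through something like this first would let the rest of the Transport layer assume it only ever sees URLs.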
> OTOH, I don't think it's appropriate to be using transports to access
> working trees, and since that's the bug you encountered, I suggest
> that's what we should fix: build_tree should either be implemented in
> terms of POSIX, or it should translate paths to urls before using them
> with Transport.
Can you tell me more about why it's not appropriate? Is it because
Transports should focus on supporting just what is needed for control
file access?
--
Martin
More information about the bazaar mailing list