[MERGE/RFC] case sensitivity on Windows
John Arbash Meinel
john at arbash-meinel.com
Fri Dec 19 18:09:55 GMT 2008
John Arbash Meinel has voted resubmit.
Status is now: Resubmit
Comment:
I think this is getting close to what we would want. I think one of the
biggest bits that caused "resubmit" was that you have a few functions
with similar names that don't quite do the same thing. Overall, I'm
really happy to see that it isn't as invasive as I thought it would be.
I would make "cicp_canonical_relpath" a private function, just because
it isn't something you want people to use directly. They should be using
the "osutils.canonical_relpath" function, and marking it
"_cicp_canonical_relpath" should help point people in that direction.
That said, we want to support CICP filesystem on other platforms, such
as Mac and vfat on Linux, so I would guess we actually want something
like "canonical_relpath" to be available everywhere, but only used when
WT.case_sensitive is True.
Which just as a start makes me wonder about the layering. Certainly we
can move case_sensitive up into MutableTree, and just have it default to
True (since Memory implementations would be case sensitive, etc.)
So I'm thinking we should probably just promote 'cicp_canonical_relpath'
to be plain 'canonical_relpath' and just try to be careful about when we
use it.
So in discussing canonical_relpath, why do you need to call abspath()
before you start iterating it? I would also guess that you could build
up just the relative portion, rather than starting with base and
building on top of it, and then trimming out the base as the last step
before you return. Also, I notice you don't special case an exact-match.
Specifically, you *always* iterate the results of listdir() (or
tree.iter_children(root_id)), rather than making an attempt to see if
there exists an entry with the exact name you are looking for. I have
the feeling that most times the paths come in exactly as they are on
disk (especially if someone takes advantage of TAB completion), and if
there is a directory with a lot of files, it could save a significant
amount of work.
These are minor tweaks/optimizations though, and I'll try to focus more
on design issues.
...
=== modified file 'bzrlib/builtins.py'
--- bzrlib/builtins.py 2008-11-15 20:37:36 +0000
+++ bzrlib/builtins.py 2008-11-17 12:16:33 +0000
@@ -210,6 +210,8 @@
no_pending = True
# A specific path within a tree was given.
elif relfile_list is not None:
+ # convert the names to canonical versions as stored in the
inventory
+ relfile_list = [tree.get_canonical_path(f) for f in
relfile_list]
no_pending = True
show_tree_status(tree, show_ids=show_ids,
specific_files=relfile_list,
revision=revision,
^- For example, this might become:
elif relfile_list is not None:
if not tree.case_sensitive:
relfile_list = [tree.get_canonical_path(f) for f in relfile_list]
I'm also trying to understand "tree.get_canonical_path".
"osutils.canonical_relpath" means to use a case-insensitive match
against the OS's filesystem.
I think the problem is using the term "canonical" to mean 2 different
things. Sometimes it means the path on disk, and sometimes it means to
use the path in the tree. Put another way,
tree.get_canonical_path(path) != osutils.canonical_relpath(tree.basedir,
path)
Which, IMO, means it isn't a *canonical* path.
I may be wrong on this. But if I do:
mv TODO Todo
python -c "from bzrlib import osutils, workingtree; wt =
workingtree.WorkingTree.open('.'); print
osutils.canonical_relpath(wt.basedir, 'todo'),
wt.get_canonical_path('todo')"
Todo TODO
There is also a small issue that we have "tree.relpath()" which matches
"osutils.relpath()", but you added "tree.get_canonical_path()" as the
corallary to "osutils.canonical_relpath()".
For clarity, I would probably change it a bit. And have:
osutils.case_insensitive_relpath()
and
tree.case_insensitive_relpath()
And in: tree.case_insensitive_relpath(), we might do something like:
if self.case_sensitive:
return self.relpath
...
I'm not 100% sure about that. If there are times where you may want to
do a case-insensitive match on a case-sensitive tree. Either way, I
would name them the same, and probably name them "XXX_relpath".
...
You changed 'bzr mv' to use get_canonical_relpath() for a lot of the
arguments. But there was one place:
into_existing = osutils.isdir(names_list[-1])
The very first access to the target directory. Before we pass it to
"isdir()" shouldn't it have gone through osutils.canonical_relpath() ?
So someone can type "bzr mv path to WrongCaSe"
and still have it find "wrongcase" which is an existing file on disk?
Then again, I guess if the filesystem is case-insensitive, then isdir()
will find it anyway.
I think it would be really good if we could bring in a
CaseInsensitiveMemoryTree, to make it easier to test these things on all
platforms. But I guess we don't abstract direct access to the filesystem
enough to make that actually work. (TreeTransform, for example, goes
direct to open() calls, etc.)
This code in "cmd_mv":
+ # Find the canonical version of the destination: In all
cases, the
+ # parent of the target must be in the inventory, so we
fetch the
+ # canonical version from there.
+ dest_parent =
tree.get_canonical_path(osutils.dirname(rel_names[1]))
+ dest_tail = osutils.basename(rel_names[1])
+ if after:
+ # If 'after' is specified, the tail must refer to a
file on disk,
+ # so fixup the tail using the file-system.
+ if dest_parent:
+ dest_parent_fq = osutils.pathjoin(tree.basedir,
dest_parent)
+ else:
+ # pathjoin with an empty tail adds a slash, which
breaks
+ # relpath :(
+ dest_parent_fq = tree.basedir
+
+ dest_tail = osutils.canonical_relpath(
+ dest_parent_fq,
+ osutils.pathjoin(dest_parent_fq,
dest_tail))
+ dest = osutils.pathjoin(dest_parent, dest_tail)
This seems like something that should be a separate-and-tested function
on Tree. Actually, unless I misunderstand the code in
get_canonical_relpath, I think this case is already handled.
Specifically, when the for loop is exhausted, you have:
else:
# got to the end of this directory and no entries
matched.
# Return what matched so far, plus the rest as
specified.
cur_path = osutils.pathjoin(cur_path, elt,
*list(bit_iter))
break
Though perhaps this doesn't work for the case of:
mkdir Foo; touch file; Foo/BAR
bzr add file Foo/BAR
bzr commit
bzr mv File foo/bar
In this case, your code would translate "File" => "file", and then it
would translate "foo/bar" into "Foo/bar", rather than "Foo/BAR".
I could be wrong, but I have the feeling that the plain implementation
of "get_canonical_relpath" is correct here. If the tip already exists
(in the filesystem, or in the tree) then it should be used, instead of
using the exact name supplied by the user, otherwise the code is already
designed to return the path unchanged.
...
in cmd_remove:
tree, file_list = tree_files(file_list)
if file_list is not None:
- file_list = [f for f in file_list]
+ file_list = [tree.get_canonical_path(f) for f in file_list]
I think this is correct in this case. But given what "tree_files()"
really means, it feels to me that we likely should be doing it all the
time.
So instead of updating every command to wrap the file_list with a
"get_canonical_relpath()" call, we should instead change the
implementation of "safe_relpath_files()"
So instead of doing:
new_list.append(tree.relpath(osutils.dereference_path(filename)))
We do:
new_list.append(tree.get_canonical_path(osutils.dereference_path(filename)))
I guess the only reason not to do that is because you may be talking
about the path-on-disk rather than the path-in-the-tree, which can
certainly give different results, and why all of this is really hard to
get right, because you never now for a string whether it is the path on
disk, or the path in memory, etc.
...
=== modified file 'bzrlib/mutabletree.py'
--- bzrlib/mutabletree.py 2008-10-03 23:42:56 +0000
+++ bzrlib/mutabletree.py 2008-11-17 06:38:07 +0000
@@ -348,7 +348,7 @@
# relative : it's cheaper to make a tree relative path an
abspath
# than to convert an abspath to tree relative.
for filepath in file_list:
- rf = _FastPath(self.relpath(filepath))
+ rf = _FastPath(self.canonical_relpath(filepath))
# validate user parameters. Our recursive code avoids
adding new files
# that need such validation
if self.is_control_filename(rf.raw_path):
^- At first, I was almost positive that this is wrong. I thought it
either needed to be "osutils.canonical_relpath" or
"self.get_canonical_path()" because there is no Tree.canonical_relpath()
function that I could find.
However, it turns out that you implemented "canonical_relpath" on
WorkingTree (though not on MutableTree or on Tree), which was a direct
thunk to osutils.canonical_relpath(self.basedir, path)
Again, it stresses the need to be very clear about when you are asking
for the case-insensitive path on disk, and when you are asking for the
case-insensitive path in the tree (inventory). I originally was pushing
to have the functions named the same, and calling the "Tree.XXX" would
give you the path in the tree, and calling "osutils.XXX" would give you
the path on disk. I see that you added a "Tree.XXX" and a "Tree.YYY"
which gives you one or the other.
I see a few possibilities
a) Clearer and obviously separate function names, such as
"Tree.filesystem_relpath()", or "Tree.cicp_filesystem_relpath()", versus
"Tree.cicp_relpath()". Though I sort of want it to mean "give me the
case-insensitive path if the fs is case-insensitive, otherwise give the
case-sensitive path". And putting cicp seems a bit too strong that we
would always give the case-insensitive form.
b) An argument to the function: "Tree.cicp_relpath(path,
filesystem=True)"
c) Not using a function on Tree when you want the path from disk. (ie,
always spelling out "osutils.canonical_relpath(tree.basedir, path)" when
you want a path from disk)
...
+class _CaseInsCasePresFilenameFeature(Feature):
+ """Is the file-system case insensitive, but case-preserving?"""
+
+ def _probe(self):
+ fileno, name = tempfile.mkstemp(prefix='MixedCase')
+ try:
+ # first check truly case-preserving for created files, then
check
+ # case insensitive when opening existing files.
+ name = osutils.normpath(name)
+ base, rel = osutils.split(name)
+ found_rel = osutils.canonical_relpath(base, name)
+ return (found_rel==rel
+ and os.path.isfile(name.upper())
+ and os.path.isfile(name.lower()))
+ finally:
+ os.close(fileno)
+ os.remove(name)
+
+CaseInsCasePresFilenameFeature = _CaseInsCasePresFilenameFeature()
+
+
^- I think you also want to define:
def feature_name(self):
return "case-insensitive case-preserving filesystem"
=== added file 'bzrlib/tests/blackbox/test_filesystem_cicp.py'
--- bzrlib/tests/blackbox/test_filesystem_cicp.py 1970-01-01
00:00:00 +0000
+++ bzrlib/tests/blackbox/test_filesystem_cicp.py 2008-11-17
12:15:58 +0000
...
+"""Tests variations of case-insensitive and case-preserving
file-systems."""
+
+import os
+
+from bzrlib.tests.blackbox import ExternalBase
+from bzrlib.tests import CaseInsCasePresFilenameFeature, KnownFailure
+
+class TestIncorrectUserCase(ExternalBase):
^- Two blank lines between imports and a class definition.
This is really a time when it would be nice to have things abstracted
such that we can use a CaseInsensitiveWorkingTree.... It probably isn't
feasible in the short-term, though.
Also, seeing the number of times where you need to do something like:
+ file_list = [tree.get_canonical_path(f) for f in file_list]
I wonder if we shouldn't have a specific function for this. Something
like:
file_list = tree.canonical_relpaths(file_list)
The idea is that could be a very good way to implement appropriate
caching. The cache would exist for only a single call, but could be
preserved until all entries had been handled.
It could also mean that multiple entries could be resolved 'in
parallel', rather than determining that all of the files in the "Foo"
directory were really in the "foo" directory.
I bring it up, because we may want to *only* provide the multi-version.
Certainly you can always do:
path = tree.canonical_relpaths([path])[0]
And then we only need to provide 1 function, rather than 2.
Again, this isn't something that you have to do now. Just an idea about
how we might want to structure it.
I didn't read the doc closely, though I'm happy to see documentation
being added.
For details, see:
http://bundlebuggy.aaronbentley.com/project/bzr/request/%3C4934FA08.5010105%40gmail.com%3E
Project: Bazaar
More information about the bazaar
mailing list