[RFC] case sensitivity on Windows

Mark Hammond mhammond at skippinet.com.au
Sun Nov 2 23:52:17 GMT 2008


> That's still pretty nice progress.  I'll read the patch now, but as a
> tip, you should really update the subject line to include [merge] so
> that BB will catch it.  It's a good safety net to make sure it does
> get an answer one way or the other.

Thanks for the reply.  FWIW, I used [RFC] in preference to [MERGE] in the
subject line as I didn't consider it ready for merging.

> So this routine, as it currently stands, is not win32-specific (aside
> from the comment), but in fact is the kind of thing we'd like to also
> use on os x or linux case-insensitive filesystems.  So even if we're
> not handling those cases at present, I'd name it accordingly,
> cicp_canonical_relpath or something.

How do you see us deciding to use cicp_canonical_relpath at runtime?  Would
we still unconditionally use it for Windows, or would we attempt to
determine this dynamically for each tree object (ie, by sniffing the working
tree for this property?)

> In that context folding \ to / here is questionable, if we want to
> support such names on platforms that can actually handle them, but it
> could probably be fixed when that bug is actually fixed.

Yeah - I can probably just use normpath I'd guess.

> There is some risk that doing listdir() in here for every path will
> cause O(n) behaviour if we call this on every path in a tree.  I'd
> probably just add a comment for now; it may not be significant.  An
> easy optimization may be to check whether the name is in the list
> before folding case.
> 
> It looks like this will only be run on paths coming from the user at
> this time, and that's probably not many (especially on windows, with
> less use of things like xargs).
> 
> So I'd be happy to merge this in as at least improving support for some
> cases.

It has been mentioned here before that caching this per-directory might be
good for performance.  Would it be preferred to 'implicitly cache' these
results (eg, have cicp_canonical_relpath() keep a hidden, private cache), or
build caching into the signatures of the functions - eg, something like:

  entries = cicp_canonical_prime (dir_name)
  filename = cicp_canonical_relpath(filename, entries) # entries could even
be None to avoid caching?

?  The former looks much easier for consumers, but tricky to get right in
the face of threads etc....

> > (the above, but the last 'add' is implicit)
> > % touch foo
> > % bzr add foo
> > % mv foo Foo
> > % bzr add
> > => '' (ie, bzr should ignore the filesystem case for an existing
> entry)
> > % bzr diff/status
> > => '' (ie, nothing has changed)
> 
> This one is a bit interesting because it's saying that if the file was
> renamed externally, we don't consider that it was "semantically"
> renamed.  On Linux it would show as the newly added file being removed
> and a new one present, until you confirmed that it had moved.

Yes.  I think it is very important to decide on these semantics early on if
at all possible.

ie, is my example above "correct but interesting" or "interesting, but it's
not clear if it's correct yet"?  [FWIW, I'm not particularly fussed about
what semantics we decide, but think it worth drawing a line in the sand as
early as possible.]

> Also
> 
>   touch foo
>   bzr mkdir goo
>   bzr add
>   bzr mv FOO GOO
>   => renamed ./foo ->./goo/foo
>  (or arguably ./foo -> ./goo/FOO)

Right.  I think we seem to be moving towards a policy of 'for items already
in the inventory, ignore the case on the file-system' - so I'd suggest your
first response is the most appropriate.

> bzr mv has two almost separate ways of interpreting the command line
> 
>   bzr mv OLDNAME NEWNAME
>   bzr mv OLDNAME... NEWDIR
> 
> and you've mostly been talking about the former.  In the latter, I
> suppose we want to match case-insensitive against all the parameters,
> which is not so hard.

Yes - this is another limitation in the patch I sent - it blindly 'fixes'
every filename arg without considering the context.  'mv' is the obvious
example, but I fear others may also exist.  My 'problem' is that the list of
filenames is passed to tree_files(), which walks subdirs etc - so once
'tree_files()' has returned I've lost the ability to fixup the filenames as
specified by the user.  But tree_files() doesn't have the context to know
that the last item shouldn't be modified, for example.

In other words, I'm aware of that problem, but struggling to find a way to
modify cmd_mv to handle this.  Thoughts on this are very welcome and it is
basically exactly where I am stuck - one of my new tests fails for this
exact reason.

[snip other good examples]

> I don't think changing the case of existing files is the most
> important use case though; but it would be good to handle it sensibly.

Agreed - its not a very important use case, but do think its important to
decide how we will handle it, so we can ensure we have a consistent bigger
picture. 

> Keep putting up little patches like this and/or specific questions,
> we'll get them merged in, and seek feedback or bug reports from
> others.

Excellent, thanks.  At this stage I think we need to address the
command-line handling for 'mv' before the patch is ready to be merged - or
to put it another way, I believe all the new tests need to pass before its
ready even as a first step.  So all thoughts on that are very welcome.

Cheers,

Mark




More information about the bazaar mailing list