[RFC] case sensitivity on Windows
Paul Moore
p.f.moore at gmail.com
Fri Oct 31 16:07:30 GMT 2008
2008/10/31 John Arbash Meinel <john at arbash-meinel.com>:
>>> Just a brief note here - I wrote some similar code for Mercurial, and I
>>> found that using os.listdir plus caching the results (so that I
>>> didn't call os.listdir more than once per directory) was actually
>>> faster than using win32 FindFile calls.
>>
>> I'm very surprised by this, because internally os.listdir uses the
>> FindFile Win32 API. Perhaps this is overhead added by the pywin32 wrapper?
>
> I think it is more about "caching the results" and working "one
> directory at a time". I would guess that if he was directly calling
> FindFile, then he would be calling it for every path separately.
>
> And so the overhead is in making 50 calls for 50 files in one directory,
> versus 1 call for 50 files.
Precisely. If I recall correctly, there are some corner cases in
FindFiles behaviour that make it difficult to use for this purpose
(it falls back to a wildcard search rather than an exact match in some
odd cases, and you have to protect against that). Of course, you can
use FindFiles on directories and cache the results, but that's just an
OS-specific version of the equivalent listdir code.
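A minimal Python sketch of the "os.listdir plus caching" approach described above. It assumes that lowercasing with str.lower is a close enough stand-in for Windows' case-insensitive name comparison (NTFS actually uses its own upcase table). The class name ListdirCache is hypothetical and is not taken from the Mercurial or bzr code:

```python
import os


class ListdirCache:
    """Canonicalise the case of names within directories using os.listdir.

    Each directory is listed at most once. Later lookups in the same
    directory are answered from the cached listing.
    """

    def __init__(self):
        # Maps a directory path to {lowercased name: name as stored on disk}.
        self._listings = {}

    def _listing(self, directory):
        listing = self._listings.get(directory)
        if listing is None:
            listing = {}
            for entry in os.listdir(directory):
                # Assumption: str.lower approximates Windows case folding.
                # setdefault keeps the first entry seen if two names differ
                # only by case (possible on case-sensitive filesystems).
                listing.setdefault(entry.lower(), entry)
            self._listings[directory] = listing
        return listing

    def canonical_name(self, directory, name):
        """Return `name` as spelled on disk in `directory`, or None if absent."""
        return self._listing(directory).get(name.lower())
```

Looking up 50 names in one directory this way costs a single os.listdir call, which matches the "1 call for 50 files" point made in the quoted text.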
I apologise for the vagueness of this - I did some fairly extensive
experiments at the time, but threw the code away. (Goes away to
rummage) Ah, yes. The problem is that if you use FindFiles on a
directory name, it lists the directory's contents, which makes it
annoyingly awkward to canonicalise the case of the directory name itself
(unless you fall back to using FindFiles just like os.listdir). Not
impossible, but fiddly to get right, and in my testing there was no
benefit over os.listdir plus caching once you got past a pretty small
number of filenames to process (fewer than 10, certainly).
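One way to avoid the directory-name problem with plain os.listdir is to resolve every path component, directories included, from its parent's listing. A directory's own spelling then never depends on listing the directory itself. The following is a sketch under the same assumption as before (str.lower as an approximation of Windows case folding). The name make_canonicaliser and the injectable listdir parameter are hypothetical and exist only to make the per-directory call count observable:

```python
import os


def make_canonicaliser(listdir=os.listdir):
    """Return a function that fixes the case of every component of a path.

    Hypothetical helper: each component is looked up in its parent's
    cached listing, so every directory is listed at most once.
    """
    cache = {}  # directory -> {lowercased name: name as stored on disk}

    def lookup(parent, name):
        listing = cache.get(parent)
        if listing is None:
            try:
                entries = listdir(parent)
            except OSError:
                entries = []  # parent missing or not a directory
            # Assumption: str.lower approximates Windows case folding.
            listing = {entry.lower(): entry for entry in entries}
            cache[parent] = listing
        return listing.get(name.lower())

    def canonicalise(root, relpath):
        """Fix the case of `relpath` relative to `root` (assumed correct)."""
        current = root
        out = []
        for part in relpath.replace("\\", "/").split("/"):
            if part in ("", "."):
                continue
            fixed = lookup(current, part)
            if fixed is None:
                fixed = part  # not on disk: keep the caller's spelling
            out.append(fixed)
            current = os.path.join(current, fixed)
        return "/".join(out)

    return canonicalise
```

With a warm cache, canonicalising further paths under the same directories issues no further os.listdir calls. This is why a cached listing quickly catches up with per-file lookups once more than a handful of names are involved.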
Paul