[MERGE] Use win32file.FindFiles instead of os.lstat when available
John Arbash Meinel
john at arbash-meinel.com
Thu Jun 26 19:28:45 BST 2008
John Arbash Meinel wrote:
| John Arbash Meinel wrote:
| | Attached is an optional optimizer when running on windows. I was working
| | in a moderately sized tree (~9k files+dirs), and I noticed that "bzr
| | status" was a bit slower than I would have expected. When I did
| | --lsprof, all the time was taken in "nt.lstat". I checked around and saw
| | that pywin32 exposes the FindFiles API, which will return the directory
| | entries and their stat info as you go.
| |
| | In my testing, it makes things faster: somewhere between 2-4x for the
| | time spent in walkdirs. It also seems to do interesting things with
| | caching: after I've run status with this code a couple of times, the
| | plain "bzr status" is faster too, though still not as fast.
| |
| | Anyway, abstractions are great, as this lets me slip in a more optimized
| | version for walking win32 paths. (And yes, this shouldn't be my primary
| | focus, but it was getting in my way and I wanted it to be better.)
| |
| | John
| | =:->
|
| BB:resubmit
|
| It seems there is a bug in my "time" handling here. As near as I can
| tell, "os.lstat(path)" returns a time 5 hours earlier than the one returned
| by "FindFiles". Presumably this is because of timezone issues.
|
| The time shown by Explorer is at -5; the time shown by FindFiles is not.
|
| I'll need to resolve this before this is usable. What is weird is that
| on first access, it causes us to re-read all files to check the sha1.
| After that, the cache seems perfectly happy for both bzr.dev and my bzr.
| That is probably the weirdest part. If my code were wrong, I would expect
| them to always disagree, and to keep changing the mtime stored in the
| dirstate file.
|
| Anyway, I'm working on it; hopefully it will be easy to fix.
|
| John
| =:->
Hmm... I seem to be having some *really* weird stuff happening with
win32file and PyTime. Specifically, if I do:
>>> import win32file
>>> info = win32file.FindFilesW('path\\to\\file')[0]
>>> ctime, atime, wtime = info[1:4]
This seems okay, and I now have 3 PyTime structures.
However, if I do:
>>> int(ctime)
1210366208
>>> float(ctime)
39577.659814814811
As near as I can tell, that value is the "variant" time, described here:
http://msdn.microsoft.com/en-us/library/ms221646(VS.85).aspx
The value is basically the time in days since midnight, 30 December 1899,
with the fractional part giving the time of day.
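To sanity-check that reading, here is a small pure-Python sketch (no pywin32 needed) that decodes a variant date; the helper name `variant_to_datetime` is hypothetical. Decoding the float above gives 2008-05-09 15:50:08, a wall-clock time with no zone attached. By contrast, int(ctime) = 1210366208 is 20:50:08 UTC on the same day, exactly 5 hours later.

```python
from datetime import datetime, timedelta

# OLE Automation ("variant") dates count days from midnight, 30 December 1899;
# the fractional part is the time of day.
VARIANT_EPOCH = datetime(1899, 12, 30)

def variant_to_datetime(variant_days):
    """Hypothetical helper: convert a non-negative variant date to a naive datetime.

    Rounds to the nearest second, since the float cannot represent most
    times exactly. (Negative variant dates encode the time of day
    differently and are not handled here.)
    """
    seconds = round(variant_days * 86400)
    return VARIANT_EPOCH + timedelta(seconds=seconds)

print(variant_to_datetime(39577.659814814811))  # 2008-05-09 15:50:08
```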
Looking at the code, it seems PyTime (the object returned by FindFiles)
only stores this "variant" date internally.
I see this:
PyTime::PyTime(const FILETIME &t)
{
    ob_type = &PyTimeType;
    _Py_NewReference(this);
    SYSTEMTIME st;
    m_time = 0;
    FileTimeToSystemTime(&t, &st);
    (void)SystemTimeToVariantTime(&st, &m_time);
}
This basically converts FILETIME => SYSTEMTIME and then SYSTEMTIME
=> variant time. That is a bit odd, considering all of the places where I
see it call VariantTimeToSystemTime before it actually uses the value.
Just to compare quickly... Python's os.stat does:
WIN32_FIND_DATAW FileData;
hFindFile = FindFirstFileW(pszFile, &FileData);
It then copies that into a WIN32_FILE_ATTRIBUTE_DATA structure and
passes it off to
FILE_TIME_to_time_t_nsec()
which pretends that FILETIME is actually a 64-bit integer (it is two
32-bit integers, so that is actually reasonable), and then does:
*nsec_out = (int)(in % 10000000) * 100; /* FILETIME is in units of 100 nsec. */
/* XXX Win32 supports time stamps past 2038; we currently don't */
*time_out = Py_SAFE_DOWNCAST((in / 10000000) - secs_between_epochs,
                             __int64, int);
Where:
secs_between_epochs = 11644473600; /* Seconds between 1.1.1601 and 1.1.1970 */
So os.lstat() seems to be returning the raw time stored by the disk.
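For comparison, the os.stat arithmetic ports to a few lines of Python. This is a sketch that mirrors the C quoted above, not code from Python itself; the function name and the sample FILETIME (chosen to match the st_mtime in the session further down) are illustrative. The point is that no time zone enters anywhere: it is plain arithmetic on a UTC tick count.

```python
# Seconds between 1601-01-01 (the FILETIME epoch) and 1970-01-01 (the Unix epoch).
SECS_BETWEEN_EPOCHS = 11644473600

def filetime_to_time_t_nsec(low, high):
    """Convert a FILETIME, given as its two 32-bit halves, to (seconds, nanoseconds).

    A FILETIME counts 100 ns intervals since 1601-01-01 UTC.
    """
    ticks = (high << 32) | low               # dwHighDateTime:dwLowDateTime as one 64-bit value
    nsec = (ticks % 10_000_000) * 100        # leftover 100 ns units -> nanoseconds
    secs = ticks // 10_000_000 - SECS_BETWEEN_EPOCHS
    # The C version also downcasts secs to a 32-bit int (hence its "2038" comment);
    # Python ints are unbounded, so that limit is not reproduced here.
    return secs, nsec

# Illustrative FILETIME: 1214504561 s plus 262000100 ns after the Unix epoch.
ticks = (1214504561 + SECS_BETWEEN_EPOCHS) * 10_000_000 + 2_620_001
print(filetime_to_time_t_nsec(ticks & 0xFFFFFFFF, ticks >> 32))  # (1214504561, 262000100)
```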
Whereas win32file seems to be doing:
1. Grab the raw WIN32_FIND_DATA, with its FILETIME structure 'ft'.
2. FileTimeToSystemTime => st
3. SystemTimeToVariantTime => m_time
4. VariantTimeToSystemTime => st
5. Copy st => a struct tm object:
struct tm tm = { 0 };
tm.tm_sec = st.wSecond;
tm.tm_min = st.wMinute;
tm.tm_hour = st.wHour;
tm.tm_mday = st.wDay;
tm.tm_mon = st.wMonth - 1;
tm.tm_year = st.wYear - 1900;
tm.tm_isdst = -1; /* have the library figure it out */
long result = (long)mktime(&tm);
Now, mktime() assumes that the structure is in local time.
>>> help(time.mktime)
mktime(...)
    mktime(tuple) -> floating point number
    Convert a time tuple in local time to seconds since the Epoch.
...
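To make the two paths concrete, here is a pure-Python simulation; it is not pywin32 code. Assumptions, all hypothetical: the local zone is a fixed UTC-5 offset, datetime arithmetic stands in for FileTimeToSystemTime, `mktime_as_local()` plays the role of mktime(), and the variant round trip is assumed to keep the SYSTEMTIME fields unchanged to the second, so it is skipped. Fed the FILETIME for 2008-05-09 15:50:08 UTC (what the variant value above decodes to), the pywin32-style path reproduces the 1210366208 that int(ctime) printed.

```python
from datetime import datetime, timedelta, timezone

SECS_BETWEEN_EPOCHS = 11644473600   # 1601-01-01 -> 1970-01-01, in seconds

# Hypothetical stand-in for the machine's local zone: a fixed UTC-5 offset.
# Real mktime() uses the OS time zone, including its DST rules.
LOCAL = timezone(timedelta(hours=-5))

def filetime_to_utc_wallclock(ticks):
    """FileTimeToSystemTime analogue: 100 ns ticks since 1601 -> naive UTC datetime."""
    return datetime(1601, 1, 1) + timedelta(microseconds=ticks // 10)

def mktime_as_local(wallclock):
    """mktime() analogue: interpret a naive wall-clock time as LOCAL time."""
    return int(wallclock.replace(tzinfo=LOCAL).timestamp())

def os_stat_style(ticks):
    """What os.stat does: plain arithmetic on the UTC tick count."""
    return ticks // 10_000_000 - SECS_BETWEEN_EPOCHS

def pywin32_style(ticks):
    """What PyTime appears to do, per the steps listed above."""
    st = filetime_to_utc_wallclock(ticks).replace(microsecond=0)  # UTC fields, no zone attached
    return mktime_as_local(st)   # ...but mktime() reads those UTC fields as local time

ticks = (1210348208 + SECS_BETWEEN_EPOCHS) * 10_000_000  # FILETIME for 2008-05-09 15:50:08 UTC
print(os_stat_style(ticks))   # 1210348208
print(pywin32_style(ticks))   # 1210366208, 5 hours too late
```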
So in summary, 'os.lstat("foo")' is returning the raw time given by the
disk (a FILETIME, which is UTC-based), simply rebased to the Epoch. And
win32file is going through a whole *lot* of convolutions to convert that
time into the Epoch time.
Except that if I do:
>>> time.time()
1214504550.293
>>> open('filename.txt', 'wb').close()
>>> os.lstat('filename.txt').st_mtime
1214504561.2620001
>>> int(win32file.FindFiles('filename.txt')[0][3])
1214522561
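A quick arithmetic check on the two values from that session (pure Python, reusing only those numbers): the FindFiles value is exactly 18000 seconds, i.e. 5 hours, ahead of os.lstat. Decoded as UTC, os.lstat gives 18:22:41 and FindFiles gives 23:22:41 on 2008-06-26, consistent with 18:22:41 being read back as a UTC-5 local time.

```python
from datetime import datetime, timezone

lstat_mtime = 1214504561.2620001   # os.lstat('filename.txt').st_mtime from the session
findfiles_mtime = 1214522561       # int(win32file.FindFiles('filename.txt')[0][3])

skew = findfiles_mtime - int(lstat_mtime)
print(skew, skew / 3600)           # 18000 5.0

for label, ts in (("os.lstat", int(lstat_mtime)), ("FindFiles", findfiles_mtime)):
    # Decode each value as UTC to see the wall-clock time it represents.
    print(label, datetime.fromtimestamp(ts, tz=timezone.utc).isoformat())
```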
I'm *very* tempted to say that pywin32 is in the wrong here. I'm
guessing it actually needs to be doing:
SystemTimeToTzSpecificLocalTime(NULL, &stUTC, &stLocal);
before it puts that time into the tm structure and calls mktime() on it.
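The effect of that extra step can be sketched in pure Python under the same hypothetical assumptions as before: a fixed UTC-5 zone, `mktime_as_local()` standing in for mktime(), and `utc_to_local_wallclock()` standing in for SystemTimeToTzSpecificLocalTime. Converting the UTC fields to local wall-clock time first means mktime()'s local-time interpretation cancels out, and the result matches os.lstat.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed local zone (UTC-5); real code would use the OS time zone.
LOCAL = timezone(timedelta(hours=-5))

def mktime_as_local(wallclock):
    """mktime() analogue: read a naive wall-clock time as LOCAL time."""
    return int(wallclock.replace(tzinfo=LOCAL).timestamp())

def utc_to_local_wallclock(st_utc):
    """SystemTimeToTzSpecificLocalTime analogue: UTC wall clock -> local wall clock."""
    return st_utc.replace(tzinfo=timezone.utc).astimezone(LOCAL).replace(tzinfo=None)

st_utc = datetime(2008, 6, 26, 18, 22, 41)   # UTC wall-clock time of filename.txt

print(mktime_as_local(st_utc))                          # 1214522561 (current behaviour)
print(mktime_as_local(utc_to_local_wallclock(st_utc)))  # 1214504561 (matches os.lstat)
```

With a real zone the round trip is only as good as mktime()'s DST guess (tm_isdst = -1), which is ambiguous for the repeated hour when DST ends.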
Of course, that means I need to figure out how to make it all work,
since if pywin32 fixes its code, any fix I put into bzr will be
"broken"... :(
Mark, can you comment on what pywin32 is doing here?
John
=:->