RFC: win32 fstat inconsistencies

Martin Pool mbp at sourcefrog.net
Fri Mar 13 00:39:58 GMT 2009


2009/3/13 John Arbash Meinel <john at arbash-meinel.com>:
> So it seems Windows does something weird wrt stat returns. Specifically,
> os.lstat gives different results from os.fstat:
>
>  f = open('bzr', 'rb')
>  fst = os.fstat(f.fileno())
>  lst = os.lstat('bzr')
>  lst != fst
>
> The big difference that I can see is that os.fstat() has a value in
> "st.st_ino", while os.lstat() returns 0 for st_ino.
>
> I was trying to decide what to do with that. At the moment it is causing
> the test suite to fail in
> "workingtree_implementations.get_file_with_stat". My first feeling was
> to change the assertEqualStat() function, to skip the st_ino field for
> windows, but I realized that this might actually be effecting the
> performance of dirstate-after-commit. Namely, we use st_ino in the
> hash-cache to detect if a file has been changed. When we stat the file,
> it will be giving us a different result than the one we would have
> gotten from os.fstat() during commit.
>
> I don't have a great answer yet. My first thought about fixing it was to
> have WT.get_file_with_stat() set the st_ino back to 0 on win32, since we
> know that it won't match the future lstat() call. However, these objects
> are an "nt_stat" result builtin object. Which means that you cannot
> mutate them. And it doesn't seem quite right to wrap them in yet-another
> class...
>
> So I could:
>
> 1) Create a dummy stat result to wrap the os.fstat() return which always
> returns 0 for st_ino.
> 2) Change the assertEqualStat() function to allow the st_ino to differ,
> and then change the dirstate code to always ignore the st_ino field on
> windows.

Wow, that is a bit weird.  It would be nice if fixing it turns out to
make workingtree access faster though.

Changing the test to be looser seems wrong, because it would have
hidden this bug.

We do try to have a concept of 'fingerprint' for the cache validator
as distinct from the semantic stat content, so we should probably make
sure to use that name if it's something from which you're not meant to
examine the contents.

I would probably make a new stat object with the inode masked off, or
a tuple if the callers will accept that.  (The Python builtin stats
act like tuple.)  Because this is hit a lot we don't want to do too
much allocation but I think this is most feasible, unless/until we can
do the stat in our own C code.

-- 
Martin <http://launchpad.net/~mbp/>



More information about the bazaar mailing list