Help trying to run selftest on EC2

John Arbash Meinel john at arbash-meinel.com
Wed Dec 2 17:22:46 GMT 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Now that I've gotten the selftest (mostly?) running clean on Windows,
I've been trying to get the tests running on our EC2 instance.

Unfortunately, I get random failures that I was hoping someone could
help me sort out.

Almost all of them seem to come down to a failure in "os.rename()"
giving a permission error. I'm getting this both on .pack files and on
directories. (The common case on directories is a failure to rename a
held lock out of the way, meaning we don't unlock, which means the next
operation fails to obtain a lock.)

I've tried tracking through the code, and in all the cases I can see we
are effectively doing something like:

 file.close()
 os.rename(path, new_path)


So we have a clear "close" before we try to rename.

The only thing I can think of is that the OS is turning the '.close()'
into an asynchronous call. Or rather, that it is releasing the lock on
the path asynchronously instead of blocking, which means that the lock
may or may not still be held when we get to .rename().
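
One rough way to check that theory in isolation is to hammer the
close-then-rename pattern outside the test suite and count how often the
rename fails. The sketch below is only an illustration: the function
name, file names, and iteration count are made up, it is written for a
current Python rather than the one bzr used at the time, and it does not
go through sftp or any loopback, so a clean run would not rule out a
problem in those layers.

```python
import os
import tempfile


def count_rename_failures(iterations=1000):
    """Create, close, then rename a file repeatedly; count failed renames.

    Hypothetical helper for illustration, not part of bzr.
    """
    failures = 0
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(iterations):
            path = os.path.join(tmp, "file-%d.tmp" % i)
            new_path = os.path.join(tmp, "file-%d.pack" % i)
            f = open(path, "wb")
            f.write(b"data")
            f.close()  # explicit close before the rename, as in the code paths above
            try:
                os.rename(path, new_path)
            except OSError:
                # Permission errors surface as OSError (PermissionError in Python 3).
                failures += 1
    return failures


if __name__ == "__main__":
    print("failed renames:", count_rename_failures())
```

If this reports failures on the EC2 instance, that would point at the OS
or filesystem; if it never does, the loopback transports look like the
more likely culprit.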

I may be missing something, as it also seems to involve using a loopback
of some sort. (The sftp tests fail the most, but some others seem to be
failing too.)

Looking at the sftp code, it seems fairly clear that calling
SFTPFile.close() does trigger a CMD_CLOSE call, and that traces down
into actually calling .close() on the file object. It looks like the
default is to do it synchronously, though if done via __del__ it is done
asynchronously.

(The other clear case is a 'hooked://' transport that is also operating
via a loopback. I don't know if this is a wrapper around sftp or not.)

The failure is sporadic. Usually the same test fails. But if I run the
test directly it passes. If I run "bzr selftest -s bb.test_branch" I can
get anywhere from 0-3 failures.

Anyone have any ideas about how to clear this up? Should I hack up an
os.rename() that detects PermissionDenied, calls time.sleep() for a half
second, and then tries again?
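
A minimal sketch of what such a wrapper might look like, assuming a
current Python where permission failures arrive as OSError with errno
EACCES or EPERM. The name rename_with_retry, the retry count, and the
injectable _rename/_sleep parameters are all hypothetical and only there
to make the idea concrete and testable; this is not existing bzr code.

```python
import errno
import os
import time


def rename_with_retry(src, dst, retries=5, delay=0.5,
                      _rename=os.rename, _sleep=time.sleep):
    """Rename src to dst, retrying when the OS reports a permission error.

    Returns the number of retries that were needed. Any other OSError,
    or a permission error on the final attempt, is re-raised unchanged.
    """
    for attempt in range(retries + 1):
        try:
            _rename(src, dst)
            return attempt
        except OSError as e:
            if e.errno not in (errno.EACCES, errno.EPERM):
                raise  # not a permission problem, do not mask it
            if attempt == retries:
                raise  # out of retries, let the caller see the failure
            _sleep(delay)
```

The obvious cost is that a genuine permission problem now takes
retries * delay seconds to surface, and the retry only papers over
whatever is still holding the path.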

John
=:->




-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksWomYACgkQJdeBCYSNAAOaLgCgy49t4XkkdLtkeyVmvzrise9d
DXsAoL4NNKRQCa+DVPmXpD60j5xvEOlH
=bBQe
-----END PGP SIGNATURE-----


