Shelf bug ?

Sat Nov 8 23:28:26 GMT 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

马旋（SuperMMX） wrote:
> 
> Hi, Bazaar guys:
> 
> I am using latest bzr.dev, but it seems that shelve doesn't work correctly.
> I'm running bzr on Windows XP.
> 
> The error is:
> 
> bzr: ERROR: Could not acquire lock "$MY_PATH/.bzr/checkout/dirstate

So I investigated this, and I get the same error here.

The cause is that the Shelver code doesn't take out an explicit lock on
the working tree for the lifetime of the operation. Paraphrased, it does:

source = WorkingTree.open_containing()
target = source.basis_tree()

In this case 'source' ends up being a WorkingTree4, and target is a
DirStateRevisionTree pointing at the source's dirstate file.

The *problem* is that whenever you take out a new lock on WT4, it
re-opens its Dirstate, creating a new reference to it.

What happens is that when we call

WT.basis_tree() it calls self.revision_tree() which takes out an
implicit needs_read_lock()". Which reads the dirstate file, and passes
that object to the DirstateRevisionTree.

However, the lock goes away as soon as the function returns. So that
later on, after we take out the write lock on the 'source', which
re-reads the dirstate and creates a new Dirstate object, and then locks
it. When we later come along to take a read-lock on the target, it is
pointing to the same file via a different in-memory object.

We don't notice the problem on Linux, because fcntl locks are not
exclusive *within* a process, but Win32 os locks *are* exclusive within
a process.

As an aside, another bad result is that the WT lock isn't cleaned up
properly. After trying to do "bzr shelve" you end up with a write-locked
tree, and you have to use "bzr break-lock" to get it to work again.

So what is the solution...

1) Write lock the source tree as soon as we acquire it, and have
something inside Shelver know when to unlock it. The big problem is that
this would best be done in a deconstructor, which is ugly. I can do it
via a deconstructor proxy, but I don't have a better solution.

2) Change DSRT to access its dirstate via the tree itself, rather than
the Dirstate object directly.

(2) is a bit easier to implement, and would also tend to hide problems
anywhere else where we have the result of basis_tree()/revision_tree()
lifetime outside of the lifetime of the WT lock. However, I feel like
this is supressing bad code. Most notably, by doing so we cause the
dirstate file to be read 2 times, rather than just once. (Actually, N
times, depending on however many functions are called on the WT that
take an implicit lock.)

Personally, I feel we should move away from the implicit locks, and
instead have WT.basis_tree()/revision_tree() fail if the WT is not
locked. That would make it clearer that callers are expected to be done
with the basis/revision tree before they unlock the object that created it.

3) If we don't want a deconstructor, then we should change how
Shelver.from_args works, and probably require some sort of "finally()"
to be called on the Shelver object. So it would be used like:

  shelver = Shelver.from_args()
  try:
    shelver.run()
  finally:
    shelver.finished()

One problem with this, is that people calling Shelver() directly may not
realize that Shelver is then taking over the lifetimes of the objects.
Alternatively, Shelver.from_args() could return multiple objects, one of
which is the shelver, and the other is something that manages the
lock-state of the trees involved.

I *do* think it would be good to change _get_one_revision_tree() to
check the lock status of the tree passed in, to help find other commands
that are secretly broken. Though what I'd really prefer is to change
WT4.revision_tree() to get rid of the implicit lock, and turn it into
raising ObjectNotLocked(). If we want to maintain api compatibility, we
could get rid of needs_read_lock(), and instead check the lock state,
issue a DeprecationWarning, and then go ahead and add an explicit
lock_read/unlock call.

Attached is a patch which creates an Unlocker class, and then ties the
lifetime of the attached trees to the lifetime of the Shelver.

1) 'bzr shelve' does work from the command line, because it goes through
Shelver.from_args()

2) The Shelver tests still don't pass, because they use
"tree.basis_tree()" outside of a lock. Which means they are still create
this double-reference problems as part of the test suite.
I tried to work around this in the constructor, but by the time we *get*
to the constructor, the problem is already done. We've already called
WT4.basis_tree() without maintaining a lock, which causes use to create
2 DirState objects.

3) The colordiff code is broken for windows. Basically, the windows
shell doesn't respond to the terminal codes it is getting. It is
probably possible if I was running under cygwin, but I'm running the
win32 native version. So instead of getting colors, I get:

←[0m←[0;37m     shelf,
←[0m←[0;37m     textfile,
←[0m←[0;37m     trace,
←[0m←[0;34m+    tree as _mod_tree,

I'm guessing the easiest thing to do is just skip colordiff support if
sys.platform == 'win32'. I was hoping that 'colordiff' would be smart
enough to recognize whether the TERM was capable of coloring or not, but
I wasn't able to trick it into not trying.

I'll look into better fixes, but this at least allows "bzr shelve" to
work on Windows.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkkWIJkACgkQJdeBCYSNAAP8nwCfQu7iUE8tTSom2PQByVQ0GQU1
0Z0An0QHoOgQAeWNW99Y8I1KrKdmRi84
=K/q5
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: shelf_locking.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20081108/d33f5632/attachment-0001.diff