[BUG] bzr locks itself when pushing from branch to a checkout in the same shared repo

Mon Feb 5 23:25:06 GMT 2007

Nicholas Allen wrote:
> I can reproduce it every time at the moment so am willing to help you
> track it down if needed. I checked the process id that it says locked it
> and it is definitely the same one that is running the command...
> 
> Nick
> 
> 

It is pretty easy to reproduce:

% bzr init-repo --trees repo
% cd repo
% bzr init branch1
% cd branch1
% echo a > a
% bzr add a
% bzr commit -m a
% cd ..
% bzr checkout branch1 checkout
% cd checkout
% echo b > b
% bzr add b
% bzr commit -m b
#LOCKED

The problem is simply that the checkout and the thing it is bound to
don't realize that they both are holding a copy of the same repository,
so when it does (pseudo) "local.lock_write(); local.master.lock_write()"
the second lock fails because the first has already locked the repository.

A long time ago I argued for using singletons based on path for things
like Branch, Repository, etc. At the time we decided not to do it, in
case an application wanted to have the same branch open in different
places, and didn't want modifying one to do anything to the other. I
sort-of agree.

The other problem (IMO) is that a_branch.bzrdir.open_branch() does not
return a_branch, it returns a new branch pointing to the same location.
We actually make use of that in some of the test suite, as an "easy" way
to make sure that if someone else has locked a branch that we fail if we
try to lock it.

In my cvsps-import plugin, I use a repository shared across branches,
and I wanted to maintain a repository lock so that things stay cached in
memory (rather than having to re-read the index files all the time). To
do it, I hacked in and did:

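 from bzrlib.branch import Branch

 # 'repo' is the shared Repository object that is already open and locked
 a_branch = Branch.open(path)
 assert a_branch.repository.bzrdir.transport.base == \
         repo.bzrdir.transport.base
 # Swap in the shared object so both branches use the same cache and lock
 a_branch.repository = repo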

It is a bit of an ugly hack, but it works, and is currently the only way
to get 2 branches to use the same repository object.

At one point this wasn't a huge problem on Linux because we used OS-level
locks, and Linux lets the same process lock a file twice. Windows does
not, which showed up in "bzr merge .", which was part of our test suite
at the time. (I think we now have an explicit check for
other.base == this.base to handle something like this.)

The reason it works with most other places where we are grabbing 2
branches is because one is open for read, and the other is open for
write. 'checkout' is the one place where we are opening 2 branches for
write operations.

So this is a fairly long analysis of the problem. What is the solution?
I'm not really sure. I like the idea of having a singleton for
Repository based on path, but there are some problems. One, it would
mean that we might upgrade from a read-lock to a write-lock, which
before now was forbidden. (Internally it is actually okay, because right
now a read-lock is a no-op that just enables caching).
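
As a rough sketch of the singleton idea (not bzrlib code; the
SharedRepository class, its open() registry, and the lock counting are
all assumptions made for illustration), opening the same path twice
hands back the same object, so a second lock_write just bumps a counter.
It also shows the read-to-write upgrade mentioned above.

```python
class SharedRepository:
    """Hypothetical per-path singleton with counted, re-entrant locks."""

    _open = {}  # path -> the one instance for that path

    def __init__(self, path):
        self.path = path
        self.lock_mode = None  # None, "r" or "w"
        self.lock_count = 0

    @classmethod
    def open(cls, path):
        repo = cls._open.get(path)
        if repo is None:
            repo = cls._open[path] = cls(path)
        return repo

    def lock_read(self):
        # Never downgrade: a read lock taken under a write lock stays "w".
        if self.lock_mode is None:
            self.lock_mode = "r"
        self.lock_count += 1

    def lock_write(self):
        # Taking this while only read-locked is the upgrade case.
        self.lock_mode = "w"
        self.lock_count += 1

    def unlock(self):
        if self.lock_count == 0:
            raise RuntimeError("not locked")
        self.lock_count -= 1
        if self.lock_count == 0:
            self.lock_mode = None
```

The obvious cost is the one already mentioned: an application can no
longer hold two independent objects for the same path.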

Another possibility is for a checkout to check whether
self.master_branch.repository.bzrdir.transport.base ==
self.repository.bzrdir.transport.base and, if so, do something special
because the repository is shared.

(By the way, we really should consider adding something like
Repository.base like we have for Branch.base and WorkingTree.basedir).

Another problem, though, is that because of symlinks, normalization,
alternate transports, etc it is possible to have the same repository
open over different connections such that the path does not look the
same. Do we care if someone does:

cd /path/repo
bzr checkout sftp://localhost/path/repo/bar foo
cd foo
bzr commit -m "foo"

It isn't as obviously a problem as when they are both local paths.
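
To make the limitation concrete, here is a small sketch using only the
Python standard library. The helper names same_local_repository and
same_location_by_url are made up for this example. Resolving symlinks
makes two local spellings of a path compare equal, but comparing base
URLs as strings cannot tell that a file:// and an sftp:// URL reach the
same directory.

```python
import os


def same_local_repository(path_a, path_b):
    """Compare two local paths after resolving symlinks and '..' segments."""
    return os.path.realpath(path_a) == os.path.realpath(path_b)


def same_location_by_url(url_a, url_b):
    """Naive string comparison of base URLs, ignoring a trailing slash."""
    return url_a.rstrip("/") == url_b.rstrip("/")
```

So a path-based check can be made to cope with symlinks locally, but it
has no way to see through a different transport.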

One possibility would be that if we see that the target repository is
locked, we look at the other repository lock that we hold, and see if
the unique id is the same.

But this would require a lot of communication between parts of the code
base. For example, is_locked would have to actually raise the exception,
and 'checkout' would have to catch it, compare it with other information,
try again, etc. (A fairly large amount of API violation/peeking would
need to happen.)
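
A toy version of that flow, to show how much the caller ends up knowing
about lock internals. Everything here is hypothetical (TokenLock,
lock_both, and the holder_token attribute on the exception are not
bzrlib API). The disk_key argument stands for the physical lock
location, however the repository was reached.

```python
import uuid


class LockContention(Exception):
    def __init__(self, holder_token):
        super().__init__("already locked")
        self.holder_token = holder_token


class TokenLock:
    """Toy lock: _disk maps a physical location to the holder's unique id."""

    _disk = {}

    def __init__(self, disk_key):
        self.disk_key = disk_key
        self.token = None

    def acquire(self):
        holder = self._disk.get(self.disk_key)
        if holder is not None:
            raise LockContention(holder)
        self.token = uuid.uuid4().hex
        self._disk[self.disk_key] = self.token

    def release(self):
        if self.token is not None and self._disk.get(self.disk_key) == self.token:
            del self._disk[self.disk_key]
        self.token = None


def lock_both(local_lock, master_lock):
    """Lock both; return True if the master turned out to share our lock."""
    local_lock.acquire()
    try:
        master_lock.acquire()
    except LockContention as e:
        if e.holder_token == local_lock.token:
            # The "other" holder is us: same repository by another route.
            return True
        local_lock.release()
        raise
    return False
```

Note that lock_both has to reach into both the exception and the lock it
already holds, which is exactly the kind of peeking described above.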

Repositories, branches, etc could all have unique identifiers generated
at creation time, which would then be independent of path. Except then
doing "cp -a X Y" would create 2 branches with the same id, but with
different locations.

So the simplest thing to do is to just special case the commit code such
that if a branch has a master branch it checks if the repositories are
at the same path, and if so, doesn't try to fetch. (Either it could
overwrite the target's branch.repository member, or do some other trickery).
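
Roughly, the special case could look like the sketch below. This is a
toy model, not a patch: Repo, ToyBranch, and share_repository_with_master
are invented names, and Repo.base is the hypothetical attribute suggested
earlier, which does not exist on the real Repository.

```python
class Repo:
    def __init__(self, base):
        self.base = base  # hypothetical: location of the repository
        self.write_locks = 0

    def lock_write(self):
        self.write_locks += 1


class ToyBranch:
    def __init__(self, repo_base, master=None):
        self.repository = Repo(repo_base)
        self.master = master


def share_repository_with_master(branch):
    """If branch and master use the same repository path, share one object.

    Returns True when the master's repository member was overwritten.
    """
    master = branch.master
    if master is None:
        return False
    if master.repository.base != branch.repository.base:
        return False
    master.repository = branch.repository
    return True
```

After the swap, locking the branch and then the master goes through the
same object, so the second lock is a nested lock rather than a
contention with ourselves.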

I've known about this for a while, but I haven't thought of a really
acceptable solution to recommend anything to the list.

John
=:->

PS> I forget exactly who brought this up first, but I'm pretty sure his
name started with a W (it might be William Dodé). It was something about
symlinks in the working tree, and trying to do nested trees in a repository.