What should happen if bzr is unable to complete a task?

Martin Pool mbp at sourcefrog.net
Mon Jun 22 05:09:59 BST 2009


2009/6/19 Maritza Mendez <martitzam at gmail.com>:
>
> I've discussed this briefly with Alexander in the special context of 'bzr
> qbranch' in Bazaar Explorer, but there is a more general question I am
> trying to express.  Please help me find the best way to ask this question.
>
> I have a badly behaving set of computers at work.  Actually, I think the
> problem is with the subnet and our IT guys are working on it.  But meanwhile
> it has made me think about a more general question.  What we see is that a
> 'bzr branch' operation can hang if the parent location is outside our
> firewall.  Inside, no problem.  At some point, the progress bar freezes and
> nothing happens.  I've waited at least ten minutes in some cases.  Nothing.
> In each case, we see that the working directory has been created with the
> expected structure but no working tree (obviously) and there is a .fetch
> file.  We're able to determine that the bzr process is still holding a
> handle to the fetch file, but the file has stopped growing.
>
> No problem: kill bzr and start over.  The fact that this happens more often
> than not for us is our problem.  We have some network problem that
> mysteriously affects bzr and (as far as we know) nothing else.   We're
> working on that.
>
> But what about unattended scripts?  And what about the bzr-explorer and
> future tools which invoke bzr (rather than calling bzrlib)?  How should they
> recover if bzr stalls because it can't get what it needs?
>
> Is it already possible for me to set a timeout?  And is there a case to be
> made for transport operations to have timeouts to allow graceful recovery
> when a resource (like a network) goes away?  Maybe this already exists?
> Note that because the same version of bzr (1.15-1) works everyplace else
> we've tried -- including at home! -- I am assuming the problem is with our
> network.  I do not know if this assumption is true.
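
For the unattended-script case raised above, a caller that runs the bzr command line (rather than calling bzrlib) can already put an outer bound on a stall with a wall-clock watchdog. The following is a minimal Python sketch, not part of bzr. `run_with_watchdog` is a hypothetical helper built on the standard library's `subprocess.run(..., timeout=...)`, which kills the child and waits for it when the limit expires.

```python
import subprocess


def run_with_watchdog(argv, timeout):
    """Run a command, killing it if it has not finished within `timeout` seconds.

    Returns the child's exit status, or None if the watchdog fired.
    (Hypothetical helper for illustration; not a bzr API.)
    """
    try:
        result = subprocess.run(argv, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the child here.
        return None
    return result.returncode
```

A script might call `run_with_watchdog(["bzr", "branch", parent, target], timeout=30 * 60)` and retry or report when it returns `None`. Here `parent`, `target` and the 30-minute limit are placeholders. A killed `bzr branch` can leave the partially created target directory behind, as described above, so a retry may need to remove it first. The trade-off is that a total-time limit cannot tell a stalled fetch from a merely slow one, so it has to be generous enough for the largest legitimate branch.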

At the moment bzr doesn't normally apply a timeout. I think bzr should
time out network operations after a user-configurable delay with a
sensible default, say 1 minute. (Some transports may already have one
at a lower level, e.g. inside the HTTP library or in the ssh
subprocess, but this is neither consistent nor ideal.) See
<https://bugs.edge.launchpad.net/bzr/+bug/390485> and
<https://bugs.edge.launchpad.net/bzr/+bug/390486>.
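
A per-operation network timeout of this kind can be sketched at the socket level. This is a minimal Python illustration under stated assumptions, not bzr's transport code. The `network_timeout` option name and the helpers `get_network_timeout` and `read_chunk` are hypothetical, and the 60-second default simply mirrors the "say 1 minute" suggestion. `socket.settimeout` makes each blocking call on the socket fail after that many seconds without progress, which bounds a stalled read without capping the total transfer time.

```python
import socket

# Hypothetical default, mirroring the "say 1 minute" suggestion.
DEFAULT_NETWORK_TIMEOUT = 60.0


def get_network_timeout(config):
    """Return the network timeout in seconds from a user config mapping.

    'network_timeout' is a hypothetical option name, not an existing bzr setting.
    """
    value = config.get("network_timeout")
    if value is None:
        return DEFAULT_NETWORK_TIMEOUT
    timeout = float(value)
    if timeout <= 0:
        raise ValueError("network_timeout must be a positive number of seconds")
    return timeout


def read_chunk(sock, size, timeout):
    """Read up to `size` bytes, raising TimeoutError if the peer stays silent.

    The limit applies to this single recv() call, so a slow transfer is not
    cut off as long as each chunk arrives within `timeout` seconds.
    """
    sock.settimeout(timeout)
    try:
        return sock.recv(size)
    except socket.timeout:
        # Replace the low-level error with one clear message for the user.
        raise TimeoutError(
            "no data received for %.1f seconds; giving up" % timeout) from None
```

Because the limit is per read rather than per command, a large but healthy branch is unaffected; only a connection that goes silent, like the stalled `.fetch` described in the question, would be abandoned.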

In theory the OS will eventually notice that the remote host is
unreachable or no longer responding, but in practice this doesn't
always work on every network, and the default timeout is very long
(maybe 2 hours?).
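
One OS-level mitigation, assuming the platform supports it, is to enable TCP keepalive with shorter timers than the system defaults, so the kernel probes an idle connection and drops it once the peer stops answering. A minimal Python sketch follows. `enable_keepalive` and its default values are illustrative choices, not something bzr does. `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are platform-specific constants (available on Linux), so each is set only if the running Python exposes it.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Ask the OS to probe an idle TCP connection so a dead peer is noticed sooner.

    idle:     seconds of silence before the first probe
    interval: seconds between unanswered probes
    count:    unanswered probes before the connection is dropped
    (Hypothetical helper; the values are illustrative.)
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        option = getattr(socket, name, None)
        if option is not None:  # skip options this platform does not support
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
```

With these values a Linux client waiting on a vanished peer would give up after about 110 seconds (60 s idle plus five probes 10 s apart). Keepalive only helps while the connection is idle, and it does not replace an application-level timeout: it detects a dead or unreachable peer, not a live server that is still acknowledging packets but has stopped sending data.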

-- 
Martin <http://launchpad.net/~mbp/>