[RFC] Add run_cleanup, do_with_cleanup

Thu Sep 24 05:44:56 BST 2009

Martin Pool wrote:
> 2009/9/24 Andrew Bennetts <andrew.bennetts at canonical.com>:
[...]
> 
> (I haven't read the diff yet, but we did talk about it ...)
> 
> I was originally suggesting that we should handle this by putting all
> cleanup-type code through a single bottleneck.  I think you're on to
> something in saying (if you are) that we should instead put it through
> common policies for different types of cleanup.

Yes, I think that's what I'm getting at (the picture has been emerging
pretty gradually!).

> My sense is that this is more about the type of cleanup than the place
> it's being called from, so I'd expect that normally we will be
> changing the functions called at cleanup, not the code with the
> try/finally block.  Of course if there is code with multiply nested
> try/finally blocks and we can make it cleaner by changing to a
> higher-order-function that runs them in order, that's great.

Right, that's a good example where a h-o-f can make the code cleaner, not
just more robust.

> I think we can distinguish different types of severity of failure.
> For instance, if we fail to unlock a branch, that's worth telling the
> user about, but it's bad to give them such a large or severe message
> that it obscures their view of what actually went wrong, as often
> seems to happen in bugs.
> 
> One blue-sky option here would be to indicate success or failure by
> return code: unlock (say) returns True if it actually unlocked, False
> if it failed.  Then code that really cares can see, but the default
> will be not to interrupt processing.  This gives you a way to have
> failures that are 'slightly interesting'.

I don't particularly like checking return values.  I think it might be nice
to be able to say "foo.unlock(on_error='warn')", or
on_error='log'/'ignore'/'raise'.  I'm not sure which we'd like to have as
the default, and it's also questionable about whether we really want to
burden all unlock implementations with this, but it's an interesting
strawman API.

My thinking here is that *usually* an unlock failure is probably not
something to report to the user, but there are times when we definitely want
errors to propagate directly (e.g. an explicit invocation of break-lock, and
probably during the test suite).

So there's a mix of inputs here.  I'm having trouble articulating this
clearly, but roughly I feel that what's desired in any individual case may
depend on a combination of:

 - the cleanup operation itself;
 - what the callsite intends (some places may explicitly expect and suppress
   particular errors, so emitting something to the UI instead of raising
   something the callsite can catch may be a bad thing);
 - any global policy, like debug flags;
 - and maybe also the specific error (e.g. KeyboardInterrupt vs. connection
   lost vs. an assertion about unfinished progress bars)?

The challenge seems to be balancing these desires while keeping things
reasonable clean and simple.

> It seems to me the most common requests are:
> 
>  * TooManyConcurrentRequests as a knock-on effect from a previous
> operation being interrupted either because of a client-side bug or
> error, or a connection dropping
>  * Unlock failing because the connection has dropped or in conjunction
> with the above
> 
> The particular cleanups are probably unlock and then
> abort_write_group, and while they do have consequences failing to
> complete them is not a very severe problem.

Right.

In the case of abort_write_group, the consequence of failing to cleanup is
99% uninteresting to the user (there's some disk space that could be
reclaimed manually in .bzr/repository/upload, but meh).

In the case of unlock, if it's a read lock it's totally uninteresting.  If
it's a write lock then the user probably should be told about it because
they will probably need to run break-lock manually.

So my desire is:

 a) for the short term, find something reasonably cheap to do that addresses
    the bugs we have (even if incomplete or a bit ugly), and
 b) figure out, at least approximately, what we'd like to be doing about the
    problem long term, so that can make sure whatever short term things we
    do don't lead us away from that.

So the code in this patch, and the proposed policy, isn't meant to be the
final answer, but I hope a useful stepping stone towards a final answer that
can address some bugs today.

Or maybe I should just fix Python ;)

-Andrew.