[PING][MERGE] Waiting on locks
John Arbash Meinel
john at arbash-meinel.com
Sat Sep 9 16:28:09 BST 2006
Matthieu Moy wrote:
> John Arbash Meinel <john at arbash-meinel.com> writes:
>
...
> I'd say "double the delay at each retry, starting with ~0.5 second".
>
> If you've been trying several times without success, it means the
> server is overloaded, and there's no point in overloading it even
> more.
I agree, though I would include a maximum delay. Possibly something like
one attempt per minute, or maybe one per five minutes. I wouldn't want the
delay in between attempts to grow without bound.
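Roughly what I have in mind, as a sketch (try_lock() and the exact numbers
here are just placeholders for illustration, not anything in bzrlib):

    import time

    def acquire_with_backoff(try_lock, initial_delay=0.5, max_delay=60.0):
        # Double the delay after each failed attempt, but cap it so the
        # wait between retries never grows without bound.
        delay = initial_delay
        while not try_lock():
            time.sleep(delay)
            delay = min(delay * 2, max_delay)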
>
> Additionally, there might be some firewall QoS policy that blacklists
> the client temporarily if it makes repeated attempts. For example, my
> French lab's firewall does "if an IP connects more than 5 times in a
> minute, block it for one minute (and any retry extends the block
> period)". So, any sufficiently stupid brute-force attack is blocked
> forever, but still, nmap can scan the machine (because it has a
> strategy of not retrying too often when it sees it's locked).
Well, then bzr is already going to have a hell of a time going through
this firewall anyway. We have started doing pretty well at using
Keep-Alive over HTTP (we are slowly getting better).
But until recently every request was a new connection. And when you have
a long history and a lot of files, that is a lot of connections.
On the other hand, if you are talking about locking, we would actually
keep the ssh connection open throughout all of this time, so you
wouldn't see any new connections going through the firewall. Just some
encrypted activity on ssh, from which you couldn't tell anything other
than that it seems to happen every 0.5s or so.
>
> Rest assured that as a sysadmin, if I saw a client sending one request
> every half second for hours, I'd blacklist it immediately
> (fortunately, I'm not a sysadmin ;-).
>
>> It is flexible enough, but there are some API issues. (LockDir tries to
>> look like a plain lock to the Branch/Repository, so that the old-format
>> code doesn't have to be special-cased, and LockDir shouldn't manually
>> reach out to read the lock timeout information from the config file.)
>
> If there's a way to know whether successive failures in obtaining
> the lock come from the same lock, it would be interesting to use this
> information. It helps to distinguish between a stale lock and an
> overloaded server.
>
Yeah. With the LockDir format, we can peek() at the contents of the
lock, which should tell us who has the lock. This also brings up a big
point about wanting to cap the maximum time between attempts, because if
you do have a heavily loaded server, I think we might want to reset the
time in between queries whenever the lock changes hands.
If you have a lot of pending clients, that could lead to some weird
effects. But honestly, you should never have that many waiting clients.
And if you do, most likely most of them are going to fail when they get
access to the lock anyway, because the target contents have changed.
(This is true if 2 clients are going after the same branch, but not so
much if they are going for the same repository.)
I'm fine with limited exponential backoff, and a default timeout of <= 1hr.
(500 ms is too fast anyway; over a slow network connection it can take
longer than that just to try to grab the lock.)
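Putting the pieces together, a rough sketch of what that loop could look
like (attempt_lock() is assumed to return True/False for a single try, and
peek() to return information about the current holder; the names and
constants are placeholders, not the real bzrlib API):

    import time

    def wait_for_lock(lockdir, timeout=3600.0, initial_delay=0.5, max_delay=60.0):
        deadline = time.time() + timeout
        delay = initial_delay
        last_holder = lockdir.peek()      # info about whoever holds the lock now
        while time.time() < deadline:
            if lockdir.attempt_lock():
                return True
            holder = lockdir.peek()
            if holder != last_holder:
                # The lock changed hands, so the server is making progress;
                # restart the backoff rather than letting the delay keep growing.
                delay = initial_delay
                last_holder = holder
            time.sleep(delay)
            delay = min(delay * 2, max_delay)
        return False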
John
=:->