Help with strange bzr connection hangs

John Meinel john at arbash-meinel.com
Wed Oct 5 05:59:52 UTC 2011


Blocking calls (select, recv, etc.) are generally not interruptible on Windows.
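
One common workaround, shown here only as a rough sketch and not as what bzr
actually does, is to call select with a short timeout in a loop rather than
blocking indefinitely. That returns control to the Python interpreter at
regular intervals, which should give a pending Ctrl-C or Ctrl-Break a chance
to be handled, and it also allows an overall "the network has jammed"
deadline. The name wait_readable and both timeout parameters are hypothetical,
made up for illustration.

```python
import select
import socket
import time


def wait_readable(sock, total_timeout, poll_interval=0.5):
    """Return True once sock is readable, False if total_timeout elapses.

    Hypothetical helper for illustration; not part of bzr or paramiko.
    """
    deadline = time.monotonic() + total_timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Overall deadline passed without any data arriving.
            return False
        # Block for at most poll_interval seconds so that control comes
        # back to the interpreter regularly, even where select itself
        # cannot be interrupted.
        readable, _, _ = select.select(
            [sock], [], [], min(poll_interval, remaining))
        if readable:
            return True
```

A caller would do something like wait_readable(sock, 300) before recv and
treat a False result as a stalled connection.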

If paramiko is in the mix, I wonder if it is a small ssh incompatibility.
Like not supporting re-keying that we saw in the past. That particular one
doesn't fit here, though.

John
=:->
On Oct 5, 2011 12:13 AM, "Martin Pool" <mbp at canonical.com> wrote:
> On 5 October 2011 04:52, Eli Zaretskii <eliz at gnu.org> wrote:
>>> From: Martin Pool <mbp at canonical.com>
>>> Date: Tue, 4 Oct 2011 12:16:18 +1100
>>> Cc: bazaar at lists.canonical.com
>>>
>>> I don't know, based on that, what it would be, but the general kind of
>>> thing I would try to find out next in this type of situation is just
>>> what is going on when it is hanging: what is bzr doing, what is the
>>> external ssh transport (if any) doing, and what is the OS tcp socket
>>> doing?
>>
>> It looks like bzr is waiting forever in `select'.  Here's the stack of
>> one of the two threads shown by Process Explorer:
>>
>>  ntoskrnl.exe!ExReleaseResourceLite+0x2be
>>  ntoskrnl.exe!IoPageRead+0xc50
>>  ntoskrnl.exe!IoGetBaseFileSystemDeviceObject+0x730
>>  ntoskrnl.exe!NtWaitForSingleObject+0x94
>>  ntoskrnl.exe!KiDeliverApc+0xbbb
>>  ntdll.dll!KiFastSystemCallRet
>>  MSWSOCK.dll+0x5fa7
>>  WS2_32.dll!select+0xa7
>>  _socket.pyd!init_socket+0x1c6e
>>
>> I have no idea what that means.  Why would it wait forever? aren't
>> there timeouts? am I looking at some deadlock in the kernel?
>
> select is used to wait for network io. When a connection is active we
> typically wait forever to be able to read or write. (Perhaps there
> should be a very long timeout where bzr decides for itself the network
> has jammed, but for the moment we rely on the user.)
>
> So the question then is: which fds is it waiting on (what are the
> arguments to select?) and which sockets do they correspond to, and
> what does the OS think the state of those sockets is (which is the
> netstat output.)
>
>>>  * pop into the bzr debugger with ctrl-break and then get a backtrace
>>> (type 'bt')
>>
>> Ctrl-Break doesn't seem to be able to interrupt bzr in this state,
>> probably because it is stuck inside a system call.
>
> That's a little strange, because select is interruptible on unix, but
> perhaps not on Windows.
>
>>>  * is there a windows equivalent to 'netstat -ponet' that shows the
>>> socket state?
>>
>> The socket state is ESTABLISHED, if this is what you wanted to know.
>
> OK, but I'd also like to know the length of the rx and tx queues, and
> which timer is active if any, and if so what its value and counter is.
>
> Thanks.
>
> m
>