Help with strange bzr connection hangs

Martin Pool mbp at canonical.com
Tue Oct 4 22:12:41 UTC 2011


On 5 October 2011 04:52, Eli Zaretskii <eliz at gnu.org> wrote:
>> From: Martin Pool <mbp at canonical.com>
>> Date: Tue, 4 Oct 2011 12:16:18 +1100
>> Cc: bazaar at lists.canonical.com
>>
>> I don't know, based on that, what it would be, but the general kind of
>> thing I would try to find out next in this type of situation is just
>> what is going on when it is hanging: what is bzr doing, what is the
>> external ssh transport (if any) doing, and what is the OS tcp socket
>> doing?
>
> It looks like bzr is waiting forever in `select'.  Here's the stack of
> one of the two threads shown by Process Explorer:
>
>  ntoskrnl.exe!ExReleaseResourceLite+0x2be
>  ntoskrnl.exe!IoPageRead+0xc50
>  ntoskrnl.exe!IoGetBaseFileSystemDeviceObject+0x730
>  ntoskrnl.exe!NtWaitForSingleObject+0x94
>  ntoskrnl.exe!KiDeliverApc+0xbbb
>  ntdll.dll!KiFastSystemCallRet
>  MSWSOCK.dll+0x5fa7
>  WS2_32.dll!select+0xa7
>  _socket.pyd!init_socket+0x1c6e
>
> I have no idea what that means.  Why would it wait forever? aren't
> there timeouts? am I looking at some deadlock in the kernel?

select is used to wait for network I/O.  When a connection is active we
typically wait forever to be able to read or write.  (Perhaps there
should be a very long timeout after which bzr decides for itself that
the network has jammed, but for the moment we rely on the user.)
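
To illustrate the difference between waiting forever and waiting with a
timeout, here is a minimal sketch in plain Python.  It is not bzr's
actual code: the helper name wait_readable and the 0.2 second timeout
are made up for the example, and it runs against a local socket pair
rather than a real connection.

```python
import select
import socket


def wait_readable(sock, timeout=None):
    """Return True if sock becomes readable within timeout seconds.

    timeout=None blocks indefinitely, as in the hang discussed here.
    (Hypothetical helper, not part of bzr.)
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    return bool(readable)


# Demonstration on a local socket pair.
a, b = socket.socketpair()
idle = wait_readable(a, timeout=0.2)   # nothing has been sent yet
b.sendall(b"x")
ready = wait_readable(a, timeout=0.2)  # one byte is now waiting
a.close()
b.close()
```

With a finite timeout the caller gets control back and can decide what to
do; with None it stays in select until the peer sends something or the
connection fails.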

So the question then is: which fds is it waiting on (what are the
arguments to select?), which sockets do they correspond to, and what
does the OS think the state of those sockets is (which is what the
netstat output shows)?
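
One way to tie an fd to a netstat line is to print the socket's local
and peer addresses and look for the same pair in the netstat output.
The sketch below assumes you can get hold of the Python socket object;
describe_socket is a hypothetical helper, and the loopback connection
only stands in for a real one.

```python
import socket


def describe_socket(sock):
    """Return the fd and both endpoint addresses of a connected socket.

    (Hypothetical helper for illustration, not part of bzr.)
    """
    return {
        "fd": sock.fileno(),
        "local": sock.getsockname(),
        "peer": sock.getpeername(),
    }


# Demonstration on a throwaway loopback connection.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
listener.listen(1)
listen_addr = listener.getsockname()
client = socket.create_connection(listen_addr)
server_side, _ = listener.accept()

info = describe_socket(client)
print(info)

client.close()
server_side.close()
listener.close()
```

The local/peer address pair is what identifies the connection in
netstat, so it can be matched up by eye.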

>>  * pop into the bzr debugger with ctrl-break and then get a backtrace
>> (type 'bt')
>
> Ctrl-Break doesn't seem to be able to interrupt bzr in this state,
> probably because it is stuck inside a system call.

That's a little strange, because select is interruptible on Unix, but
perhaps not on Windows.

>>  * is there a windows equivalent to 'netstat -ponet' that shows the
>> socket state?
>
> The socket state is ESTABLISHED, if this is what you wanted to know.

OK, but I'd also like to know the lengths of the rx and tx queues, and
which timer is active, if any, and if so what its value and counter are.

Thanks.

m


