[Bug 1861389] Re: nmap hang due to BADF ioctl inside select call is returning good rc

Thu Jan 30 09:26:35 UTC 2020

There are too many moving parts to have this be a hard-fact. But I only
saw it on x86 with kernel 5.4 so far. While other architectures or in
former times this didn't happen.

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to glibc in Ubuntu.
https://bugs.launchpad.net/bugs/1861389

Title:
  nmap hang due to BADF ioctl inside select call is returning good rc

Status in glibc package in Ubuntu:
  New
Status in nmap package in Ubuntu:
  Incomplete

Bug description:
  Hi,
  the reproducibility of this is hard, but I've seen it twice in a week now so it is not a stray cosmic ray after all.

  I've seen nmap hang that was running with a timeout already which made
  me wonder.

  $ nmap -Pn 192.168.122.0/24 --host-timeout 10

  In PS the output looks normal usually in a wchan with a timeout:

  $ ps axlf
  ...
  4     0   10155       0  20   0  25372 17320 poll_s S    ?          0:07 nmap -Pn 192.168.122.0/24 --host-timeout 10

  Full wchan:
  cat /proc/10155/wchan 
  poll_schedule_timeout.constprop.0

  I found that nmap isn't dead, it is looping it seems.
  Obviously on GDB you always find it at the timeout

  0x00007fe5b73a725a in __GI___select (nfds=nfds at entry=5, readfds=readfds at entry=0x7ffec6db1540, writefds=writefds at entry=0x0, exceptfds=exceptfds at entry=0x0, 
      timeout=timeout at entry=0x7ffec6db1670) at ../sysdeps/unix/sysv/linux/select.c:41
  41      ../sysdeps/unix/sysv/linux/select.c: No such file or directory.
  (gdb) bt
  #0  0x00007fe5b73a725a in __GI___select (nfds=nfds at entry=5, readfds=readfds at entry=0x7ffec6db1540, writefds=writefds at entry=0x0, exceptfds=exceptfds at entry=0x0, 
      timeout=timeout at entry=0x7ffec6db1670) at ../sysdeps/unix/sysv/linux/select.c:41
  #1  0x000055900ff4f838 in pcap_select (p=p at entry=0x55901131ec10, timeout=timeout at entry=0x7ffec6db1670) at netutil.cc:1012
  #2  0x000055900ff4fa15 in pcap_select (usecs=999993, p=0x55901131ec10) at netutil.cc:1031
  #3  read_reply_pcap (pd=0x55901131ec10, to_usec=999993, accept_callback=accept_callback at entry=0x55900ff4e530 <accept_arp(unsigned char const*, pcap_pkthdr const*, int, size_t)>, 
      p=p at entry=0x7ffec6db16f0, head=head at entry=0x7ffec6db1700, rcvdtime=rcvdtime at entry=0x7ffec6db1770, datalink=0x7ffec6db16ec, offset=0x7ffec6db16f8) at netutil.cc:4234
  #4  0x000055900ff56014 in read_arp_reply_pcap (pd=<optimized out>, sendermac=sendermac at entry=0x7ffec6db1792 "", senderIP=senderIP at entry=0x7ffec6db176c, to_usec=<optimized out>, 
      rcvdtime=rcvdtime at entry=0x7ffec6db1770, trace_callback=0x55900ff0ef70 <PacketTrace::traceArp(int, unsigned char const*, unsigned int, timeval*)>) at netutil.cc:4324
  #5  0x000055900fefce80 in get_arp_result (USI=0x7ffec6db1890, stime=0x7ffec6db1860) at scan_engine_raw.cc:1557
  #6  0x000055900fef5e67 in ultra_scan(std::vector<Target*, std::allocator<Target*> >&, scan_lists*, stype, timeout_info*) () at scan_engine.cc:2546
  #7  0x000055900ff0e1cc in arpping (hostbatch=<optimized out>, num_hosts=255) at targets.cc:187
  #8  0x000055900ff0ed00 in refresh_hostbatch (hs=<optimized out>, exclude_group=0x559010b87780, ports=0x559010142980 <ports>, pingtype=1) at targets.cc:591
  #9  0x000055900ff0ee5d in nexthost (hs=hs at entry=0x7ffec6db2010, exclude_group=exclude_group at entry=0x559010b87780, ports=ports at entry=0x559010142980 <ports>, pingtype=<optimized out>)
      at targets.cc:644
  #10 0x000055900fec8ec3 in nmap_main (argc=<optimized out>, argv=0x7ffec6db3058) at nmap.cc:2063
  #11 0x000055900fe9e215 in main (argc=5, argv=0x7ffec6db3058) at main.cc:237

  But strace has shown that the syscall is bad
  # strace -rT -p 10155
  strace: Process 10155 attached
       0.000000 select(5, [4], NULL, NULL, {tv_sec=0, tv_usec=438201}) = 0 (Timeout) <0.438773>
       0.439032 ioctl(-1, TIOCGPGRP, 0x7ffec6db1784) = -1 EBADF (Bad file descriptor) <0.000028>
       0.000170 getpgrp()                 = 0 <0.000021>
       0.000092 select(5, [4], NULL, NULL, {tv_sec=0, tv_usec=999992}) = 0 (Timeout) <1.001123>
       1.001253 ioctl(-1, TIOCGPGRP, 0x7ffec6db1784) = -1 EBADF (Bad file descriptor) <0.000019>
       0.000075 getpgrp()                 = 0 <0.000018>

  I expected nmap to give up if the select call went bad - and indeed the code would:
  1008      FD_SET(fd, &rfds);
  1009
  1010      do {
  1011        errno = 0;
  1012        ret = select(fd + 1, &rfds, NULL, NULL, timeout);
  1013        if (ret == -1) {
  1014          if (errno == EINTR)
  1015            netutil_error("%s: %s", __func__, strerror(errno));
  1016          else
  1017            netutil_fatal("Your system does not support select()ing on pcap devices (%s). PLEASE REPORT THIS ALONG WITH DETAILED SYSTEM INFORMATION TO THE nmap-dev MAILING LIST!", strerror(errno));

  But glibc returns a good RC on this ioctl error:
  1012        ret = select(fd + 1, &rfds, NULL, NULL, timeout);
  (gdb) n
  1013        if (ret == -1) {
  (gdb) p ret
  $1 = 1

  There isn't much more code in glbic shown in GDB
  Breakpoint 1, pcap_select (p=p at entry=0x55901131ec10, timeout=timeout at entry=0x7ffec6db1670) at netutil.cc:976
  976     in netutil.cc
  (gdb) c
  Continuing.

  Breakpoint 2, __GI___select (nfds=nfds at entry=5, readfds=readfds at entry=0x7ffec6db1540, writefds=writefds at entry=0x0, exceptfds=exceptfds at entry=0x0, timeout=timeout at entry=0x7ffec6db1670)
      at ../sysdeps/unix/sysv/linux/select.c:39
  39      {
  (gdb) n
  41        return SYSCALL_CANCEL (select, nfds, readfds, writefds, exceptfds,
  (gdb) s

  Is it a bug in glibc, would it need to return -1 in that case?

  Note: it also is a bug in nmap to loop on it but they could say "but
  the answer fromt he socket call is wrong", so lets sort that out
  first.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1861389/+subscriptions