[Bug 1872118] Re: [SRU] DHCP Cluster crashes after a few hours

Richard Laager 1872118 at bugs.launchpad.net
Tue Aug 11 20:06:19 UTC 2020


Jorge, I agree with Gianfranco Costamagna that a rebuild of isc-dhcp is
NOT required. Why do you think it is?

Presumably BIND also uses these libraries? If so, it seems like the Test
Case should involve making sure BIND still seems to work, and that BIND
should be mentioned in the Regression Potential. My DHCP servers also
run BIND for recursive DNS and that has been fine with the patch
applied.

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to isc-dhcp in Ubuntu.
https://bugs.launchpad.net/bugs/1872118

Title:
  [SRU] DHCP Cluster crashes after a few hours

Status in DHCP:
  New
Status in bind9-libs package in Ubuntu:
  In Progress
Status in isc-dhcp package in Ubuntu:
  In Progress
Status in bind9-libs source package in Focal:
  In Progress
Status in isc-dhcp source package in Focal:
  In Progress
Status in bind9-libs source package in Groovy:
  In Progress
Status in isc-dhcp source package in Groovy:
  In Progress

Bug description:
  [Description]

  isc-dhcp-server uses libisc-export (coming from bind9-libs package) for handling the socket event(s) when configured in peer mode (master/secondary). It's possible that a sequence of messages dispatched by the master that requires acknowledgment from its peers holds a socket
  in a pending to send state, a timer or a subsequent write request can be scheduled into this socket and the !sock->pending_send assertion
  will be raised when trying to write again at the time data hasn't been flushed entirely and the pending_send flag hasn't been reset to 0 state.

  If this race condition happens, the following stacktrace will be
  hit:

  (gdb) info threads
    Id Target Id Frame
  * 1 Thread 0x7fb4ddecb700 (LWP 3170) __GI_raise (sig=sig at entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
    2 Thread 0x7fb4dd6ca700 (LWP 3171) __lll_lock_wait (futex=futex at entry=0x7fb4de6d2028, private=0) at lowlevellock.c:52
    3 Thread 0x7fb4de6cc700 (LWP 3169) futex_wake (private=<optimized out>, processes_to_wake=1, futex_word=<optimized out>) at ../sysdeps/nptl/futex-internal.h:364
    4 Thread 0x7fb4de74f740 (LWP 3148) futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fb4de6cd0d0) at ../sysdeps/nptl/futex-internal.h:183

  (gdb) frame 2
  #2 0x00007fb4dec85985 in isc_assertion_failed (file=file at entry=0x7fb4decd8878 "../../../../lib/isc/unix/socket.c", line=line at entry=3361, type=type at entry=isc_assertiontype_insist,
      cond=cond at entry=0x7fb4decda033 "!sock->pending_send") at ../../../lib/isc/assertions.c:52
  (gdb) bt
  #1 0x00007fb4deaa7859 in __GI_abort () at abort.c:79
  #2 0x00007fb4dec85985 in isc_assertion_failed (file=file at entry=0x7fb4decd8878 "../../../../lib/isc/unix/socket.c", line=line at entry=3361, type=type at entry=isc_assertiontype_insist,
      cond=cond at entry=0x7fb4decda033 "!sock->pending_send") at ../../../lib/isc/assertions.c:52
  #3 0x00007fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at ../../../../lib/isc/unix/socket.c:4041
  #4 process_fd (writeable=<optimized out>, readable=<optimized out>, fd=11, manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4054
  #5 process_fds (writefds=<optimized out>, readfds=0x7fb4de6d1090, maxfd=13, manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4211
  #6 watcher (uap=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4397
  #7 0x00007fb4dea68609 in start_thread (arg=<optimized out>) at pthread_create.c:477
  #8 0x00007fb4deba4103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

  (gdb) frame 3
  #3 0x00007fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at ../../../../lib/isc/unix/socket.c:4041
  4041 in ../../../../lib/isc/unix/socket.c
  (gdb) p sock->pending_send
  $2 = 1

  [TEST CASE]

  1) Install isc-dhcp-server in 2 focal machine(s).
  2) Configure peer/cluster mode as follows:
     Primary configuration: https://pastebin.ubuntu.com/p/XYj648MghK/
     Secondary configuration: https://pastebin.ubuntu.com/p/PYkcshZCWK/
  2) Run dhcpd as follows in both machine(s)

  # dhcpd -f -d -4 -cf /etc/dhcp/dhcpd.conf --no-pid ens4

  3) Leave the cluster running for a long (2h) period until the
  crash/race condition is reproduced.

  [REGRESSION POTENTIAL]

  * The fix will prevent the assertion to happen in the dispatch_send
  path, later versions of isch-dhcp upstream lack this logic and entirely removed the existence of this flag. Therefore, removing the need for this
  assertion at process_fd shouldn't be problematic.

To manage notifications about this bug go to:
https://bugs.launchpad.net/dhcp/+bug/1872118/+subscriptions



More information about the foundations-bugs mailing list