[Bug 1856084] Re: Livelock between ZFS evict and writeback threads

Steve Langasek steve.langasek at canonical.com
Sat Dec 14 01:26:37 UTC 2019


Hello Heitor, or anyone else affected,

Accepted zfs-linux into eoan-proposed. The package will build now and be
available at
https://launchpad.net/ubuntu/+source/zfs-linux/0.8.1-1ubuntu14.3 in a few
hours, and then in the -proposed repository.

Please help us by testing this new package.  See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed.  Your feedback will aid us in getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug
mentioning the version of the package you tested, and change the tag from
verification-needed-eoan to verification-done-eoan. If it does not fix
the bug for you, please add a comment stating that, and change the tag
to verification-failed-eoan. In either case, without details of your
testing we will not be able to proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in
advance for helping!

N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.

** Changed in: zfs-linux (Ubuntu Eoan)
       Status: Confirmed => Fix Committed

-- 
You received this bug notification because you are a member of Ubuntu
Sponsors Team, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1856084

Title:
  Livelock between ZFS evict and writeback threads

Status in zfs-linux package in Ubuntu:
  Fix Released
Status in zfs-linux source package in Bionic:
  Confirmed
Status in zfs-linux source package in Disco:
  Confirmed
Status in zfs-linux source package in Eoan:
  Fix Committed
Status in zfs-linux source package in Focal:
  Fix Released
Status in zfs-linux package in Debian:
  Unknown

Bug description:
  Livelock between ZFS evict and writeback threads

  [Impact]
  ZIO pipeline stalls, causing ZFS workloads to hang indefinitely

  [Description]
  For certain ZFS workloads, we start seeing hung task timeouts in the kernel logs due to zil_commit() stalling. This is due to zfs_zget() not detecting whether a znode has been marked for deletion before attempting to access it, causing a constant "retry loop" in zfs_get_data() if that znode has been unlinked already. An example of the stack traces follows:

  [72742.051703] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [72742.070429] mysqld          D    0  5713   2881 0x00000320
  [72742.073220] Call Trace:
  [72742.075305]  __schedule+0x24e/0x880
  [72742.090436]  schedule+0x2c/0x80
  [72742.090438]  schedule_preempt_disabled+0xe/0x10
  [72742.090441]  __mutex_lock.isra.5+0x276/0x4e0
  [72742.090547]  ? dmu_tx_destroy+0x105/0x130 [zfs]
  [72742.090555]  __mutex_lock_slowpath+0x13/0x20
  [72742.115374]  ? __mutex_lock_slowpath+0x13/0x20
  [72742.132266]  mutex_lock+0x2f/0x40
  [72742.134207]  zil_commit_impl+0x1b0/0x1b30 [zfs]
  [72742.150428]  ? spl_kmem_alloc+0x115/0x180 [spl]
  [72742.152622]  ? mutex_lock+0x12/0x40
  [72742.154819]  ? zfs_refcount_add_many+0x9a/0x100 [zfs]
  [72742.171450]  zil_commit+0xde/0x150 [zfs]
  [72742.173687]  zfs_fsync+0x77/0xe0 [zfs]
  [72742.175044]  zpl_fsync+0x80/0x110 [zfs]
  [72742.191690]  vfs_fsync_range+0x51/0xb0
  [72742.193876]  do_fsync+0x3d/0x70
  [72742.195126]  SyS_fsync+0x10/0x20
  [72742.211059]  do_syscall_64+0x73/0x130
  [72742.214078]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

  It's possible to hit this issue due to a race between the ZFS evict
  and writeback threads. If the z_iput task is trying to evict a znode
  that's currently sitting in the writeback thread, both will "livelock"
  each other and stall the ZIO pipeline, causing other ZFS operations
  (such as zil_commit) to hang indefinitely.

  This has been documented and fixed upstream in PR#9583 [0]. We need to
  pull two fixes from upstream: the first one fixes the zfs_zget() issue
  in the writeback thread, while the second fixes a regression on
  O_TMPFILE descriptors caused by the first one.

  Upstream patches:
   - Break out of zfs_zget early if unlinked znode (41e1aa2a06f8) [1]
   - Check for unlinked znodes after igrab() (0c46813805f4) [2]
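
The retry loop and the early exit described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the upstream patch: struct model_znode, model_igrab() and model_zget() are hypothetical names invented for illustration, and the retry cap exists only so that the model terminates (the behaviour described in this bug is a loop that never ends).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a znode; not the real ZFS structure. */
struct model_znode {
    bool unlinked;  /* znode has been marked for deletion */
    bool evicting;  /* inode is being evicted, so grabbing it fails */
};

/* Models igrab(): the grab fails while the inode is being evicted. */
static bool model_igrab(const struct model_znode *zp)
{
    return !zp->evicting;
}

/*
 * Models the lookup retry loop.
 * Returns 0 when the grab succeeds, ENOENT when the grab fails on an
 * unlinked znode (only if check_unlinked is set), and EAGAIN once
 * max_retries attempts have been used up. The cap is an assumption of
 * this model; without it the unpatched case would spin forever.
 */
static int model_zget(const struct model_znode *zp, bool check_unlinked,
                      int max_retries, int *attempts)
{
    assert(max_retries > 0);
    *attempts = 0;
    for (;;) {
        (*attempts)++;
        if (model_igrab(zp))
            return 0;
        /* Modelled fix: stop retrying if the znode is already unlinked. */
        if (check_unlinked && zp->unlinked)
            return ENOENT;
        if (*attempts >= max_retries)
            return EAGAIN;
    }
}
```

With a znode that is both unlinked and being evicted, calling model_zget() with check_unlinked set to false uses every allowed attempt and returns EAGAIN, while calling it with check_unlinked set to true returns ENOENT on the first attempt.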

  [Test Case]
  Because this is a race condition, the issue has been hard to reproduce consistently. The window for the race between evict() and the ZFS writeback thread is quite narrow, but users have reported hitting it after some hours of running LXD-containerized MySQL workloads.

  [Regression Potential]
  These patches have been tested both in the ZFS test suite and in production environments, so the potential for further regressions should be low.
  Additional regressions would likely cause issues with the ZFS writeback/commit and IO pipeline, so they should be spotted fairly quickly.

  [0] https://github.com/zfsonlinux/zfs/pull/9583
  [1] https://github.com/zfsonlinux/zfs/commit/41e1aa2a06f8
  [2] https://github.com/zfsonlinux/zfs/commit/0c46813805f4

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1856084/+subscriptions


