[B][PATCH 00/11] LP#1829563 bcache: risk of data loss on I/O errors in backing or caching devices

Mauricio Faria de Oliveira mfo at canonical.com
Mon Jul 8 00:50:27 UTC 2019


BugLink: https://bugs.launchpad.net/bugs/1829563

Note: the patchset is relatively large because the support
for error detection/handling is mostly non-existent in 4.15.
All patches are in Cosmic/4.18 except PATCH 11 (other series).

[Impact]

 * The bcache code in Bionic lacks several fixes to handle
   I/O errors in both backing devices and caching devices.

 * Partial or permanent errors in backing or caching devices,
   specially in writeback mode, can lead to data loss and/or
   the application is not notified about failed I/O requests.

 * The bcache device might remain available for I/O requests
   even if backing device is offline, so writes are undefined.

[Test Case]

 * Detailed test cases/steps for the behavior of many patches
   with code logic changes are provided in bug comments.

 * The patchset has been tested for regressions on each cache
   mode (writethrough, writeback, writearound, none) with the
   xfstests test suite (on ext4) and fio (sequential & random
   read-write).

[Regression Potential]

 * The patchset is relatively large and touches several areas
   in bcache code, however, synthetic testing of the patches
   has been performed, and extensive regression/stress tests
   were run (as mentioned in Test Case section).

 * Many patches in the patchset are 'Fixes' patches to other
   patches, and no further 'Fixes' currently exist upstream.

[Other Info]

 * Canonical Field Eng. deploys bcache+writeback extensively
   (e.g., BootStack, UA cloud, except rare all-flash cases).

[Original Bug Description]

This is a request for a backport of the following upstream patch from 4.18:

"bcache: stop bcache device when backing device is offline"
https://github.com/torvalds/linux/commit/0f0709e6bfc3ce4e8e1c0e8573490c45f76cfeee

Field engineering uses bcache quite extensively and it would be good to have this in the GA/bionic kernel.


Coly Li (9):
  bcache: add CACHE_SET_IO_DISABLE to struct cache_set flags
  bcache: add stop_when_cache_set_failed option to backing device
  bcache: add backing_request_endio() for bi_end_io
  bcache: add io_disable to struct cached_dev
  bcache: store disk name in struct cache and struct cached_dev
  bcache: count backing device I/O error for writeback I/O
  bcache: add wait_for_kthread_stop() in bch_allocator_thread()
  bcache: set dc->io_disable to true in conditional_stop_bcache_device()
  bcache: stop bcache device when backing device is offline

Tang Junhui (2):
  bcache: fix inaccurate io state for detached bcache devices
  bcache: fix ioctl in flash device

 drivers/md/bcache/alloc.c     |   8 +-
 drivers/md/bcache/bcache.h    |  54 +++++++++
 drivers/md/bcache/btree.c     |  11 +-
 drivers/md/bcache/debug.c     |   3 +-
 drivers/md/bcache/io.c        |  20 +++-
 drivers/md/bcache/journal.c   |   4 +-
 drivers/md/bcache/request.c   | 186 +++++++++++++++++++++++++-----
 drivers/md/bcache/super.c     | 207 +++++++++++++++++++++++++++++-----
 drivers/md/bcache/sysfs.c     |  50 +++++++-
 drivers/md/bcache/util.h      |   6 -
 drivers/md/bcache/writeback.c |  37 ++++--
 11 files changed, 497 insertions(+), 89 deletions(-)

-- 
2.17.1




More information about the kernel-team mailing list