[Bug 1548009] Re: ZFS pools should be automatically scrubbed

Thu Jul 28 04:57:13 UTC 2016

** Description changed:

+ [Impact]
+ 
+ Xenial shipped with a cron job to automatically scrub ZFS pools, as
+ desired by many users and as implemented by mdadm for traditional Linux
+ software RAID.  Unfortunately, this cron job does not work: the cron.d
+ file lacks a PATH line that includes /sbin, where the zpool utility
+ lives.
+ 
+ Given the existence of the cron job and various discussions on IRC, etc.,
+ users expect that scrubs are happening, when they are not.  This means ZFS
+ is not pre-emptively checking for (and correcting) corruption. The odds of
+ disk corruption are admittedly very low, but violating users' expectations
+ of data safety, especially when they've gone out of their way to use a
+ filesystem which touts data safety, is bad.
+ 
+ [Test Case]
+ 
+ $ truncate -s 1G test.img
+ $ sudo zpool create test `pwd`/test.img
+ $ sudo zpool status test
+ 
+ $ sudo vi /etc/cron.d/zfsutils-linux
+ Modify /etc/cron.d/zfsutils-linux to run the cron job in a few minutes
+ (modifying the date range if it's not currently the 8th through the 14th
+ and the "-eq 0" check if it's not currently a Sunday).
+ 
+ $ grep zfs /var/log/cron.log
+ Verify in /var/log/cron.log that the job ran.
+ 
+ $ sudo zpool status test
+ 
+ Expected results:
+   scan: scrub repaired 0 in ... on <shortly after the cron job ran>
+ 
+ Actual results:
+   scan: none requested
+ 
+ Then, add the PATH line, update the time rules in the cron job again,
+ and repeat the test.  zpool status should now show the expected result.
+ 
+ - OR -
+ 
+ The best test case is to leave the cron job file untouched, install the
+ patched package, wait for the second Sunday of the month, and verify with
+ zpool status that a scrub ran.  I did this, on Xenial, with the package I
+ built.  The debdiff is in comment #11 and was accepted to Yakkety.
+ 
+ If someone can get this in -proposed before the 14th, I'll gladly install
+ the actual package from -proposed and make sure it runs correctly on the
+ 14th.
+ 
+ [Regression Potential]
+ 
+ The patch only touches the cron.d file, which has only one cron job in it.
+ This cron job is completely broken (inoperative) at the moment, so the
+ regression potential is very low.
+ 
+ 
+ 
+ ORIGINAL, PRE-SRU, DESCRIPTION:
+ 
  mdadm automatically checks MD arrays. ZFS should automatically scrub
  pools too. Scrubbing a pool allows ZFS to detect on-disk corruption and
  (when the pool has redundancy) correct it. Note that ZFS does not
  blindly assume the other copy is correct; it will only overwrite bad
  data with data that is known to be good (i.e. it passes the checksum).

  I've attached a debdiff which accomplishes this. It builds and installs
  cleanly.

  The meat of it is the scrub script I've been using on production
  systems, both servers and laptops, and recommending in my Ubuntu
  root-on-ZFS HOWTO, for years, which scrubs all *healthy* pools. If a
  pool is not healthy, scrubbing it is bad for two reasons: 1) It adds a
  lot of disk load which could theoretically lead to another failure. We
  should save that disk load for resilvering. 2) Performance is already
  reduced on a degraded pool and scrubbing can make that worse, even
  though scrubs are throttled. Arguably, I am being too conservative
  here, but the marginal benefit of scrubbing a *degraded* pool is
  pretty minimal as pools should not be left degraded for very long.

  The cron.d in this patch scrubs on the second Sunday of the month. mdadm
  scrubs on the first Sunday of the month. This way, if a system has both
  MD and ZFS pools, the load doesn't all happen at the same time. If the
  system doesn't have both types, it shouldn't really matter which week.
  If you'd rather make it the same week as MD, I see no problem with that.

-- 
You received this bug notification because you are a member of Ubuntu
Sponsors Team, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1548009

Title:
  ZFS pools should be automatically scrubbed

Status in zfs-linux package in Ubuntu:
  Fix Released

Bug description:
  [Impact]

  Xenial shipped with a cron job to automatically scrub ZFS pools, as
  desired by many users and as implemented by mdadm for traditional Linux
  software RAID.  Unfortunately, this cron job does not work: the cron.d
  file lacks a PATH line that includes /sbin, where the zpool utility
  lives.
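
  To illustrate the failure mode, here is a minimal sketch that checks
  whether a command resolves under a given PATH.  The default cron PATH of
  /usr/bin:/bin is an assumption based on crontab(5) on Debian-derived
  systems; has_cmd, report, and FIXED_PATH are hypothetical names, and the
  PATH line in the actual debdiff may differ.

```shell
#!/bin/sh
# has_cmd PATHVALUE NAME: succeed if NAME resolves under PATHVALUE.
# The subshell keeps the caller's own PATH untouched.
has_cmd() {
    ( PATH="$1"; command -v "$2" >/dev/null 2>&1 )
}

# Assumption: cron's default PATH on Debian/Ubuntu (see crontab(5)).
CRON_DEFAULT_PATH="/usr/bin:/bin"

# Hypothetical PATH that includes /sbin; not copied from the debdiff.
FIXED_PATH="/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin"

# report PATHVALUE: say whether zpool would be found under PATHVALUE.
report() {
    if has_cmd "$1" zpool; then
        echo "zpool found"
    else
        echo "zpool not found"
    fi
}
```

  On a system where zpool lives only in /sbin, report "$CRON_DEFAULT_PATH"
  should print "zpool not found", while report "$FIXED_PATH" should print
  "zpool found".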

  Given the existence of the cron job and various discussions on IRC, etc.,
  users expect that scrubs are happening, when they are not.  This means ZFS
  is not pre-emptively checking for (and correcting) corruption. The odds of
  disk corruption are admittedly very low, but violating users' expectations
  of data safety, especially when they've gone out of their way to use a
  filesystem which touts data safety, is bad.

  [Test Case]

  $ truncate -s 1G test.img
  $ sudo zpool create test `pwd`/test.img
  $ sudo zpool status test

  $ sudo vi /etc/cron.d/zfsutils-linux
  Modify /etc/cron.d/zfsutils-linux to run the cron job in a few minutes
  (modifying the date range if it's not currently the 8th through the 14th
  and the "-eq 0" check if it's not currently a Sunday).

  $ grep zfs /var/log/cron.log
  Verify in /var/log/cron.log that the job ran.

  $ sudo zpool status test

  Expected results:
    scan: scrub repaired 0 in ... on <shortly after the cron job ran>

  Actual results:
    scan: none requested

  Then, add the PATH line, update the time rules in the cron job again,
  and repeat the test.  zpool status should now show the expected result.
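
  The before/after check can be scripted.  This is a rough sketch, not part
  of the package: scan_state is a hypothetical helper that reads zpool
  status output on stdin and classifies the scan line using only the two
  strings quoted above.

```shell
#!/bin/sh
# scan_state: read `zpool status` output on stdin and classify the
# first "scan:" line. Anything other than the two strings from the
# test case (e.g. a scrub still in progress) is reported as "unknown".
scan_state() {
    scan=$(grep 'scan:' | head -n 1)
    case "$scan" in
        *"scrub repaired"*) echo "scrubbed" ;;
        *"none requested"*) echo "not scrubbed" ;;
        *)                  echo "unknown" ;;
    esac
}
```

  For example: sudo zpool status test | scan_state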

  - OR -

  The best test case is to leave the cron job file untouched, install the
  patched package, wait for the second Sunday of the month, and verify with
  zpool status that a scrub ran.  I did this, on Xenial, with the package I
  built.  The debdiff is in comment #11 and was accepted to Yakkety.

  If someone can get this in -proposed before the 14th, I'll gladly install
  the actual package from -proposed and make sure it runs correctly on the
  14th.

  [Regression Potential]

  The patch only touches the cron.d file, which has only one cron job in it.
  This cron job is completely broken (inoperative) at the moment, so the
  regression potential is very low.

  ORIGINAL, PRE-SRU, DESCRIPTION:

  mdadm automatically checks MD arrays. ZFS should automatically scrub
  pools too. Scrubbing a pool allows ZFS to detect on-disk corruption
  and (when the pool has redundancy) correct it. Note that ZFS does not
  blindly assume the other copy is correct; it will only overwrite bad
  data with data that is known to be good (i.e. it passes the checksum).

  I've attached a debdiff which accomplishes this. It builds and
  installs cleanly.

  The meat of it is the scrub script I've been using on production
  systems, both servers and laptops, and recommending in my Ubuntu
  root-on-ZFS HOWTO, for years, which scrubs all *healthy* pools. If a
  pool is not healthy, scrubbing it is bad for two reasons: 1) It adds a
  lot of disk load which could theoretically lead to another failure. We
  should save that disk load for resilvering. 2) Performance is already
  reduced on a degraded pool and scrubbing can make that worse, even
  though scrubs are throttled. Arguably, I am being too conservative
  here, but the marginal benefit of scrubbing a *degraded* pool is
  pretty minimal as pools should not be left degraded for very long.
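
  The idea of scrubbing only healthy pools can be sketched as follows.
  This is an illustration, not the script from the debdiff: healthy_pools
  and scrub_healthy_pools are hypothetical names, and treating only the
  ONLINE health state as "healthy" is an assumption.

```shell
#!/bin/sh
# healthy_pools: read tab-separated "health<TAB>name" lines on stdin
# (the format of `zpool list -H -o health,name`) and print the names
# of pools whose health is exactly ONLINE.
healthy_pools() {
    awk -F '\t' '$1 == "ONLINE" { print $2 }'
}

# scrub_healthy_pools: start a scrub on every ONLINE pool and skip
# the rest (DEGRADED, FAULTED, and so on).
scrub_healthy_pools() {
    zpool list -H -o health,name | healthy_pools |
    while IFS= read -r pool; do
        zpool scrub "$pool"
    done
}
```

  Keeping the filter separate from the zpool calls makes it possible to
  check the selection logic on a machine without any pools.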

  The cron.d in this patch scrubs on the second Sunday of the month.
  mdadm scrubs on the first Sunday of the month. This way, if a system
  has both MD and ZFS pools, the load doesn't all happen at the same
  time. If the system doesn't have both types, it shouldn't really
  matter which week. If you'd rather make it the same week as MD, I see
  no problem with that.
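
  The "second Sunday" rule amounts to a day-of-month range of 8 through 14
  combined with a Sunday check.  A minimal sketch, assuming date +%d and
  date +%w as the inputs; is_second_sunday is a hypothetical helper, not
  the line shipped in the cron.d file.

```shell
#!/bin/sh
# is_second_sunday DAY_OF_MONTH WEEKDAY
# DAY_OF_MONTH as printed by `date +%d` (01-31), WEEKDAY as printed by
# `date +%w` (0 = Sunday). Succeeds only on the second Sunday, which
# always falls on the 8th through the 14th.
is_second_sunday() {
    day=${1#0}   # strip a leading zero so "08" compares as 8
    [ "$day" -ge 8 ] && [ "$day" -le 14 ] && [ "$2" -eq 0 ]
}
```

  For example: is_second_sunday "$(date +%d)" "$(date +%w)" && echo scrub
  The first Sunday, which mdadm uses, would be the same check with a range
  of 1 through 7.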
