[SRU][F][PULL] CryptoExpress EP11 cards are going offline

frank.heimes at canonical.com
Tue Aug 31 13:19:56 UTC 2021


From: Frank Heimes <frank.heimes at canonical.com>

BugLink: https://bugs.launchpad.net/bugs/1939618

[Impact]

* With current focal kernels, IBM Z CryptoExpress adapters in EP11 mode go offline
  when the hardware reports an unknown error indication.

* This not only leads to a software fallback,
  but can also cause errors and crashes
  if certain crypto operations are in progress at that time.

* A rework of the AP bus and zcrypt device driver,
  as it was done in 5.11, fixes the situation.

* Of the commits in the range below,
  the last third fix the issue described here;
  the others are prerequisites needed to get the relevant ones applied.

* In theory the patch set could have been made smaller,
  but at the cost of code that mixes old and new,
  possibly with some newly written snippets.
  It would then diverge from what is accepted upstream (in 5.11 and above),
  which would increase the risk and the maintenance effort
  and reduce test coverage.

[Fix]

* The SRU request was created as a pull request,
  so please pull f904c400c9c4^..f6d9ab1de03a
  (that is, from f904c400c9c4 up to head/f6d9ab1de03a, both included)
  from here: https://code.launchpad.net/~fheimes/+git/lp1939618
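
  The range notation can be illustrated with a toy model. The sketch below is
  not git itself: it assumes a strictly linear history held in a Python list,
  and the commit names and the helper rev_range() are hypothetical. It only
  shows why the trailing '^' is needed to include the first commit of the range.

```python
def rev_range(history, start, end, include_start=False):
    """Select commits from a linear history, mimicking git range notation.

    history is ordered oldest to newest.
    'start..end' selects what is reachable from end but not from start,
    so start itself is excluded.
    'start^..end' (include_start=True) begins at the parent of start,
    which pulls start into the selection.
    """
    i = history.index(start)
    j = history.index(end)
    if include_start:
        i -= 1  # step back to the parent of start
    return history[i + 1 : j + 1]


# Hypothetical placeholder names, oldest first.
history = ["base", "first", "middle", "head"]

print(rev_range(history, "first", "head"))                      # first..head
print(rev_range(history, "first", "head", include_start=True))  # first^..head
```

  With the plain 'first..head' form the commit named "first" is left out;
  with 'first^..head' it is part of the result, which matches the
  "both included" wording above.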

[Test Case]

* An Ubuntu Server 20.04 on IBM Z or LinuxONE installation is required,
  with ideally three attached CryptoExpress adapters running
  in CCA, EP11 and accelerator mode.

* Run stress tests on these three CryptoExpress adapters.

* IBM has such stress tests and ran them on a patched Ubuntu 20.04 kernel.
  The tests have a special focus on error paths,
  since this patch set mainly aims at better error path handling.

* Note: A new config option for the zcrypt driver was introduced
  that makes it possible to inject erroneous messages.

* An application exists that generates such messages and thus tests these error paths.

* Canonical's focus will be mainly on regression testing.
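
  For regression runs it can help to record which AP cards and queues are
  online before and after a stress test. The following is a minimal sketch,
  not part of IBM's test suite. It assumes the usual sysfs layout in which
  each entry below /sys/bus/ap/devices has an 'online' attribute containing
  0 or 1; the helper names ap_online_states() and went_offline() are
  hypothetical.

```python
from pathlib import Path


def ap_online_states(root="/sys/bus/ap/devices"):
    """Map each AP card/queue entry under root to True (online) or False.

    Assumption: every entry exposes an 'online' file holding '0' or '1'.
    Returns an empty dict if the directory does not exist
    (for example when not running on s390x).
    """
    states = {}
    base = Path(root)
    if not base.is_dir():
        return states
    for entry in sorted(base.iterdir()):
        online = entry / "online"
        if online.is_file():
            states[entry.name] = online.read_text().strip() == "1"
    return states


def went_offline(before, after):
    """Names of devices that were online in 'before' but are not in 'after'."""
    return sorted(
        name for name, up in before.items()
        if up and not after.get(name, False)
    )


if __name__ == "__main__":
    # Print the current state; take one snapshot before and one after a run.
    for name, up in ap_online_states().items():
        print(name, "online" if up else "offline")
```

  Taking one snapshot before and one after the stress run and passing both to
  went_offline() gives the list of adapters or queues that dropped offline.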

[Regression Potential] 

* As with all modifications, there is a certain risk of regressions,
  especially with bigger patch sets.

* But the modifications here are limited to the s390x platform,
  and there again largely to the s390x hardware crypto stack and driver
  (CryptoExpress adapter) which is optional hardware.
  (See the diff stat in the comment below.)

* The crypto-specific tools (in the s390-tools package) might no longer work
  with this patched driver.
  However, IBM tested this, with the result that the changes are fully backward compatible.
  The older s390-tools package (from focal) simply cannot show or control the new (config state) feature,
  but the functionality covered by the older s390-tools package is fully covered by this patch set.

* The core of this patch set went into the 5.11 kernel upstream,
  hence is in hirsute (and has also been picked by other distros).

* Since this patch set is a rework of the AP bus and zcrypt driver code,
  it may now show new errors that were never thrown before, for example memory leaks.
  However, this is not unique to this patch set;
  it is the same for upstream, Hirsute and Impish (and other distros).

* The patches are all upstream and all needed upstream commits could just be cherry-picked,
  hence no modifications were needed.

* So the commits were not only tested by IBM upfront,
  but a patched focal master-next kernel is also available as PPA (see comment below) for further testing.

* This patch set was also tested on 5.11, where two issues were found; the fixes for both are already part of this set.

[Other]

* I iterated through all commits and found that the latest ones went upstream with 5.13,
  hence Impish includes all commits needed and is not affected!

* It looks like all commits, except three, are already upstream with 5.11,
  and the missing three came in on top via upstream stable,
  hence Hirsute master-next includes all commits needed too and is also not affected!

* But none of the commits could be found in current Focal master-next (aot: 5.4.0-84);
  the first commits from this set started to land with 5.7,
  hence this SRU request is for focal only.

---

The following changes since commit 9e4ec1b8ea389754e30927a98a63f3ffa6e664a7:

  UBUNTU: upstream stable to v5.4.140 (2021-08-27 15:52:30 -0600)

are available in the Git repository at:

  git://git.launchpad.net/~fheimes/+git/lp1939618 f6d9ab1de03a36af1a3add6a31642bd6e8dbfd75

for you to fetch changes up to f6d9ab1de03a36af1a3add6a31642bd6e8dbfd75:

  s390/ap: Fix hanging ioctl caused by wrong msg counter (2021-08-30 09:13:20 +0200)

----------------------------------------------------------------
Gustavo A. R. Silva (1):
      s390: Replace zero-length array with flexible-array member

Harald Freudenberger (29):
      s390/zcrypt: Support for CCA protected key block version 2
      s390/zcrypt: replace snprintf/sprintf with scnprintf
      s390/ap: Remove ap device suspend and resume callbacks
      s390/zcrypt: use kvmalloc instead of kmalloc for 256k alloc
      s390/ap: remove power management code from ap bus and drivers
      s390/ap: introduce new ap function ap_get_qdev()
      s390/zcrypt: fix smatch warnings
      s390/zcrypt: code beautification and struct field renames
      s390/zcrypt: split ioctl function into smaller code units
      s390/ap: rename and clarify ap state machine related stuff
      s390/zcrypt: provide cex4 cca sysfs attributes for cex3
      s390/ap: rework crypto config info and default domain code
      s390/zcrypt: simplify cca_findcard2 loop code
      s390/zcrypt: remove set_fs() invocation in zcrypt device driver
      s390/zcrypt: Support for CCA APKA master keys
      s390/zcrypt: introduce msg tracking in zcrypt functions
      s390/ap: split ap queue state machine state from device state
      s390/ap: add error response code field for ap queue devices
      s390/ap: add card/queue deconfig state
      s390/sclp: Add support for SCLP AP adapter config/deconfig
      s390/ap: Support AP card SCLP config and deconfig operations
      s390/ap/zcrypt: revisit ap and zcrypt error handling
      s390/zcrypt: move ap_msg param one level up the call chain
      s390/zcrypt: Introduce Failure Injection feature
      s390/zcrypt: fix wrong format specifications
      s390/ap: fix ap devices reference counting
      s390/zcrypt: return EIO when msg retry limit reached
      s390/zcrypt: fix zcard and zqueue hot-unplug memleak
      s390/ap: Fix hanging ioctl caused by wrong msg counter

Joe Perches (1):
      s390/zcrypt: use fallthrough;

Qinglang Miao (1):
      s390/ap: remove unnecessary spin_lock_init()

Takashi Iwai (1):
      s390/zcrypt: Use scnprintf() for avoiding potential buffer overflow

Zou Wei (1):
      s390/zcrypt: use kzalloc

 arch/s390/appldata/appldata_os.c       |   2 +-
 arch/s390/include/asm/sclp.h           |   2 +
 arch/s390/include/uapi/asm/zcrypt.h    | 140 ++---
 drivers/s390/block/dasd_diag.c         |   2 +-
 drivers/s390/block/dasd_eckd.h         |   2 +-
 drivers/s390/char/Makefile             |   2 +
 drivers/s390/char/raw3270.h            |   2 +-
 drivers/s390/char/sclp.h               |   2 +-
 drivers/s390/char/sclp_ap.c            |  63 +++
 drivers/s390/char/sclp_pci.c           |   2 +-
 drivers/s390/cio/idset.c               |   2 +-
 drivers/s390/crypto/ap_bus.c           | 974 +++++++++++++++++----------------
 drivers/s390/crypto/ap_bus.h           | 139 +++--
 drivers/s390/crypto/ap_card.c          |  98 ++--
 drivers/s390/crypto/ap_debug.h         |   8 +
 drivers/s390/crypto/ap_queue.c         | 513 ++++++++++-------
 drivers/s390/crypto/pkey_api.c         |  20 +-
 drivers/s390/crypto/zcrypt_api.c       | 574 +++++++++++++------
 drivers/s390/crypto/zcrypt_api.h       |  49 +-
 drivers/s390/crypto/zcrypt_card.c      |  30 +-
 drivers/s390/crypto/zcrypt_ccamisc.c   | 320 +++++------
 drivers/s390/crypto/zcrypt_ccamisc.h   |  32 +-
 drivers/s390/crypto/zcrypt_cex2a.c     |   8 +-
 drivers/s390/crypto/zcrypt_cex2c.c     | 164 +++++-
 drivers/s390/crypto/zcrypt_cex4.c      | 197 ++++---
 drivers/s390/crypto/zcrypt_debug.h     |   8 +
 drivers/s390/crypto/zcrypt_ep11misc.c  |  41 +-
 drivers/s390/crypto/zcrypt_error.h     |  92 ++--
 drivers/s390/crypto/zcrypt_msgtype50.c | 179 +++---
 drivers/s390/crypto/zcrypt_msgtype6.c  | 374 +++++++------
 drivers/s390/crypto/zcrypt_msgtype6.h  |   8 +-
 drivers/s390/crypto/zcrypt_queue.c     |  26 +-
 32 files changed, 2415 insertions(+), 1660 deletions(-)
 create mode 100644 drivers/s390/char/sclp_ap.c


