[PATCH 10/13] crypto: qat - improve aer error reset handling

Thibault Ferrante thibault.ferrante at canonical.com
Thu Mar 7 22:05:48 UTC 2024


From: Mun Chun Yep <mun.chun.yep at intel.com>

BugLink: https://bugs.launchpad.net/bugs/2056354

Rework the AER reset and recovery flow to take into account root port
integrated devices that gets reset between the error detected and the
slot reset callbacks.

In adf_error_detected() the devices is gracefully shut down. The worker
threads are disabled, the error conditions are notified to listeners and
through PFVF comms and finally the device is reset as part of
adf_dev_down().

In adf_slot_reset(), the device is brought up again. If SRIOV VFs were
enabled before reset, these are re-enabled and VFs are notified of
restarting through PFVF comms.

Signed-off-by: Mun Chun Yep <mun.chun.yep at intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta at intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas at intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu at intel.com>
Signed-off-by: Herbert Xu <herbert at gondor.apana.org.au>
(cherry picked from commit 9567d3dc760931afc38f7f1144c66dd8c4b8c680 linux-next)
Signed-off-by: Thibault Ferrante <thibault.ferrante at canonical.com>
---
 drivers/crypto/intel/qat/qat_common/adf_aer.c | 26 ++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/intel/qat/qat_common/adf_aer.c b/drivers/crypto/intel/qat/qat_common/adf_aer.c
index b3d4b6b99c65..3597e7605a14 100644
--- a/drivers/crypto/intel/qat/qat_common/adf_aer.c
+++ b/drivers/crypto/intel/qat/qat_common/adf_aer.c
@@ -33,6 +33,19 @@ static pci_ers_result_t adf_error_detected(struct pci_dev *pdev,
 		return PCI_ERS_RESULT_DISCONNECT;
 	}
 
+	set_bit(ADF_STATUS_RESTARTING, &accel_dev->status);
+	if (accel_dev->hw_device->exit_arb) {
+		dev_dbg(&pdev->dev, "Disabling arbitration\n");
+		accel_dev->hw_device->exit_arb(accel_dev);
+	}
+	adf_error_notifier(accel_dev);
+	adf_pf2vf_notify_fatal_error(accel_dev);
+	adf_dev_restarting_notify(accel_dev);
+	adf_pf2vf_notify_restarting(accel_dev);
+	adf_pf2vf_wait_for_restarting_complete(accel_dev);
+	pci_clear_master(pdev);
+	adf_dev_down(accel_dev, false);
+
 	return PCI_ERS_RESULT_NEED_RESET;
 }
 
@@ -180,14 +193,25 @@ static int adf_dev_aer_schedule_reset(struct adf_accel_dev *accel_dev,
 static pci_ers_result_t adf_slot_reset(struct pci_dev *pdev)
 {
 	struct adf_accel_dev *accel_dev = adf_devmgr_pci_to_accel_dev(pdev);
+	int res = 0;
 
 	if (!accel_dev) {
 		pr_err("QAT: Can't find acceleration device\n");
 		return PCI_ERS_RESULT_DISCONNECT;
 	}
-	if (adf_dev_aer_schedule_reset(accel_dev, ADF_DEV_RESET_SYNC))
+
+	if (!pdev->is_busmaster)
+		pci_set_master(pdev);
+	pci_restore_state(pdev);
+	pci_save_state(pdev);
+	res = adf_dev_up(accel_dev, false);
+	if (res && res != -EALREADY)
 		return PCI_ERS_RESULT_DISCONNECT;
 
+	adf_reenable_sriov(accel_dev);
+	adf_pf2vf_notify_restarted(accel_dev);
+	adf_dev_restarted_notify(accel_dev);
+	clear_bit(ADF_STATUS_RESTARTING, &accel_dev->status);
 	return PCI_ERS_RESULT_RECOVERED;
 }
 
-- 
2.43.0




More information about the kernel-team mailing list