ACK: [Yakkety][PATCH] UBUNTU: SAUCE: (no-up) NVMe: only setup MSIX once
Luis Henriques
luis.henriques at canonical.com
Mon Jan 9 14:57:02 UTC 2017
On Fri, Jan 06, 2017 at 02:27:05PM -0500, Dan Streetman wrote:
> BugLink: http://bugs.launchpad.net/bugs/1648449
> BugLink: http://bugs.launchpad.net/bugs/1651602
>
> The NVMe driver currently configures a single MSI/MSIX interrupt for its
> admin queue. Later, after initializing the NVMe controller and detecting
> the number of queues the controller can handle, the driver releases that
> interrupt and immediately re-configures enough MSI/MSIX interrupts for all
> the queues. However, on Amazon AWS instances with passthrough NVMe drives,
> this MSI/MSIX release/re-request causes some of the interrupts to fail,
> leaving some of the NVMe drives (usually all but one or two, though
> sometimes none) unable to initialize.
>
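To summarise the old flow described above, it boils down to roughly the
following. Only a sketch, not the actual driver code: the function name is
invented, and the MSI fallback, the free_irq() of the admin vector and all
error handling are left out. It just uses the same PCI helpers the hunks
below touch.

#include <linux/pci.h>

/*
 * Old flow: grab one vector for the admin queue, discover the I/O queue
 * count from the controller, then drop that vector and request a full
 * range. The disable/re-enable in the middle is what trips up the AWS
 * passthrough devices.
 */
static int nvme_vectors_two_phase(struct pci_dev *pdev,
                                  struct msix_entry *entry,
                                  int nr_io_queues)
{
        int i;

        pci_enable_msix(pdev, entry, 1);        /* admin queue only   */

        /* ... controller init, nvme_set_queue_count() ... */

        pci_disable_msix(pdev);                 /* release ...        */
        for (i = 0; i < nr_io_queues; i++)
                entry[i].entry = i;
        return pci_enable_msix_range(pdev, entry, 1, nr_io_queues);
                                                /* ... and re-request */
}
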
> This patch combines the MSI/MSIX configuration so that all of it is done
> once. This avoids the interrupt failures, although it also results in more
> MSI/MSIX interrupts being configured than are needed.
>
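And the combined flow the patch switches to, again only a sketch under the
same caveats, using num_possible_cpus() as the upper bound as in the
nvme_pci_enable() hunk below:

#include <linux/cpumask.h>
#include <linux/pci.h>

/*
 * New flow: allocate vectors for the worst case (one per possible CPU)
 * once, while enabling the device, and never release/re-request them.
 * dev->max_qid is later capped by what the controller actually supports,
 * so any extra vectors simply go unused.
 */
static int nvme_vectors_once(struct pci_dev *pdev, struct msix_entry *entry)
{
        int i, nr_io_queues = num_possible_cpus();

        for (i = 0; i < nr_io_queues; i++)
                entry[i].entry = i;
        return pci_enable_msix_range(pdev, entry, 1, nr_io_queues);
}
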
> This patch for yakkety also includes the regression fix from xenial:
> 8fb7c1f39c99 ("nvme: only require 1 interrupt vector, not 2+")
>
> Signed-off-by: Dan Streetman <dan.streetman at canonical.com>
> (forward-ported from xenial commit 96fce9e4025b96b08bfe5196d3380ab9215cb64b)
> ---
> drivers/nvme/host/pci.c | 70 +++++++++++++++----------------------------------
> 1 file changed, 21 insertions(+), 49 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 3a4078a..9206ff1 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1490,7 +1490,7 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
> struct pci_dev *pdev = to_pci_dev(dev->dev);
> int result, i, vecs, nr_io_queues, size;
>
> - nr_io_queues = num_online_cpus();
> + nr_io_queues = dev->max_qid;
> result = nvme_set_queue_count(&dev->ctrl, &nr_io_queues);
> if (result < 0)
> return result;
> @@ -1522,45 +1522,8 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
> adminq->q_db = dev->dbs;
> }
>
> - /* Deregister the admin queue's interrupt */
> - free_irq(dev->entry[0].vector, adminq);
> -
> - /*
> - * If we enable msix early due to not intx, disable it again before
> - * setting up the full range we need.
> - */
> - if (pdev->msi_enabled)
> - pci_disable_msi(pdev);
> - else if (pdev->msix_enabled)
> - pci_disable_msix(pdev);
> -
> - for (i = 0; i < nr_io_queues; i++)
> - dev->entry[i].entry = i;
> - vecs = pci_enable_msix_range(pdev, dev->entry, 1, nr_io_queues);
> - if (vecs < 0) {
> - vecs = pci_enable_msi_range(pdev, 1, min(nr_io_queues, 32));
> - if (vecs < 0) {
> - vecs = 1;
> - } else {
> - for (i = 0; i < vecs; i++)
> - dev->entry[i].vector = i + pdev->irq;
> - }
> - }
> -
> - /*
> - * Should investigate if there's a performance win from allocating
> - * more queues than interrupt vectors; it might allow the submission
> - * path to scale better, even if the receive path is limited by the
> - * number of interrupts.
> - */
> - nr_io_queues = vecs;
> dev->max_qid = nr_io_queues;
>
> - result = queue_request_irq(dev, adminq, adminq->irqname);
> - if (result) {
> - adminq->cq_vector = -1;
> - goto free_queues;
> - }
> return nvme_create_io_queues(dev);
>
> free_queues:
> @@ -1712,7 +1675,7 @@ static int nvme_dev_add(struct nvme_dev *dev)
> static int nvme_pci_enable(struct nvme_dev *dev)
> {
> u64 cap;
> - int result = -ENOMEM;
> + int result = -ENOMEM, nr_io_queues, i, vecs;
> struct pci_dev *pdev = to_pci_dev(dev->dev);
>
> if (pci_enable_device_mem(pdev))
> @@ -1729,21 +1692,30 @@ static int nvme_pci_enable(struct nvme_dev *dev)
> goto disable;
> }
>
> - /*
> - * Some devices and/or platforms don't advertise or work with INTx
> - * interrupts. Pre-enable a single MSIX or MSI vec for setup. We'll
> - * adjust this later.
> - */
> - if (pci_enable_msix(pdev, dev->entry, 1)) {
> - pci_enable_msi(pdev);
> - dev->entry[0].vector = pdev->irq;
> + nr_io_queues = num_possible_cpus();
> +
> + for (i = 0; i < nr_io_queues; i++)
> + dev->entry[i].entry = i;
> + vecs = pci_enable_msix_range(pdev, dev->entry, 1, nr_io_queues);
> + if (vecs < 0) {
> + vecs = pci_enable_msi_range(pdev, 1, min(nr_io_queues, 32));
> + if (vecs < 0) {
> + result = vecs;
> + goto disable;
> + } else {
> + for (i = 0; i < vecs; i++)
> + dev->entry[i].vector = i + pdev->irq;
> + }
> }
>
> - if (!dev->entry[0].vector) {
> - result = -ENODEV;
> + if (vecs < 1) {
> + dev_err(dev->ctrl.device, "Failed to get any MSI/MSIX interrupts\n");
> + result = -ENOSPC;
> goto disable;
> }
>
> + dev->max_qid = vecs;
> +
> cap = lo_hi_readq(dev->bar + NVME_REG_CAP);
>
> dev->q_depth = min_t(int, NVME_CAP_MQES(cap) + 1, NVME_Q_DEPTH);
> --
> 2.9.3
>
>
It seems to make sense to forward-port this to yakkety (as we already have
it in xenial). The backport seems to be correct and hopefully tested :-)
Acked-by: Luis Henriques <luis.henriques at canonical.com>
Cheers,
--
Luís