ACK/cmnt: [Trusty][PATCH] UBUNTU: SAUCE: xen: do not re-use pirq number cached in pci device msi msg data

Stefan Bader stefan.bader at canonical.com
Wed Jan 18 16:42:55 UTC 2017


On 18.01.2017 17:08, Dan Streetman wrote:
> On Mon, Jan 16, 2017 at 6:34 AM, Stefan Bader
> <stefan.bader at canonical.com> wrote:
>> On 13.01.2017 22:13, Dan Streetman wrote:
>>> BugLink: http://bugs.launchpad.net/bugs/1656381
>>>
>>> Revert the main part of commit:
>>> af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests")
>>>
>>> That commit introduced reading the pci device's msi message data to see
>>> if a pirq was previously configured for the device's msi/msix, and re-use
>>> that pirq.  At the time, that was the correct behavior.  However, a
>>> later change to Qemu caused it to call into the Xen hypervisor to unmap
>>> all pirqs for a pci device, when the pci device disables its MSI/MSIX
>>> vectors; specifically the Qemu commit:
>>> c976437c7dba9c7444fb41df45468968aaa326ad
>>> ("qemu-xen: free all the pirqs for msi/msix when driver unload")
>>>
>>> Once Qemu added this pirq unmapping, it was no longer correct for the
>>> kernel to re-use the pirq number cached in the pci device msi message
>>> data.  All Qemu releases since 2.1.0 contain the patch that unmaps the
>>> pirqs when the pci device disables its MSI/MSIX vectors.
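
For context, the re-use logic being reverted reads the MSI message back from
the device and decodes a pirq from it, roughly as follows (a sketch based on
the removed lines of the hunk quoted at the end of this mail, using the same
helpers as arch/x86/pci/xen.c):

	__read_msi_msg(msidesc, &msg);
	pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
		((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
	if (msg.data != XEN_PIRQ_MSI_DATA || xen_irq_from_pirq(pirq) < 0) {
		/* no valid cached pirq: ask Xen for a new one */
		pirq = xen_allocate_pirq_msi(dev, msidesc);
		...
	} else {
		/* the cached pirq is assumed to still be mapped and is re-used;
		 * this is the assumption broken by qemu >= 2.1.0, which already
		 * unmapped the pirq when the device disabled MSI/MSI-X */
		dev_dbg(&dev->dev, "xen: msi already bound to pirq=%d\n", pirq);
	}
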
>>
>> In the Trusty release we have qemu 2.0.0. I wonder whether adding the change
>> to the kernel (hm, actually the same would apply to Xenial via the HWE path)
>> would cause regressions when not using the cloud-archive...
> 
> Yes, but applying this patch to the Trusty kernel is completely
> unrelated to a Trusty-based Xen hypervisor.  This change is to the
> guest kernel.  Meaning, if we don't apply this kernel patch to the
> Trusty kernel, X/Y/Z patched kernels will still see problems if
> they're running under a Trusty-based Xen hypervisor (i.e. qemu 2.0.0
> or earlier), while an unpatched T kernel will see problems if it's
> running under a newer (Vivid or later, or UCA Kilo or later) Xen
> hypervisor.

You are right. I only started thinking about compatibility issues while looking
at the Trusty kernel and wondering which qemu version ships in that release.
Put simply: patching any guest kernel allows a guest VM to properly use
pci-passthrough on any host/hypervisor running qemu 2.1 onwards, and breaks it
on older systems. That means Precise (too old to worry about much) and
Trusty-based Xen hosts (without the cloud-archive enabled). I do not have real
numbers on how often people run Trusty Xen + pci-passthrough. I guess we will
see.
The other unknown is which versions of qemu run on which instance-type hosts
on AWS. Hopefully recent enough ones on those that offer pass-through...

But overall there is no reason not to change the Trusty kernel, too.

-Stefan
> 
> Applying this to T as well as X/Y/Z means all of them will work right
> under a qemu 2.1.0 or later Xen hypervisor, and all of them will not work
> right under a qemu 2.0.0 or earlier Xen hypervisor.
> 
>> But then, changing the 2.0.0 qemu to add this unmapping might cause other
>> unexpected problems...
> 
> I opened bug 1657489 to track fixing Trusty qemu.
> https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1657489
> 
> I included a matrix there of hypervisor version and guest kernel
> version, which I'll include here.
> 
> 1) qemu not patched (2.0.0 and earlier), guest kernel not patched:
> CORRECT behavior
> hypervisor: Trusty qemu or UCA Icehouse qemu
> guest: all without patch from bug 1656381
> failure: none
> 
> 2) qemu not patched (2.0.0 and earlier), guest kernel patched:
> INCORRECT behavior
> hypervisor: Trusty qemu or UCA Icehouse qemu
> guest: all with patch from bug 1656381
> failure: MSI interrupts will fail to be configured for any device if
> the device disables and then re-enables its MSI; only the first MSI enable
> for a device will work. For example, unloading a driver will result in
> failure to enable MSI when the driver is reloaded.
> 
> 3) qemu patched (2.1.0 and later), guest kernel not patched: INCORRECT behavior
> hypervisor: Vivid or later qemu, or UCA Kilo or later qemu
> guest: all without patch from bug 1656381
> failure: MSI interrupts in the guest may not be correctly mapped if
> device B enables its MSI after device A has disabled its MSI; when
> device A re-enables its MSI, some of its interrupts will fail to be
> configured correctly. NVMe shows this repeatedly with multiple NVMe
> controllers; usually only 1 NVMe controller will finish initialization
> correctly.
> 
> 4) qemu patched (2.1.0 and later), guest kernel patched: CORRECT behavior
> hypervisor: Vivid or later qemu, or UCA Kilo or later qemu
> guest: all with patch from bug 1656381
> failure: none
> 
> 
>>
>> -Stefan
>>
>>>
>>> This bug is causing failures to initialize multiple NVMe controllers
>>> under Xen, because the NVMe driver sets up a single MSIX vector for
>>> each controller (concurrently), and then after using that to talk to
>>> the controller for some configuration data, it disables the single MSIX
>>> vector and re-configures all the MSIX vectors it needs.  So the MSIX
>>> setup code tries to re-use the cached pirq from the first vector
>>> for each controller, but the hypervisor has already given away that
>>> pirq to another controller, and its initialization fails.
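
Schematically, the sequence that triggers the problem looks like this from the
guest driver's side (a hypothetical snippet using the generic PCI MSI-X API to
illustrate the enable/disable/re-enable pattern; this is not the actual NVMe
driver code, and example_setup_vectors is a made-up name):

/* hypothetical example, not real NVMe code */
static int example_setup_vectors(struct pci_dev *pdev, int nr_vecs,
				 struct msix_entry *entries)
{
	struct msix_entry one = { .entry = 0 };
	int i, ret;

	/* single vector, used only to query the device configuration */
	ret = pci_enable_msix(pdev, &one, 1);
	if (ret)
		return ret;
	/* ...read configuration data over that one vector... */
	pci_disable_msix(pdev);  /* qemu >= 2.1.0 unmaps the pirq here */

	for (i = 0; i < nr_vecs; i++)
		entries[i].entry = i;
	/* an un-patched guest kernel re-uses the pirq cached from the first
	 * enable, but the hypervisor may have already handed that pirq to
	 * another device, so this setup fails */
	return pci_enable_msix(pdev, entries, nr_vecs);
}
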
>>>
>>> This is discussed in more detail at:
>>> https://lists.xen.org/archives/html/xen-devel/2017-01/msg00447.html
>>>
>>> Fixes: af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests")
>>> Signed-off-by: Dan Streetman <dan.streetman at canonical.com>
>>> (backported from X/Y/Z patch for context/function name changes only)
>>>
>>> ---
>>>  arch/x86/pci/xen.c | 23 +++++++----------------
>>>  1 file changed, 7 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
>>> index 5eee495..6969c76 100644
>>> --- a/arch/x86/pci/xen.c
>>> +++ b/arch/x86/pci/xen.c
>>> @@ -227,23 +227,14 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
>>>               return 1;
>>>
>>>       list_for_each_entry(msidesc, &dev->msi_list, list) {
>>> -             __read_msi_msg(msidesc, &msg);
>>> -             pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
>>> -                     ((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
>>> -             if (msg.data != XEN_PIRQ_MSI_DATA ||
>>> -                 xen_irq_from_pirq(pirq) < 0) {
>>> -                     pirq = xen_allocate_pirq_msi(dev, msidesc);
>>> -                     if (pirq < 0) {
>>> -                             irq = -ENODEV;
>>> -                             goto error;
>>> -                     }
>>> -                     xen_msi_compose_msg(dev, pirq, &msg);
>>> -                     __write_msi_msg(msidesc, &msg);
>>> -                     dev_dbg(&dev->dev, "xen: msi bound to pirq=%d\n", pirq);
>>> -             } else {
>>> -                     dev_dbg(&dev->dev,
>>> -                             "xen: msi already bound to pirq=%d\n", pirq);
>>> +             pirq = xen_allocate_pirq_msi(dev, msidesc);
>>> +             if (pirq < 0) {
>>> +                     irq = -ENODEV;
>>> +                     goto error;
>>>               }
>>> +             xen_msi_compose_msg(dev, pirq, &msg);
>>> +             __write_msi_msg(msidesc, &msg);
>>> +             dev_dbg(&dev->dev, "xen: msi bound to pirq=%d\n", pirq);
>>>               irq = xen_bind_pirq_msi_to_irq(dev, msidesc, pirq,
>>>                                              (type == PCI_CAP_ID_MSIX) ?
>>>                                              "msi-x" : "msi",
>>>
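
With that hunk applied, the per-descriptor loop boils down to roughly the
following (a sketch reconstructed from the patch context above; the trailing
xen_bind_pirq_msi_to_irq() call is unchanged and elided here):

	list_for_each_entry(msidesc, &dev->msi_list, list) {
		/* always allocate a fresh pirq from Xen; never re-use a value
		 * cached in the device's MSI message data */
		pirq = xen_allocate_pirq_msi(dev, msidesc);
		if (pirq < 0) {
			irq = -ENODEV;
			goto error;
		}
		xen_msi_compose_msg(dev, pirq, &msg);
		__write_msi_msg(msidesc, &msg);
		dev_dbg(&dev->dev, "xen: msi bound to pirq=%d\n", pirq);
		/* bind the pirq to an irq as before (unchanged code) */
		...
	}
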
>>
>>
>>

