APPLIED: [PATCH 0/1] [focal/linux-azure] Call trace on Ubuntu 18.04 VM with Standard NV24
Tim Gardner
tim.gardner at canonical.com
Wed Dec 1 15:28:15 UTC 2021
Applied to focal/linux-azure. Thanks.
-rtg
On 11/29/21 7:31 AM, Tim Gardner wrote:
> BugLink: https://bugs.launchpad.net/bugs/1952621
>
> SRU Justification
>
> [Impact]
> During large-scale deployment testing, we found the call trace below when
> provisioning an Ubuntu 18.04 VM of size Standard_NV24. An engineer deployed the
> instance 10 times and encountered the trace once.
>
> It looks like a race condition during device probing, but all devices are eventually probed.
>
> [ 4.938162] sysfs: cannot create duplicate filename '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/47505500-0003-0000-3130-444531334632/pci0003:00/0003:00:00.0/config'
> [ 4.944816] sr 5:0:0:0: [sr0] scsi3-mmc drive: 0x/0x tray
> [ 4.951818] CPU: 0 PID: 135 Comm: kworker/0:2 Not tainted 5.4.0-1061-azure #64~18.04.1-Ubuntu
> [ 4.951820] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
> [ 4.958943] cdrom: Uniform CD-ROM driver Revision: 3.20
> [ 4.955812] Workqueue: hv_pri_chan vmbus_add_channel_work
> [ 4.955812] Call Trace:
> [ 4.955812] dump_stack+0x57/0x6d
> [ 4.955812] sysfs_warn_dup+0x5b/0x70
> [ 4.955812] sysfs_add_file_mode_ns+0x158/0x180
> [ 4.955812] sysfs_create_bin_file+0x64/0x90
> [ 4.955812] pci_create_sysfs_dev_files+0x72/0x270
> [ 4.955812] pci_bus_add_device+0x30/0x80
> [ 4.955812] pci_bus_add_devices+0x31/0x70
> [ 4.955812] hv_pci_probe+0x48c/0x650
> [ 4.955812] vmbus_probe+0x3e/0x90
> [ 4.955812] really_probe+0xf5/0x440
> [ 4.955812] driver_probe_device+0x11b/0x130
> [ 4.955812] __device_attach_driver+0x7b/0xe0
> [ 4.955812] ? driver_allows_async_probing+0x60/0x60
> [ 4.955812] bus_for_each_drv+0x6e/0xb0
> [ 4.955812] __device_attach+0xe4/0x160
> [ 4.955812] device_initial_probe+0x13/0x20
> [ 4.955812] bus_probe_device+0x92/0xa0
> [ 4.955812] device_add+0x402/0x690
> [ 4.955812] device_register+0x1a/0x20
> [ 4.955812] vmbus_device_register+0x5e/0xf0
> [ 4.955812] vmbus_add_channel_work+0x2c4/0x640
> [ 4.955812] process_one_work+0x209/0x400
> [ 4.955812] worker_thread+0x34/0x400
> [ 4.955812] kthread+0x121/0x140
> [ 4.955812] ? process_one_work+0x400/0x400
> [ 4.955812] ? kthread_park+0x90/0x90
> [ 4.955812] ret_from_fork+0x35/0x40
> [ 5.043612] hv_pci 47505500-0004-0001-3130-444531334632: PCI VMBus probing: Using version 0x10002
> [ 5.260563] hv_pci 47505500-0004-0001-3130-444531334632: PCI host bridge to bus 0004:00
>
> Microsoft did some research and it looks like this is a longstanding race
> condition bug in the generic PCI subsystem (due to the timing, there can
> be more than 1 place where the PCI code tries to create the same ‘config’
> sysfs file): https://patchwork.kernel.org/project/linux-pci/patch/20200716110423.xtfyb3n6tn5ixedh@pali/#23669641
> The bug was reported on 7/16/2020, and the last reply was on 6/25/2021.
> It looks like this has not been fixed after more than a year.
>
> [Test Case]
>
> Repeated deployment on a Standard_NV24 instance. Microsoft reported a reproduction
> rate of 3/551 before the patch and 0/838 with the patch.
>
> [Where things could go wrong]
>
> Deployments could fail for other reasons.
>
> [Other info]
>
> SF: #00321027
>
>
>
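As a rough sanity check on the [Test Case] figures quoted above, the sketch below estimates how likely 0 failures in 838 deployments would be if the pre-patch rate of 3/551 still applied. It assumes each deployment fails independently with a fixed probability, which is a simplification not stated in the report; the function name is hypothetical.

```python
def prob_zero_failures(base_failures, base_runs, new_runs):
    """Probability of seeing no failures in new_runs deployments,
    assuming each one fails independently at the observed base rate
    (base_failures / base_runs). This independence is an assumption."""
    rate = base_failures / base_runs
    return (1.0 - rate) ** new_runs


if __name__ == "__main__":
    # Figures taken from the SRU justification: 3/551 before, 0/838 after.
    before_rate = 3 / 551
    p = prob_zero_failures(3, 551, 838)
    print(f"pre-patch reproduction rate: {before_rate:.3%}")
    print(f"P(0 failures in 838 runs at that rate): {p:.4f}")
```

Under that assumption the probability comes out at roughly 1%, so the 0/838 result is unlikely to be chance alone, though it does not by itself prove the race is closed.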
--
-----------
Tim Gardner
Canonical, Inc