[PATCH 0/1] [focal/linux-azure] Call trace on Ubuntu 18.04 VM with Standard NV24

Tim Gardner tim.gardner at canonical.com
Mon Nov 29 14:31:37 UTC 2021


BugLink: https://bugs.launchpad.net/bugs/1952621

SRU Justification

[Impact]
During large-scale deployment testing, we found the call trace below when provisioning
an Ubuntu 18.04 VM of size Standard_NV24. An engineer deployed the instance 10 times and
encountered it once.

It looks like a race condition during device probing, but all devices are eventually probed.

[ 4.938162] sysfs: cannot create duplicate filename '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/47505500-0003-0000-3130-444531334632/pci0003:00/0003:00:00.0/config'
[ 4.944816] sr 5:0:0:0: [sr0] scsi3-mmc drive: 0x/0x tray
[ 4.951818] CPU: 0 PID: 135 Comm: kworker/0:2 Not tainted 5.4.0-1061-azure #64~18.04.1-Ubuntu
[ 4.951820] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
[ 4.958943] cdrom: Uniform CD-ROM driver Revision: 3.20
[ 4.955812] Workqueue: hv_pri_chan vmbus_add_channel_work
[ 4.955812] Call Trace:
[ 4.955812] dump_stack+0x57/0x6d
[ 4.955812] sysfs_warn_dup+0x5b/0x70
[ 4.955812] sysfs_add_file_mode_ns+0x158/0x180
[ 4.955812] sysfs_create_bin_file+0x64/0x90
[ 4.955812] pci_create_sysfs_dev_files+0x72/0x270
[ 4.955812] pci_bus_add_device+0x30/0x80
[ 4.955812] pci_bus_add_devices+0x31/0x70
[ 4.955812] hv_pci_probe+0x48c/0x650
[ 4.955812] vmbus_probe+0x3e/0x90
[ 4.955812] really_probe+0xf5/0x440
[ 4.955812] driver_probe_device+0x11b/0x130
[ 4.955812] __device_attach_driver+0x7b/0xe0
[ 4.955812] ? driver_allows_async_probing+0x60/0x60
[ 4.955812] bus_for_each_drv+0x6e/0xb0
[ 4.955812] __device_attach+0xe4/0x160
[ 4.955812] device_initial_probe+0x13/0x20
[ 4.955812] bus_probe_device+0x92/0xa0
[ 4.955812] device_add+0x402/0x690
[ 4.955812] device_register+0x1a/0x20
[ 4.955812] vmbus_device_register+0x5e/0xf0
[ 4.955812] vmbus_add_channel_work+0x2c4/0x640
[ 4.955812] process_one_work+0x209/0x400
[ 4.955812] worker_thread+0x34/0x400
[ 4.955812] kthread+0x121/0x140
[ 4.955812] ? process_one_work+0x400/0x400
[ 4.955812] ? kthread_park+0x90/0x90
[ 4.955812] ret_from_fork+0x35/0x40
[ 5.043612] hv_pci 47505500-0004-0001-3130-444531334632: PCI VMBus probing: Using version 0x10002
[ 5.260563] hv_pci 47505500-0004-0001-3130-444531334632: PCI host bridge to bus 0004:00
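When checking many deployments, a captured kernel log can be scanned for the
warning line that opens the trace above. The following is only a minimal sketch:
the function name is hypothetical, and it assumes the log has already been saved
as text (for example from dmesg) and that the warning wording matches the line
shown above.

```python
import re

# Matches the warning shown in the trace above and captures the quoted path.
# Assumption: the kernel prints the path in single quotes, as in this log.
DUP_RE = re.compile(r"sysfs: cannot create duplicate filename '([^']+)'")


def find_duplicate_sysfs_warnings(log_text):
    """Return the sysfs paths named in duplicate-filename warnings."""
    return [match.group(1) for match in DUP_RE.finditer(log_text)]
```

An empty result for a given boot means the warning was not logged in the text
that was scanned; it says nothing about other failures.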

Microsoft did some research and it looks like this is a longstanding race
condition bug in the generic PCI subsystem (due to the timing, there can
be more than 1 place where the PCI code tries to create the same ‘config’
sysfs file): https://patchwork.kernel.org/project/linux-pci/patch/20200716110423.xtfyb3n6tn5ixedh@pali/#23669641
The bug was reported on July 16, 2020, and the last reply was on June 25, 2021.
It appears to remain unfixed more than a year later.

[Test Case]

Deploy a Standard_NV24 instance repeatedly. Microsoft reported a reproduction rate of 3/551
without the patch and 0/838 with the patch.
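
As a rough sanity check on these figures, one can ask how likely 838 clean
deployments would be if the patch had no effect. The sketch below is
illustrative only: it assumes deployments are independent and that the
unpatched failure rate is exactly 3/551, which is itself an uncertain estimate
from just three events. The function name is hypothetical.

```python
def prob_zero_failures(failures_before, runs_before, runs_after):
    """Chance of seeing no failures in runs_after runs at the old failure rate."""
    rate = failures_before / runs_before
    return (1.0 - rate) ** runs_after


# Figures from the test case above: 3/551 without the patch, 838 runs with it.
p = prob_zero_failures(3, 551, 838)
print(f"probability of 0/838 at the unpatched rate: {p:.4f}")
```

Under those assumptions the probability comes out at roughly 1%, so the
0/838 result is unlikely to be chance alone.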

[Where things could go wrong]

Deployments could fail for other reasons.

[Other info]

SF: #00321027





