ACK: [PATCH][focal:linux-azure] Drivers: hv: vmbus: Fix duplicate CPU assignments within a device

Kelsey Skunberg kelsey.skunberg at canonical.com
Thu Nov 4 17:20:37 UTC 2021


Acked-by: Kelsey Skunberg <kelsey.skunberg at canonical.com>

On 2021-11-01 08:20:08 , Tim Gardner wrote:
> From: Haiyang Zhang <haiyangz at microsoft.com>
> 
> BugLink: https://bugs.launchpad.net/bugs/1937078
> 
> commit 7c9ff3deeee61b253715dcf968a6307af148c9b2 upstream
> 
> The vmbus module uses a rotational algorithm to assign target CPUs to
> a device's channels. Depending on the timing of different devices' channel
> offers, different channels of a device may be assigned to the same CPU.
> 
> For example, on a VM with 2 CPUs, if NIC A and B's channels are offered
> in the following order, NIC A will have both channels on CPU0, and
> NIC B will have both channels on CPU1 -- see below. This kind of
> assignment causes RSS load that is spread across different channels
> to end up on the same CPU.
> 
> Timing of channel offers:
> NIC A channel 0
> NIC B channel 0
> NIC A channel 1
> NIC B channel 1
> 
> VMBUS ID 14: Class_ID = {f8615163-df3e-46c5-913f-f2d2f965ed0e} - Synthetic network adapter
>         Device_ID = {cab064cd-1f31-47d5-a8b4-9d57e320cccd}
>         Sysfs path: /sys/bus/vmbus/devices/cab064cd-1f31-47d5-a8b4-9d57e320cccd
>         Rel_ID=14, target_cpu=0
>         Rel_ID=17, target_cpu=0
> 
> VMBUS ID 16: Class_ID = {f8615163-df3e-46c5-913f-f2d2f965ed0e} - Synthetic network adapter
>         Device_ID = {244225ca-743e-4020-a17d-d7baa13d6cea}
>         Sysfs path: /sys/bus/vmbus/devices/244225ca-743e-4020-a17d-d7baa13d6cea
>         Rel_ID=16, target_cpu=1
>         Rel_ID=18, target_cpu=1
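
(For anyone skimming the archive: the interleaving above is easy to see
with a toy user-space model of the plain rotational pick -- an
illustration only, nothing below is driver code:)

#include <stdio.h>

int main(void)
{
	const char *offers[] = { "NIC A ch0", "NIC B ch0",
				 "NIC A ch1", "NIC B ch1" };
	const int ncpu = 2;
	int next_cpu = 0;

	for (int i = 0; i < 4; i++) {
		/* plain round-robin: ignores which device a channel
		 * belongs to, so interleaved offers stack a device's
		 * channels on the same CPU */
		printf("%s -> cpu%d\n", offers[i], next_cpu);
		next_cpu = (next_cpu + 1) % ncpu;
	}
	return 0;
}

This prints both of NIC A's channels on cpu0 and both of NIC B's on
cpu1, matching the sysfs dump above.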
> 
> Update the vmbus CPU assignment algorithm to avoid duplicate CPU
> assignments within a device.
> 
> The new algorithm iterates at most num_online_cpus + 1 times.
> The existing rotational algorithm to find the "next NUMA node & CPU" is
> kept, but if the resulting CPU is already used by the same device, it
> tries the next CPU.
> In the last iteration, it assigns the channel to the next available CPU
> like the existing algorithm does. This fallback is not normally hit,
> because during device probe we limit the number of channels of a device
> to be <= the number of online CPUs.
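
Roughly, the fix gives the pick loop the shape below -- a simplified
user-space sketch of the retry, not the kernel code (the real version
also checks the HV_BALANCED affinity policy and walks the per-NUMA-node
alloced/available masks, as in the hunks further down):

#include <stdio.h>

#define NCPU 2

static int used[2][NCPU];	/* toy state: CPUs each device owns */

int main(void)
{
	const int dev[] = { 0, 1, 0, 1 };	/* NIC A, B, A, B */
	const int sub[] = { 0, 0, 1, 1 };	/* sub_channel_index */
	int next_cpu = 0;

	for (int i = 0; i < 4; i++) {
		int cnt = 1, cpu;
retry:
		cpu = next_cpu;			/* existing rotational pick */
		next_cpu = (next_cpu + 1) % NCPU;
		/* skip a CPU this device already uses; after NCPU + 1
		 * tries, fall through and accept whatever came up */
		if (sub[i] < NCPU && cnt++ < NCPU + 1 && used[dev[i]][cpu])
			goto retry;
		used[dev[i]][cpu] = 1;
		printf("NIC %c ch%d -> cpu%d\n", 'A' + dev[i], sub[i], cpu);
	}
	return 0;
}

With the retry in place the same offer order lands NIC A on cpu0/cpu1
and NIC B on cpu1/cpu0, i.e. no duplicate CPUs within a device.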
> 
> Signed-off-by: Haiyang Zhang <haiyangz at microsoft.com>
> Reviewed-by: Michael Kelley <mikelley at microsoft.com>
> Tested-by: Michael Kelley <mikelley at microsoft.com>
> Link: https://lore.kernel.org/r/1626459673-17420-1-git-send-email-haiyangz@microsoft.com
> Signed-off-by: Wei Liu <wei.liu at kernel.org>
> (backported from commit 7c9ff3deeee61b253715dcf968a6307af148c9b2)
> Signed-off-by: Tim Gardner <tim.gardner at canonical.com>
> [ rtg - this backport was done by Haiyang Zhang and has been
>  verified to fix the bug on A2 and D32 instance types. ]
> ---
> 
> v2 - v1 of this patch caused boot hangs on certain instance types and was dropped from
> the SRU cycle (https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1937078/comments/6).
> This backport was developed by the original author, who likely has a better handle on
> things than I do. From SF, "Haiyang backported the patch to azure-5.4-next (attached). He
> also reproduced the bug on 5.4 kernel with VM size A2 and D32, and verified the patch fixes the bug."
> 
> ---
>  drivers/hv/channel_mgmt.c | 33 +++++++++++++++++++++++++++++++++
>  1 file changed, 33 insertions(+)
> 
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 2b5d20965b1bf..ceac42e48f3a2 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -628,6 +628,32 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
>  	queue_work(wq, &newchannel->add_channel_work);
>  }
>  
> +/*
> + * Check if CPU is used by other channels of the same device.
> + * It should only be called by init_vp_index().
> + */
> +static bool hv_cpuself_used(u32 cpu, struct vmbus_channel *chn)
> +{
> +	struct vmbus_channel *primary = chn->primary_channel;
> +	struct vmbus_channel *sc;
> +	bool self_used = false;
> +	unsigned long flags;
> +
> +	if (!primary)
> +		return false;
> +
> +	if (primary->target_cpu == cpu)
> +		return true;
> +
> +	spin_lock_irqsave(&primary->lock, flags);
> +	list_for_each_entry(sc, &primary->sc_list, sc_list)
> +		if (sc != chn && sc->target_cpu == cpu)
> +			self_used = true;
> +	spin_unlock_irqrestore(&primary->lock, flags);
> +
> +	return self_used;
> +}
> +
>  /*
>   * We use this state to statically distribute the channel interrupt load.
>   */
> @@ -654,6 +680,7 @@ static void init_vp_index(struct vmbus_channel *channel, u16 dev_type)
>  	u32 cur_cpu;
>  	bool perf_chn = vmbus_devs[dev_type].perf_device;
>  	struct vmbus_channel *primary = channel->primary_channel;
> +	u32 cnt = 1, ncpu = num_online_cpus();
>  	int next_node;
>  	cpumask_var_t available_mask;
>  	struct cpumask *alloced_mask;
> @@ -676,6 +703,7 @@ static void init_vp_index(struct vmbus_channel *channel, u16 dev_type)
>  
>  	spin_lock(&bind_channel_to_cpu_lock);
>  
> +retry:
>  	/*
>  	 * Based on the channel affinity policy, we will assign the NUMA
>  	 * nodes.
> @@ -755,6 +783,11 @@ static void init_vp_index(struct vmbus_channel *channel, u16 dev_type)
>  		}
>  	}
>  
> +	if (channel->affinity_policy == HV_BALANCED &&
> +	    channel->offermsg.offer.sub_channel_index < ncpu &&
> +	    cnt++ < ncpu + 1 && hv_cpuself_used(cur_cpu, channel))
> +		goto retry;
> +
>  	channel->target_cpu = cur_cpu;
>  	channel->target_vp = hv_cpu_number_to_vp_number(cur_cpu);
>  
> -- 
> 2.33.1
> 
> 
> -- 
> kernel-team mailing list
> kernel-team at lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
