ACK: [SRU][Bionic][PATCH 1/1] x86, sched: Allow topologies where NUMA nodes share an LLC

Stefan Bader stefan.bader at canonical.com
Wed Jun 10 07:36:24 UTC 2020


On 08.06.20 06:18, Matthew Ruffell wrote:
> From: Alison Schofield <alison.schofield at intel.com>
> 
> BugLink: https://bugs.launchpad.net/bugs/1882478
> 
> Intel's Skylake Server CPUs have a different LLC topology than previous
> generations. When in Sub-NUMA-Clustering (SNC) mode, the package is divided
> into two "slices", each containing half the cores, half the LLC, and one
> memory controller, and each slice is enumerated to Linux as a NUMA
> node. This is similar to how the cores and LLC were arranged for the
> Cluster-On-Die (CoD) feature.
> 
> CoD allowed the same cache line to be present in each half of the LLC.
> But, with SNC, each line is only ever present in *one* slice. This means
> that the portion of the LLC *available* to a CPU depends on the data being
> accessed:
> 
>     Remote socket: entire package LLC is shared
>     Local socket->local slice: data goes into local slice LLC
>     Local socket->remote slice: data goes into remote-slice LLC. Slightly
>         higher latency than local slice LLC.
> 
> The biggest implication from this is that a process accessing all
> NUMA-local memory only sees half the LLC capacity.
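As a rough illustration of the capacity cases above, the helper below computes how much of one package's LLC a given kind of access can occupy. It assumes the LLC is split evenly between the SNC slices; the enum and function names are hypothetical and are not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the three cases in the commit message. */
enum access_kind {
	ACCESS_REMOTE_SOCKET,	/* data homed on another package */
	ACCESS_LOCAL_SLICE,	/* same package, same SNC slice */
	ACCESS_REMOTE_SLICE,	/* same package, the other SNC slice */
};

/*
 * Bytes of one package's LLC that a line of the given kind can occupy,
 * assuming (for illustration) the LLC is split evenly across 'slices'.
 */
size_t usable_llc_bytes(size_t package_llc_bytes, unsigned int slices,
			enum access_kind kind)
{
	assert(slices > 0);

	switch (kind) {
	case ACCESS_REMOTE_SOCKET:
		/* Off-package data may be cached anywhere in the package LLC. */
		return package_llc_bytes;
	case ACCESS_LOCAL_SLICE:
	case ACCESS_REMOTE_SLICE:
		/* On-package data only ever lives in its home slice. */
		return package_llc_bytes / slices;
	}
	return 0;
}
```

For a hypothetical 32 MiB package LLC split into two slices, a NUMA-local access can use 16 MiB, while off-package data can use all 32 MiB.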
> 
> The CPU describes its cache hierarchy with the CPUID instruction. One of
> the CPUID leaves enumerates the "logical processors sharing this
> cache". This information is used for scheduling decisions so that tasks
> move more freely between CPUs sharing the cache.
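The leaf in question is CPUID leaf 4 (Deterministic Cache Parameters), where EAX[25:14] holds the maximum number of addressable logical-processor IDs sharing the cache, minus one. A minimal userspace sketch of decoding it follows. It assumes x86 and a recent GCC or Clang `<cpuid.h>` that provides `__get_cpuid_count()`; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* CPUID leaf 4 EAX[4:0]: cache type (0 = no more caches, 3 = unified). */
unsigned int cache_type(uint32_t eax)
{
	return eax & 0x1f;
}

/* CPUID leaf 4 EAX[7:5]: cache level (1, 2, 3, ...). */
unsigned int cache_level(uint32_t eax)
{
	return (eax >> 5) & 0x7;
}

/* CPUID leaf 4 EAX[25:14] + 1: max addressable logical-processor IDs sharing it. */
unsigned int cache_sharing_ids(uint32_t eax)
{
	return ((eax >> 14) & 0xfff) + 1;
}

/* Size = ways * partitions * line size * sets; each field is stored minus one. */
uint64_t cache_size_bytes(uint32_t ebx, uint32_t ecx)
{
	uint64_t ways = ((ebx >> 22) & 0x3ff) + 1;
	uint64_t partitions = ((ebx >> 12) & 0x3ff) + 1;
	uint64_t line = (ebx & 0xfff) + 1;
	uint64_t sets = (uint64_t)ecx + 1;

	return ways * partitions * line * sets;
}

/* Print every cache the running CPU enumerates via leaf 4 (Intel-specific). */
void dump_cache_sharing(void)
{
#if defined(__i386__) || defined(__x86_64__)
	unsigned int eax, ebx, ecx, edx;

	for (unsigned int i = 0; i < 16 &&
	     __get_cpuid_count(4, i, &eax, &ebx, &ecx, &edx); i++) {
		if (cache_type(eax) == 0)
			break;		/* no more cache descriptors */
		assert(cache_level(eax) >= 1);
		printf("L%u (type %u): %llu KiB, up to %u logical CPU IDs share it\n",
		       cache_level(eax), cache_type(eax),
		       (unsigned long long)(cache_size_bytes(ebx, ecx) >> 10),
		       cache_sharing_ids(eax));
	}
#endif
}
```

Following the description above, on an SNC-enabled Skylake Server part the L3 entry would be expected to report a sharing count that spans the whole package rather than one slice.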
> 
> But, the CPUID for the SNC configuration discussed above enumerates the LLC
> as being shared by the entire package. This is not 100% precise because the
> entire cache is not usable by all accesses. But, it *is* the way the
> hardware enumerates itself, and this is not likely to change.
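Because the LLC is enumerated as package-wide, every CPU in the package ends up with the same LLC ID. The sketch below approximates how Linux's Intel cacheinfo code derives `cpu_llc_id`: it clears the low APIC ID bits that cover the sharing domain. The function names, and the APIC ID placement in the usage note, are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest order with (1 << order) >= n, like the kernel's get_count_order(). */
unsigned int count_order(unsigned int n)
{
	unsigned int order = 0;

	assert(n > 0 && n <= 4096);	/* CPUID's 12-bit field caps this */
	while ((1u << order) < n)
		order++;
	return order;
}

/*
 * Clear the APIC ID bits that distinguish CPUs inside one cache-sharing
 * domain; what remains identifies the cache.
 */
uint32_t llc_id_from_apicid(uint32_t apicid, unsigned int sharing_ids)
{
	unsigned int shift = count_order(sharing_ids);

	return apicid & ~((UINT32_C(1) << shift) - 1);
}
```

With a package-wide sharing count of 64, APIC IDs 2 and 40 both map to LLC ID 0 even if (hypothetically) they sit in different SNC slices, while APIC ID 66 on the next package maps to 64.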
> 
> The userspace-visible impact of all the above is that the sysfs info
> reports the entire LLC as being available to the entire package. As noted
> above, this is not true for local socket accesses. This patch does not
> correct the sysfs info; it is the same before and after the patch.
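One way to see this from userspace is the cache's `shared_cpu_list` file under `/sys/devices/system/cpu/cpuN/cache/indexM/`. The sketch below reads it for cpu0 and counts the CPUs in the cpulist format (for example `0-27,56-83`). The function names are illustrative. `index3` is typically the L3 on these parts, but that is an assumption worth confirming via the `level` file in the same directory.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Count CPUs in a Linux cpulist string such as "0-3,8-11\n"; -1 if malformed. */
int cpulist_count(const char *s)
{
	int count = 0;

	assert(s != NULL);
	while (*s != '\0' && *s != '\n') {
		char *end;
		long lo = strtol(s, &end, 10);
		long hi = lo;

		if (end == s || lo < 0)
			return -1;
		s = end;
		if (*s == '-') {		/* a range "lo-hi" */
			const char *start = s + 1;

			hi = strtol(start, &end, 10);
			if (end == start || hi < lo)
				return -1;
			s = end;
		}
		count += (int)(hi - lo + 1);
		if (*s == ',')
			s++;
		else if (*s != '\0' && *s != '\n')
			return -1;
	}
	return count;
}

/* Read .../cpu0/cache/index<idx>/shared_cpu_list and count its CPUs; -1 on error. */
int cache_sharing_count(int idx)
{
	char path[128];
	char buf[4096];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu0/cache/index%d/shared_cpu_list", idx);
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	if (fgets(buf, sizeof(buf), f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return cpulist_count(buf);
}
```

On an SNC system the L3 count would be expected to cover the whole package, both before and after this patch.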
> 
> The current code emits the following warning:
> 
>  sched: CPU #3's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
> 
> The warning comes from the topology_sane() check in smpboot.c, because
> the topology does not match the model's expectations: CPUs that share an
> LLC ID are expected to be on the same NUMA node.
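To make the failure mode concrete, the sketch below models the pre-patch match_llc() and the topology_sane() check in userspace. `struct cpu_model`, `BAD_LLC_ID`, and the function names are hypothetical stand-ins for the kernel's per-CPU data and BAD_APICID. Unlike the kernel's WARN_ONCE(), this version prints on every call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define BAD_LLC_ID 0xFFFFu	/* stand-in for the kernel's BAD_APICID */

/* Hypothetical stand-in for the per-CPU data match_llc() consults. */
struct cpu_model {
	int index;		/* logical CPU number */
	unsigned int llc_id;	/* what per_cpu(cpu_llc_id, cpu) would hold */
	int node;		/* NUMA node, e.g. from the SRAT */
};

/* Like topology_sane(): a sibling on another node is rejected with a warning. */
bool topology_sane_model(const struct cpu_model *c, const struct cpu_model *o,
			 const char *name)
{
	assert(c != NULL && o != NULL);
	if (c->node == o->node)
		return true;
	fprintf(stderr, "sched: CPU #%d's %s-sibling CPU #%d is not on the same node! "
		"[node: %d != %d]. Ignoring dependency.\n",
		c->index, name, o->index, c->node, o->node);
	return false;
}

/* Pre-patch match_llc(): every same-LLC pair goes through the sanity check. */
bool match_llc_old(const struct cpu_model *c, const struct cpu_model *o)
{
	if (c->llc_id != BAD_LLC_ID && c->llc_id == o->llc_id)
		return topology_sane_model(c, o, "llc");
	return false;
}
```

Feeding it CPU #3 on node 1 and CPU #0 on node 0 with the same LLC ID prints the warning quoted above and returns false.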
> 
> To fix this, add a vendor- and model-specific check so that
> topology_sane() is never called for these systems. Also, just like with
> Cluster-on-Die, disable the "coregroup" sched_domain_topology_level and use
> NUMA information from the SRAT alone.
> 
> This is OK at least on the hardware we are immediately concerned about
> because the LLC sharing happens at both the slice and at the package level,
> which are also NUMA boundaries.
> 
> Signed-off-by: Alison Schofield <alison.schofield at intel.com>
> Signed-off-by: Thomas Gleixner <tglx at linutronix.de>
> Reviewed-by: Borislav Petkov <bp at suse.de>
> Cc: Prarit Bhargava <prarit at redhat.com>
> Cc: Tony Luck <tony.luck at intel.com>
> Cc: Peter Zijlstra (Intel) <peterz at infradead.org>
> Cc: brice.goglin at gmail.com
> Cc: Dave Hansen <dave.hansen at linux.intel.com>
> Cc: Borislav Petkov <bp at alien8.de>
> Cc: David Rientjes <rientjes at google.com>
> Cc: Igor Mammedov <imammedo at redhat.com>
> Cc: "H. Peter Anvin" <hpa at linux.intel.com>
> Cc: Tim Chen <tim.c.chen at linux.intel.com>
> Link: https://lkml.kernel.org/r/20180407002130.GA18984@alison-desk.jf.intel.com
> (backported from commit 1340ccfa9a9afefdbab90d7935d4ed19817e37c2)
> [mruffell: re-arrange #includes to match upstream]
> Signed-off-by: Matthew Ruffell <matthew.ruffell at canonical.com>
Acked-by: Stefan Bader <stefan.bader at canonical.com>
> ---
>  arch/x86/kernel/smpboot.c | 45 ++++++++++++++++++++++++++++++++++-----
>  1 file changed, 40 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index ce8d67c3ed44..c4bf8e1543eb 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -78,6 +78,8 @@
>  #include <asm/realmode.h>
>  #include <asm/misc.h>
>  #include <asm/qspinlock.h>
> +#include <asm/intel-family.h>
> +#include <asm/cpu_device_id.h>
>  #include <asm/spec-ctrl.h>
>  #include <asm/hw_irq.h>
>  
> @@ -410,15 +412,47 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>  	return false;
>  }
>  
> +/*
> + * Define snc_cpu[] for SNC (Sub-NUMA Cluster) CPUs.
> + *
> + * These are Intel CPUs that enumerate an LLC that is shared by
> + * multiple NUMA nodes. The LLC on these systems is shared for
> + * off-package data access but private to the NUMA node (half
> + * of the package) for on-package access.
> + *
> + * CPUID (the source of the information about the LLC) can only
> + * enumerate the cache as being shared *or* unshared, but not
> + * this particular configuration. The CPU in this case enumerates
> + * the cache to be shared across the entire package (spanning both
> + * NUMA nodes).
> + */
> +
> +static const struct x86_cpu_id snc_cpu[] = {
> +	{ X86_VENDOR_INTEL, 6, INTEL_FAM6_SKYLAKE_X },
> +	{}
> +};
> +
>  static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>  {
>  	int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
>  
> -	if (per_cpu(cpu_llc_id, cpu1) != BAD_APICID &&
> -	    per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2))
> -		return topology_sane(c, o, "llc");
> +	/* Do not match if we do not have a valid APICID for cpu: */
> +	if (per_cpu(cpu_llc_id, cpu1) == BAD_APICID)
> +		return false;
>  
> -	return false;
> +	/* Do not match if LLC id does not match: */
> +	if (per_cpu(cpu_llc_id, cpu1) != per_cpu(cpu_llc_id, cpu2))
> +		return false;
> +
> +	/*
> +	 * Allow the SNC topology without warning. Return of false
> +	 * means 'c' does not share the LLC of 'o'. This will be
> +	 * reflected to userspace.
> +	 */
> +	if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu))
> +		return false;
> +
> +	return topology_sane(c, o, "llc");
>  }
>  
>  /*
> @@ -476,7 +510,8 @@ static struct sched_domain_topology_level x86_topology[] = {
>  
>  /*
>   * Set if a package/die has multiple NUMA nodes inside.
> - * AMD Magny-Cours and Intel Cluster-on-Die have this.
> + * AMD Magny-Cours, Intel Cluster-on-Die, and Intel
> + * Sub-NUMA Clustering have this.
>   */
>  static bool x86_has_numa_in_package;
>  
> 
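x86_match_cpu() walks a table like snc_cpu[] until it reaches the empty terminator. It returns the first entry whose vendor, family, and model match the boot CPU, treating the ANY values as wildcards. A simplified userspace mirror follows. The struct and macro names are hypothetical and the feature field is omitted. The constants reproduce the kernel's values for X86_VENDOR_INTEL, X86_VENDOR_ANY, X86_FAMILY_ANY, X86_MODEL_ANY, and INTEL_FAM6_SKYLAKE_X (0x55).

```c
#include <assert.h>
#include <stddef.h>

#define VENDOR_INTEL	0	/* value of X86_VENDOR_INTEL */
#define VENDOR_ANY	0xffff	/* value of X86_VENDOR_ANY */
#define FAMILY_ANY	0	/* value of X86_FAMILY_ANY */
#define MODEL_ANY	0	/* value of X86_MODEL_ANY */
#define MODEL_SKYLAKE_X	0x55	/* INTEL_FAM6_SKYLAKE_X */

/* Simplified mirror of struct x86_cpu_id (feature field omitted). */
struct cpu_id_entry {
	unsigned short vendor;
	unsigned short family;
	unsigned short model;
};

/* Same shape as snc_cpu[]: one Skylake-X entry plus an all-zero terminator. */
const struct cpu_id_entry snc_table[] = {
	{ VENDOR_INTEL, 6, MODEL_SKYLAKE_X },
	{ 0, 0, 0 },
};

/* Return the first entry matching the given CPU, or NULL; ANY values are wildcards. */
const struct cpu_id_entry *match_cpu(const struct cpu_id_entry *table,
				     unsigned short vendor,
				     unsigned short family,
				     unsigned short model)
{
	assert(table != NULL);
	for (const struct cpu_id_entry *m = table;
	     m->vendor | m->family | m->model; m++) {
		if (m->vendor != VENDOR_ANY && m->vendor != vendor)
			continue;
		if (m->family != FAMILY_ANY && m->family != family)
			continue;
		if (m->model != MODEL_ANY && m->model != model)
			continue;
		return m;
	}
	return NULL;
}
```

A table hit alone does not mean SNC is enabled. match_llc() only takes the quiet path when the two CPUs are also on different nodes.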




More information about the kernel-team mailing list