[PATCH 0/4][I/linux-azure H/linux-azure F/linux-azure-5.4 B/linux-azure-4.15] add Icelake servers support in no-HWP mode to cpufreq/intel_pstate driver

Bartlomiej Zolnierkiewicz bartlomiej.zolnierkiewicz at canonical.com
Fri Nov 26 16:13:17 UTC 2021


BugLink: https://bugs.launchpad.net/bugs/1952234

SRU Justification

[Impact]

Starting with the AH2020 Azure host build, Hyper-V is virtualizing some registers that provide information about the CPU frequency. The registers are read-only in a guest VM, so the guest can see the frequency, but cannot make any modifications.

This feature also requires the VM Configuration Version to be 9.2 or later, which means it needs to be a new VM type, such as the recently introduced Dv5/Ev5 series and the new M832v2 VMs.

Within the Linux VM, the presence of the feature is indicated by the “aperfmperf” flag in the “lscpu” flags output (or in the flags field in /proc/cpuinfo).

It turns out there is a Linux kernel limitation when running on the new Intel IceLake processors used for the Dv5/Ev5 series. Upstream commit fbdc21e9b038 was added to provide IceLake support in the 5.14 kernel.

Microsoft has asked for commit fbdc21e9b038 to be backported to all supported kernels.

[Test Plan]

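Boot an Intel IceLake-based VM and check for the "aperfmperf" flag in the "lscpu" flags output.

Without the patch, the intel_pstate directory is missing from /sys/devices/system/cpu/ and /sys/devices/system/cpu/cpufreq/ is empty. With the patch applied, the intel_pstate directory should be present and /sys/devices/system/cpu/cpufreq/ should be populated.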
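The checks in this test plan could be scripted roughly as follows. This is only a minimal sketch: the helper names (has_cpu_flag, check_pstate) are made up for illustration, and the only inputs used are the flags field of /proc/cpuinfo and the two sysfs paths mentioned above.

```python
import os


def has_cpu_flag(cpuinfo_text, flag):
    """Return True if `flag` is listed in the first "flags" line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return flag in value.split()
    return False


def check_pstate(cpuinfo_path="/proc/cpuinfo",
                 cpu_root="/sys/devices/system/cpu"):
    """Collect the three observations described in the test plan."""
    with open(cpuinfo_path) as f:
        cpuinfo = f.read()
    cpufreq_dir = os.path.join(cpu_root, "cpufreq")
    return {
        # Feature indicator exposed by Hyper-V to the guest.
        "aperfmperf": has_cpu_flag(cpuinfo, "aperfmperf"),
        # Missing without the patch on IceLake-based VMs.
        "intel_pstate_dir": os.path.isdir(os.path.join(cpu_root, "intel_pstate")),
        # Empty (or absent) without the patch.
        "cpufreq_populated": os.path.isdir(cpufreq_dir)
                             and bool(os.listdir(cpufreq_dir)),
    }


if __name__ == "__main__":
    for name, ok in check_pstate().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

On a patched kernel running on an IceLake-based VM, all three entries would be expected to report "yes".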

[Where problems could occur]

* The intel_pstate driver is always used on Intel IceLake based VMs, without checking for the presence of the "aperfmperf" CPU flag.

* In earlier (5.4 and 4.15) linux-azure kernels, when the intel_pstate driver is used it runs in "active" mode instead of "passive" mode, as reported by "cat /sys/devices/system/cpu/intel_pstate/status". Accordingly, "cat /sys/devices/system/cpu/cpufreq/policy0/scaling_driver" returns "intel_pstate" rather than "intel_cpufreq", which is the expected behavior in "active" mode.

If consistent behavior across all kernel versions is desired, upstream commit 33aa46f252c7 ("cpufreq: intel_pstate: Use passive mode by default without HWP") should probably also be backported.

* The /sys/devices/system/cpu/cpufreq/policy*/scaling_{min,max}_freq files can be modified from within the guest, in which case the values reported by the kernel will no longer match the values used by the hardware.
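To observe the points above on a running VM, something along these lines could be used. It is a rough sketch, not part of the patch set: the function names are hypothetical, and comparing scaling_{min,max}_freq against the standard cpuinfo_{min,max}_freq attributes is merely an assumed heuristic for spotting limits that were changed in the guest.

```python
import glob
import os

ATTRS = ("scaling_driver", "scaling_min_freq", "scaling_max_freq",
         "cpuinfo_min_freq", "cpuinfo_max_freq")


def read_attr(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def pstate_report(cpu_root="/sys/devices/system/cpu"):
    """Summarize the intel_pstate mode and per-policy frequency limits."""
    report = {
        # "active" or "passive" (None if the driver is not loaded).
        "status": read_attr(os.path.join(cpu_root, "intel_pstate", "status")),
        "policies": {},
    }
    pattern = os.path.join(cpu_root, "cpufreq", "policy*")
    for policy in sorted(glob.glob(pattern)):
        attrs = {name: read_attr(os.path.join(policy, name)) for name in ATTRS}
        known = all(attrs[name] is not None for name in ATTRS[1:])
        # Assumed heuristic: limits differing from the cpuinfo_* bounds
        # were probably changed from within the guest.
        attrs["limits_modified"] = known and (
            attrs["scaling_min_freq"] != attrs["cpuinfo_min_freq"]
            or attrs["scaling_max_freq"] != attrs["cpuinfo_max_freq"]
        )
        report["policies"][os.path.basename(policy)] = attrs
    return report


if __name__ == "__main__":
    result = pstate_report()
    print("intel_pstate status:", result["status"])
    for name, attrs in result["policies"].items():
        print(name, attrs)
```

A scaling_driver of "intel_pstate" together with a status of "active" corresponds to the 5.4 and 4.15 behavior described above, while "intel_cpufreq" with "passive" corresponds to kernels carrying commit 33aa46f252c7.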

[Other Info]

None.


