Reducing Regression Test suite: LTP

Tue Jul 13 10:50:49 UTC 2021

On Tue, Jul 13, 2021 at 12:18:09PM +0200, Krzysztof Kozlowski wrote:
> Hi all,
> 
> We talked about the possibility of reducing our regression test suite. I
> have a candidate for this - LTP (Linux Test Project).
> 
> Each run of full LTP takes around 4 hours (2 - 2.5h for ubuntu_ltp, 40
> minutes for ubuntu_ltp_stable and ~1h for ubuntu_ltp_syscalls). I looked
> at cloud instances (4 and 48 cores).
> 
> LTP tests everything: known kernel bugs and CVEs, kernel syscalls and
> user-space interfaces, network and probably more. It is a huge test suite.
> 

Which is why it is so valuable. It not only tests that kernel interfaces
behave as expected, but also exercises them, which helps keep us from missing
an important regression.

> I was looking at LTP a lot last month on cloud instances and 95% of
> failures were flaky-test or instance related. The remaining 5% were
> indeed missing kernel commits for fixes, not specific to a derivative
> but to a generic kernel (e.g. missing backports for Bionic).
> 
> It is rather unlikely that LTP will pass on the main kernel but fail on
> a derivative because of a kernel issue. More likely is that the
> failure will be seen on the main kernel as well.

Our derivative kernels carry specific patches and have different
configurations. Sometimes a patchset is submitted and tested on the generic
kernel, but not on all derivatives. I still find it valuable to test
derivative kernels as much as we can.

I agree that some tests are not as robust, and that means we should be
improving the tests as well, so they bring more value to our testing. But I
also thought that was why we had split the tests between ubuntu_ltp_stable and
ubuntu_ltp. That may have brought a stink to the LTP name, unfortunately. Maybe
it should have been ubuntu_ltp and ubuntu_ltp_unstable. But that is nitpicking.

If there are flaky tests in ubuntu_ltp_stable, they should be moved to
ubuntu_ltp; then we can start improving the tests in ubuntu_ltp so they
can be moved to ubuntu_ltp_stable.
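
As a rough illustration of that triage, here is a minimal Python sketch that
splits tests into a stable and an unstable set based on their pass/fail
history. Everything in it is an assumption for the sake of the example: the
function name, the test names, the threshold, and the idea that the history
comes from runs on kernels believed to be good. It is not how the suites are
actually maintained.

```python
def split_by_flakiness(history, max_failure_rate=0.0):
    """Split tests into (stable, unstable) lists.

    history: hypothetical mapping of test name -> list of bools
             (True = pass), taken from runs on kernels believed good.
    max_failure_rate: assumed threshold; 0.0 means any failure
             demotes the test to the unstable set.
    """
    stable, unstable = [], []
    for name, results in sorted(history.items()):
        if not results:
            # No run data yet: do not trust the test as stable.
            unstable.append(name)
            continue
        failure_rate = results.count(False) / len(results)
        if failure_rate <= max_failure_rate:
            stable.append(name)
        else:
            unstable.append(name)
    return stable, unstable


# Hypothetical test names and results, for illustration only.
history = {
    "test_a": [True, True, True],
    "test_b": [True, False, True],
    "test_c": [],
}
stable, unstable = split_by_flakiness(history)
```

With the default threshold, only tests that never failed stay in the stable
set; anything else is a candidate for fixing before it moves back.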

LTP is also changing rapidly, though its maintainers care about keeping the
tests applicable to older kernels and older environments. And though you may
think they are slow to pick up your changes, they are fast compared to other
projects.

I think it's really important that we keep testing their latest versions.

> 
> Therefore I propose to run full LTP only in some cases:
> 1. On main kernels (so mostly metal),
> 2. On HWE kernels (from which we have a derivative edge kernel but HWE
> is enough),
> 3. Development kernels (Impish) everywhere,
> 4. Maybe also OEM kernels?
> 
> In other cases run only a subset of LTP. Maybe only ubuntu_ltp_syscalls?

Definitely ubuntu_ltp_syscalls. If ubuntu_ltp_stable is not stable enough, we
should prioritize fixing the tests in ubuntu_ltp_stable over the ones in
ubuntu_ltp (the de facto unstable set). Reviewing which tests are there may be
a good step too, so we can figure out whether each one is testing glibc, the
kernel, or the hardware.
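
To make the proposal quoted above concrete, a selection policy along those
lines could look roughly like the Python sketch below. The kernel "kind"
labels, the Kernel class, and the include_oem switch are hypothetical names
made up for this example (the OEM case was left as an open question in the
proposal), and whether the reduced set should also include ubuntu_ltp_stable
is still up for discussion.

```python
from dataclasses import dataclass

# Suite names as used in this thread.
FULL_LTP = ("ubuntu_ltp", "ubuntu_ltp_stable", "ubuntu_ltp_syscalls")
REDUCED_LTP = ("ubuntu_ltp_syscalls",)


@dataclass(frozen=True)
class Kernel:
    name: str
    kind: str  # hypothetical labels: "main", "hwe", "oem", "derivative"
    is_development: bool = False  # e.g. an Impish kernel at the time


def ltp_suites(kernel, include_oem=False):
    """Return the LTP suites to run for a kernel under the proposed policy."""
    # Development kernels run the full suite everywhere.
    if kernel.is_development:
        return FULL_LTP
    # Main and HWE kernels always run the full suite.
    if kernel.kind in ("main", "hwe"):
        return FULL_LTP
    # OEM kernels were an open question, so this is opt-in here.
    if kernel.kind == "oem" and include_oem:
        return FULL_LTP
    # Everything else runs only the reduced subset.
    return REDUCED_LTP


# Hypothetical kernels, for illustration only.
suites_main = ltp_suites(Kernel("generic", "main"))
suites_cloud = ltp_suites(Kernel("some-cloud", "derivative"))
```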

Cascardo.

> 
> Best regards,
> Krzysztof
> 