Reducing Regression Test suite: LTP

Tue Jul 13 11:37:30 UTC 2021

On Tue, Jul 13, 2021 at 01:03:42PM +0200, Krzysztof Kozlowski wrote:
> On 13/07/2021 12:50, Thadeu Lima de Souza Cascardo wrote:
> > On Tue, Jul 13, 2021 at 12:18:09PM +0200, Krzysztof Kozlowski wrote:
> >> Hi all,
> >>
> >> We talked about the possibility of reducing our regression test suite. I
> >> have a candidate for this - LTP (Linux Test Project).
> >>
> >> Each run of full LTP takes around 4 hours (2 - 2.5h for ubuntu_ltp, 40
> >> minutes for ubuntu_ltp_stable and ~1h for ubuntu_ltp_syscalls). I looked
> >> at cloud instances (4 and 48 cores).
> >>
> >> LTP tests everything: known kernel bugs and CVEs, kernel syscalls and
> >> user-space interfaces, network and probably more. It is a huge test suite.
> >>
> > 
> > Which is why it is so valuable. It not only tests that kernel interfaces are
> > behaving as expected, but also exercises them, preventing us from missing an
> > important regression.
> 
> Thanks for the comments. All of the regression tests are valuable. Not
> only LTP. However with this approach we might never reduce them...
> 

I think that reasoning might work for a test like xfs_tests, for example, where
we should not be carrying filesystem changes on our derivatives, save for
CIFS.

Still, we have observed some cloud tests pointing out odd behavior on btrfs, due
to a different number of CPUs.

But if we keep reducing our tests on the derivatives under that argument, we
might end up not testing much more than boot. And amongst the tests we run, I
find at least ubuntu_ltp_syscalls one that we should be running over all our
kernels.

Perhaps I am too biased towards ubuntu_ltp_syscalls, and have not looked much
at the tests that we run under ubuntu_ltp_stable. And still, I am pretty sure I
would be able to make an excuse for each of the flaky tests there. Even though
some of them would be: the test is broken, should be fixed. ENOTENOUGHTIME.

> > 
> >> I was looking at LTP a lot last month on cloud instances and 95% of
> >> failures were flaky tests or instance-related. The remaining 5% were
> >> indeed missing kernel commits for fixes, not specific to a derivative
> >> but to the generic kernel (e.g. missing backports for Bionic).
> >>
> >> It is rather unlikely that LTP will pass on the main kernel but fail on
> >> a derivative because of a kernel issue. More likely is that the
> >> failure will be seen on the main kernel as well.
> > 
> > Our derivative kernels carry specific patches and have different
> > configurations. Sometimes, a patchset will be submitted and be tested on the
> > generic kernel, but not on all derivatives. I still find it is valuable that we
> > test derivative kernels as much as we can.
> 
> Chances that cloud derivatives will hit an issue related to a separate
> configuration or cloud-specific patch are very, very low. Of course it
> is always possible, but then we are going back to the first paragraph: we
> won't be able to reduce the test suite at all.
> 

Do we agree on keeping ubuntu_ltp_syscalls for the derivatives? Even that test
has its failures once in a while, because of the fast-paced changes in LTP that
I mentioned. But the LTP developers are always keen on fixing those
regressions. And we can carry our local fixes if they are too slow.

> > 
> > I agree that some tests are not as robust, and that means we should be
> > improving the tests as well, so they bring more value to our testing. But I
> > also thought that was why we had split the tests between ubuntu_ltp_stable and
> > ubuntu_ltp. That may have brought a stink to the LTP name, unfortunately. Maybe
> > it should have been ubuntu_ltp and ubuntu_ltp_unstable. But that is nitpicking.
> > 
> > If there are flaky tests on ubuntu_ltp_stable, they should be moved to
> > ubuntu_ltp, and then, we can start improving the tests on ubuntu_ltp so they
> > can be moved to ubuntu_ltp_stable.
> 
> I was not biased by the stable name. My observations came from looking
> at LTP for over a month (https://trello.com/b/f75oUoQt/kernel-test-issues).
> 

We can schedule a session some time in the future to go over some of these, and
categorize them so we can either leave them out forever, leave them out until
fixed, skip only on some configurations, fix the kernel, etc.
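
As a rough illustration of the categories listed above, a triage table could
look something like the sketch below. The test names, the configuration label
"cloud-4cpu" and the should_run() helper are all hypothetical; they are only
meant to show how each category might translate into a run/skip decision.

```python
from enum import Enum


class Disposition(Enum):
    """The four outcomes named in the paragraph above."""
    LEAVE_OUT_FOREVER = "leave out forever"
    LEAVE_OUT_UNTIL_FIXED = "leave out until fixed"
    SKIP_ON_SOME_CONFIGS = "skip only on some configurations"
    FIX_THE_KERNEL = "fix the kernel"


# Hypothetical triage table: test name -> (disposition, configs to skip on).
# None of these names come from the real LTP test list.
TRIAGE = {
    "example_test_a": (Disposition.LEAVE_OUT_FOREVER, frozenset()),
    "example_test_b": (Disposition.LEAVE_OUT_UNTIL_FIXED, frozenset()),
    "example_test_c": (Disposition.SKIP_ON_SOME_CONFIGS, frozenset({"cloud-4cpu"})),
    "example_test_d": (Disposition.FIX_THE_KERNEL, frozenset()),
}


def should_run(test: str, config: str, test_fixed: bool = False) -> bool:
    """Decide whether a test runs on a given configuration."""
    entry = TRIAGE.get(test)
    if entry is None:
        return True  # untriaged tests keep running
    disposition, skip_configs = entry
    if disposition is Disposition.LEAVE_OUT_FOREVER:
        return False
    if disposition is Disposition.LEAVE_OUT_UNTIL_FIXED:
        return test_fixed
    if disposition is Disposition.SKIP_ON_SOME_CONFIGS:
        return config not in skip_configs
    # FIX_THE_KERNEL: the test is right, so keep it running and visible.
    return True
```

Keeping "fix the kernel" entries running is a choice made for this sketch, so
that a real regression stays visible until the fix lands.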

> > 
> > And LTP is rapidly changing, though they care about the tests being applicable
> > to older kernels and older environments. And though you think they are slow on
> > picking up your changes, they are fast compared to other projects.
> 
> Nope, they are slow. I have 14 patches waiting without any conclusion
> for a month. There is no answer like "fix this, I don't like this". They
> just hang there waiting for something. Pinging moves them a little bit...
> 
> That's the reason we forked LTP.
> 

I have a different experience. Maybe we were just working on different areas of
the codebase or I took the low-hanging fruit, where there is less controversy
on there being a bug and the nature of the fix.

> > I think it's really important that we keep testing their latest versions.
> 
> Which we will be doing. We will be testing the latest LTP. Just not
> everywhere :)
> 

I don't think that's what we meant, but we have discussed an approach where we
would be testing a stable version of LTP (as in a given stable commit, not a
set of stable tests) when doing our kernel regression testing, then test LTP
itself so we can move that stable commit to newer versions.

But we should only do that when we have enough green and enough resources to
watch for those tests and fixes. Perhaps, that strategy alone would give us
back the time we need to work on those fixes.

Cascardo.

> > 
> >>
> >> Therefore I propose to run full LTP only in some cases:
> >> 1. On main kernels (so mostly metal),
> >> 2. On HWE kernels (from which we have a derivative edge kernel but HWE
> >> is enough),
> >> 3. Development kernels (Impish) everywhere,
> >> 4. Maybe also OEM kernels?
> >>
> >> In other cases run only a subset of LTP. Maybe only ubuntu_ltp_syscalls?
> > 
> > Definitely ubuntu_ltp_syscalls. If ubuntu_ltp_stable is not stable enough, we
> > should prioritize fixing the tests on ltp_stable instead of the ones in
> > ltp_unstable. And reviewing which tests are there may be a good step too, so we
> > can figure out if this is testing glibc or the kernel or the hw instead.
> 
> 
> Best regards,
> Krzysztof
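
The selection proposed in the quoted message (full LTP on main, HWE and
development kernels, maybe OEM, and only a subset elsewhere) could be sketched
roughly as follows. The kernel kind labels ("main", "hwe", "oem", "cloud") and
both function names are assumptions made for illustration. The runtimes are the
approximate figures quoted in the thread, using the upper end of the 2 - 2.5h
range for ubuntu_ltp.

```python
# Suite names as used in the thread.
FULL_LTP = ("ubuntu_ltp", "ubuntu_ltp_stable", "ubuntu_ltp_syscalls")
REDUCED_LTP = ("ubuntu_ltp_syscalls",)

# Approximate runtimes in minutes, taken from the figures in the thread
# (ubuntu_ltp uses the upper end of "2 - 2.5h").
APPROX_MINUTES = {
    "ubuntu_ltp": 150,
    "ubuntu_ltp_stable": 40,
    "ubuntu_ltp_syscalls": 60,
}


def suites_for(kernel_kind: str, is_development: bool = False,
               include_oem: bool = False) -> tuple:
    """Pick LTP suites for a kernel; kind labels are hypothetical."""
    if is_development:
        return FULL_LTP  # development kernels run full LTP everywhere
    full_kinds = {"main", "hwe"}
    if include_oem:
        full_kinds.add("oem")  # OEM was only a "maybe" in the proposal
    return FULL_LTP if kernel_kind in full_kinds else REDUCED_LTP


def approx_minutes(suites: tuple) -> int:
    """Sum the approximate runtime of the selected suites."""
    return sum(APPROX_MINUTES[s] for s in suites)
```

With these numbers, a derivative that drops from the full set to
ubuntu_ltp_syscalls alone would go from about 250 minutes to about 60 per run.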