Reducing Regression Test suite: LTP

Wed Jul 14 15:26:22 UTC 2021

On 13/07/2021 13:53, Po-Hsu Lin wrote:
> On Tue, Jul 13, 2021 at 7:37 PM Thadeu Lima de Souza Cascardo
> <cascardo at canonical.com> wrote:
>>
>> On Tue, Jul 13, 2021 at 01:03:42PM +0200, Krzysztof Kozlowski wrote:
>>> On 13/07/2021 12:50, Thadeu Lima de Souza Cascardo wrote:
>>>> On Tue, Jul 13, 2021 at 12:18:09PM +0200, Krzysztof Kozlowski wrote:
>>>>> Hi all,
>>>>>
>>>>> We talked about the possibility of reducing our regression test suite. I
>>>>> have a candidate for this: LTP (Linux Test Project).
>>>>>
>>>>> Each full LTP run takes around 4 hours (2-2.5 h for ubuntu_ltp, 40
>>>>> minutes for ubuntu_ltp_stable and ~1h for ubuntu_ltp_syscalls). I looked
>>>>> at cloud instances (4 and 48 cores).
>>>>>
>>>>> LTP tests everything: known kernel bugs and CVEs, kernel syscalls and
>>>>> user-space interfaces, network and probably more. It is a huge test suite.
>>>>>
>>>>
>>>> Which is why it is so valuable. It not only tests that kernel interfaces are
>>>> behaving as expected, but also exercises them, preventing us from missing an
>>>> important regression.
>>>
>>> Thanks for the comments. All of the regression tests are valuable, not
>>> only LTP. However, with this approach we might never reduce them...
>>>
>>
>> I think that reasoning might work for a test suite like xfs_tests, for
>> example, where we should not be carrying filesystem changes on our
>> derivatives, save for CIFS.
>>
>> Still, we have observed some cloud tests pointing out odd behavior on
>> btrfs, due to a different number of CPUs.
>>
>> But if we keep reducing our tests on the derivatives under that argument, we
>> might end up not testing much more than boot. And amongst the tests we run, I
>> find ubuntu_ltp_syscalls, at least, to be one that we should be running on
>> all our kernels.
>>
>> Perhaps I am too biased towards ubuntu_ltp_syscalls, and have not looked much
>> at the tests that we run under ubuntu_ltp_stable. Still, I am pretty sure I
>> would be able to make an excuse for each of the flaky tests there, even if
>> for some of them the excuse would be: the test is broken and should be
>> fixed. ENOTENOUGHTIME.
>>
> If the concern here is that the ubuntu_ltp / ubuntu_ltp_stable tests are
> taking too long to run on one instance, another solution is to break them
> down like we did for syscalls. We can move tests like controllers and
> dio, which take up to 1 hour to run, into a new test suite like
> "ubuntu_ltp_controllers" in ACT.

This could help because it would allow re-running a smaller subset of
tests after failures and seeing the results faster. But test duration is
not the only problem.
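
As a rough illustration of the split suggested above, here is a minimal
Python sketch (not anything that exists in ACT today) that packs LTP
runtest groups into suites that each fit a time budget, using greedy
first-fit-decreasing. The function name, the 60-minute budget and the
per-group durations are hypothetical placeholders, not measurements.

```python
from typing import Dict, List


def split_into_suites(durations: Dict[str, int], budget: int) -> List[List[str]]:
    """Pack runtest groups into suites of at most `budget` minutes.

    Greedy first-fit-decreasing: longest groups are placed first, each
    into the first suite that still has room. A group longer than the
    budget ends up alone in its own suite.
    """
    suites: List[List[str]] = []
    totals: List[int] = []  # minutes already used by each suite
    for name in sorted(durations, key=lambda n: durations[n], reverse=True):
        minutes = durations[name]
        for i, total in enumerate(totals):
            if total + minutes <= budget:
                suites[i].append(name)
                totals[i] += minutes
                break
        else:
            # No existing suite has room: start a new one.
            suites.append([name])
            totals.append(minutes)
    return suites


if __name__ == "__main__":
    # Hypothetical minutes per LTP runtest group, NOT measured values.
    example = {"controllers": 60, "dio": 45, "fs": 30, "mm": 20, "ipc": 10}
    for suite in split_into_suites(example, budget=60):
        print(suite)
```

With those made-up numbers it yields three suites: controllers alone,
dio with ipc, and fs with mm. Real durations would have to come from
previous test runs.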

For example, several LTP controller tests are:
1. Outdated, because they were tuned for older kernels, where cgroups
were different. Only minor updates (fixups) have been made to these
tests recently, to make them work on newer kernels, but no one, I think,
has reworked them for new kernels. And they should be reworked, because
kernel internals have changed a lot since then.
For example, in several places the memcg tests assume that hierarchical
groups can be turned on/off. Since kernel v5.11 you cannot disable
hierarchical mode; it is always enabled. The tests were kind of tweaked
to handle this, but half of them are now confusing or bail out early.

2. So specific or tight that they fail under any conditions other than
the ones the author intended. I spent a few days here fixing up the
memcg tests, because they assumed that kernel memory is not accounted
per group (not true since v5.9) and that process memory allocation or
subgroup management does not create side effects (e.g.
memcg_use_hierarchy_test.sh sets a limit of 1 page, but on a two-node
machine "mkdir subgroup" causes an allocation of 100 pages of kmem!).
See the bottom of:
https://lists.linux.it/pipermail/ltp/2021-July/023803.html
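
Both problems come from hard-coded assumptions. A minimal sketch of how
a test could avoid them, written in Python only for illustration (LTP's
real memcg tests are shell scripts, and these function names are
hypothetical): skip the on/off toggling on v5.11+ kernels, and derive
the limit from what the fresh group already uses instead of a fixed
1 page. A version comparison is only a crude proxy (distro backports can
change behavior), so probing memory.use_hierarchy directly may be more
robust.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages; a real test should query the system


def hierarchy_toggle_supported(release: str) -> bool:
    """Return False on v5.11+ kernels, where memcg hierarchical mode is
    always enabled and cannot be switched off.

    `release` is a `uname -r` style string such as "5.4.0-77-generic".
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    version = (int(match.group(1)), int(match.group(2)))
    return version < (5, 11)


def limit_with_headroom(baseline_bytes: int, test_pages: int,
                        page_size: int = PAGE_SIZE) -> int:
    """Return a limit leaving `test_pages` pages of room above what the
    group already uses (e.g. kmem charged by "mkdir subgroup")."""
    baseline_pages = -(-baseline_bytes // page_size)  # round up to whole pages
    return (baseline_pages + test_pages) * page_size


if __name__ == "__main__":
    print(hierarchy_toggle_supported("5.4.0-77-generic"))   # True
    print(hierarchy_toggle_supported("5.11.0-22-generic"))  # False
    # Hypothetical baseline: 100 pages already charged right after mkdir.
    print(limit_with_headroom(100 * PAGE_SIZE, test_pages=1))  # 413696
```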

Cgroups have been a terribly unstable interface, so maybe that is one
of the issues. But anyway, LTP expects kernel memory accounting/charging
to follow some imaginary rules, and this is simply wrong. How the kernel
accounts memory per group is not part of the API or ABI. These are
internals which can change from release to release. I have fixed up the
memcg tests now, but they will keep failing every few kernel releases.
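
Since the exact accounting is internal, a test could check a charge
against a range rather than an exact number. A minimal Python sketch,
assuming 4 KiB pages, that the test touched every page it allocated and
that nothing was reclaimed in between; the function name and the
128-page slack are arbitrary, hypothetical choices, not kernel values.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def charge_looks_sane(usage_before: int, usage_after: int, allocated: int,
                      slack_pages: int = 128, page_size: int = PAGE_SIZE) -> bool:
    """Accept a charge of at least `allocated` bytes, plus up to
    `slack_pages` pages of extra (e.g. kernel memory) overhead.

    The exact overhead is a kernel internal, so only a range is checked.
    slack_pages is an arbitrary illustrative default, not a kernel value.
    """
    delta = usage_after - usage_before
    return allocated <= delta <= allocated + slack_pages * page_size


if __name__ == "__main__":
    one_mib = 1 << 20
    print(charge_looks_sane(0, one_mib + 100 * PAGE_SIZE, one_mib))  # True
    print(charge_looks_sane(0, 4 * PAGE_SIZE, one_mib))  # False: under-charged
```

A range check like this should only need updating when the overhead
grows past the slack, not on every change to how the kernel charges
memory.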

At the same time, most of the controllers' interfaces and behavior are,
I think, tested by kernel selftests, so duplicating them with poorly
designed LTP controller tests is okay if we have spare time. But we are
closer to ENOTENOUGHTIME.

Best regards,
Krzysztof