Reducing Regression Test suite: LTP

Wed Jul 14 15:09:27 UTC 2021

On 13/07/2021 13:37, Thadeu Lima de Souza Cascardo wrote:
> On Tue, Jul 13, 2021 at 01:03:42PM +0200, Krzysztof Kozlowski wrote:
>> On 13/07/2021 12:50, Thadeu Lima de Souza Cascardo wrote:
>>> On Tue, Jul 13, 2021 at 12:18:09PM +0200, Krzysztof Kozlowski wrote:
>>>> Hi all,
>>>>
>>>> We talked about the possibility of reducing our regression test suite. I
>>>> have a candidate for this - LTP (Linux Test Project).
>>>>
>>>> Each run of full LTP takes around 4 hours (2 - 2.5h for ubuntu_ltp, 40
>>>> minutes for ubuntu_ltp_stable and ~1h for ubuntu_ltp_syscalls). I looked
>>>> at cloud instances (4 and 48 cores).
>>>>
>>>> LTP tests everything: known kernel bugs and CVEs, kernel syscalls and
>>>> user-space interfaces, network and probably more. It is a huge test suite.
>>>>
>>>
>>> Which is why it is so valuable. It not only tests that kernel interfaces are
>>> behaving as expected, but also exercises them, preventing us from missing an
>>> important regression.
>>
>> Thanks for the comments. All of the regression tests are valuable, not
>> only LTP. However, with this approach we might never reduce them...
>>
> 
> I think that reasoning might work for a test such as xfs_tests, for example, where
> we should not be carrying filesystems changes on our derivatives, save for
> CIFS.
> 
> Still, we have observed that some cloud tests pointed out odd behavior on btrfs,
> due to a different number of CPUs.
> 
> But if we keep reducing our tests on the derivatives under that argument, we
> might end up not testing much more than boot. And amongst the tests we run, I
> find at least ubuntu_ltp_syscalls one that we should be running over all our
> kernels.

Sure, I also proposed that one for all cases.

> 
> Perhaps I am too biased towards ubuntu_ltp_syscalls, and have not looked much
> at the tests that we run under ubuntu_ltp_stable. And still, I am pretty sure I
> would be able to make an excuse for each of the flaky tests there. Even though
> some of them would be: the test is broken, should be fixed. ENOTENOUGHTIME.
> 
>>>
>>>> I was looking at LTP a lot last month on cloud instances and 95% of
>>>> failures were flaky tests or instance-related. The remaining 5% were
>>>> indeed missing kernel commits for fixes, but not specific to a derivative;
>>>> they applied to the generic kernel (e.g. missing backports for Bionic).
>>>>
>>>> It is rather unlikely that LTP will pass on the main kernel but fail on
>>>> a derivative because of a kernel issue. It is more likely that the
>>>> failure will be seen on the main kernel as well.
>>>
>>> Our derivative kernels carry specific patches and have different
>>> configurations. Sometimes, a patchset will be submitted and be tested on the
>>> generic kernel, but not on all derivatives. I still find it is valuable that we
>>> test derivative kernels as much as we can.
>>
>> The chances that cloud derivatives will hit an issue related to a separate
>> configuration or cloud-specific patch are very, very low. Of course it
>> is always possible, but then we are going back to the first paragraph - we
>> won't be able to reduce the test suite at all.
>>
> 
> Do we agree on keeping ubuntu_ltp_syscalls for the derivatives? Even that test
> has its failures once in a while, because of the fast-paced changes as I
> mentioned. But they are always keen on fixing those regressions. And we can
> carry our local fixes if they are too slow.

I agree.

Best regards,
Krzysztof