Cmnts: [ACT][PATCH] UBUNTU: SAUCE: ubuntu_boot: use dmesg collected by autotest

Po-Hsu Lin po-hsu.lin at canonical.com
Fri Aug 6 15:42:32 UTC 2021


On Thu, Aug 5, 2021 at 12:38 AM Sean Feole <sean.feole at canonical.com> wrote:
>
> Hey Sam,
>
> I have some thoughts on this patch, and figured I would outline them
> here.  On some points you may agree with me, on some you may not; regardless, I
> think it's best to bring this up.
>
> I usually am a pushover with these sort of changes, however, I don't
> like complex rules in tests that now gate the progression of a thing.
> In the case of changes to the ubuntu_boot test.
>
> Any patch/update to this test is now affecting the progression of ALL
> kernels on 5 arch types. It may not seem like a big deal, but it is.
>
> 1.)  This patch proposes to change the log scanning of the ubuntu_boot
> test from /var/log/syslog to dmesg.  I don't see a problem with that,
> unless others with more experience in the kernel can comment here.
>
Hi Sean,
thanks for the thorough write-up. This test is indeed very important
to us as it's our baseline test.

I am not 100% confident about the points below, so please feel free to
correct me if I got them wrong.

>   1a. )Will the BUG:/Ooops:/kernel:/WARNING: messages all appear in
> dmesg in addition to syslog?
>
To my understanding these log messages are stored in dmesg (the kernel
ring buffer) first, and are then written to syslog and kern.log in
/var/log/ via syslogd and klogd.
Anything that calls printk() gets logged. So yes, if we can see
an error or warning message in syslog, it went through dmesg first.

>   1b.) Will this work on all archs?
>
> I myself need to know that will still occur and on all the arch types,
> because we do not want those messages to slip through testing.
And according to include/asm-generic/bug.h, BUG_ON and WARN_ON
both trigger printk(), so I think it works for all arches (we do
have arch-specific bug handling code like arch/arm/include/asm/bug.h,
but it includes this bug.h from asm-generic).

If in doubt, I think we can give it a try on the nodes that failed
this log_check test, listed here:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bugs?field.tag=ubuntu-boot
to see if we get the same result (except for the s390x one, as it was
picking up error messages from older sessions).

>
>   1b.) My concern is that most of CKCT uses syslog, so should we not
> mirror the change there?
I tried git grep syslog in CKCT, but the only hit I can see is:
health-report/obruchev-check.yaml

So I am not sure which usage you are referring to here.

>
>   1c.) Should we be scanning both? syslog & dmesg, in the event messages
> appear in one log versus another?
It just occurred to me that there is a size limit for the dmesg ring
buffer, which is defined by CONFIG_LOG_BUF_SHIFT. On my laptop, for
example, this value is 18, which means 2^18 bytes = 256 KiB.
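For reference, the arithmetic behind that number can be sketched as
below. This is only an illustration: the buffer size is 2 to the power
of CONFIG_LOG_BUF_SHIFT bytes, and log_buf_bytes is a hypothetical
helper name, not something from the kernel or autotest.

```python
def log_buf_bytes(shift):
    """Ring buffer size in bytes for a given CONFIG_LOG_BUF_SHIFT value.

    Hypothetical helper for illustration only.
    """
    return 1 << shift


if __name__ == '__main__':
    size = log_buf_bytes(18)  # 18 is the value quoted above
    print('{} bytes = {} KiB'.format(size, size // 1024))
```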

I think this is probably enough to capture the boot dmesg on a freshly
rebooted system. A possible exception is when dmesg gets flooded by
some buggy component in the short period between reboot and test start,
so that the message we want has already been pushed out of the ring
buffer and only survives in syslog.

To solve this we can check for timestamp 0.000, something like:
[    0.000000] Linux version
to make sure this dmesg output is complete. If not, fall back to using syslog.
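A rough sketch of that check, assuming the decompressed dmesg text uses
the usual "[    0.000000] " timestamp prefix. dmesg_is_complete and
pick_logfile are hypothetical names and are not part of the patch; the
two paths are the ones used in the patch.

```python
import re

# Assumption: a complete boot log contains the kernel banner at timestamp 0.
BOOT_BANNER = re.compile(r'^\[\s*0\.000000\] Linux version ', re.MULTILINE)


def dmesg_is_complete(dmesg_text):
    """Return True if the banner line at timestamp 0 is still present."""
    return BOOT_BANNER.search(dmesg_text) is not None


def pick_logfile(dmesg_text,
                 dmesg_path='/tmp/dmesg-ubuntu-boot',
                 syslog_path='/var/log/syslog'):
    """Use the dmesg copy when it is complete, otherwise fall back to syslog."""
    if dmesg_is_complete(dmesg_text):
        return dmesg_path
    return syslog_path
```

The search looks at every line rather than only the first one, since
some arches may print other lines before the banner.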

>
>
>
> 2.) I personally don't want to add customized rules in ubuntu_boot, for
> the sake of handling a corner case.  What is the corner case?   A
> manually provisioned system.
>
> Any special handling should be done within CKCT, the ubuntu_test test
> should be as simple as possible, clear and concise. It should not need
> to know where it's being installed, if the installed system has been
> running for 50 days or 5 hours.  Its purpose is to scan the logs and
> catch any warnings mentioned in point #1a.
>
> Everyone should keep the mindset that all of the tests are intended to
> run on a fresh clean system, In my opinion, the tests should be
> completely agnostic to that fact.
>
> The rest is pretty much a rant and not pertinent to the patch itself,
> but i thought it best since on this topic to list them here.
>
There are no customized rules here for manually provisioned systems.
The fact is that the original design of this test (reading from
syslog) is not general enough to cover this case. And this change can
co-exist with your s390x_cleanup patch for CKCT without any problem.

> 3.) I personally don't think we should expand (customized scripts) such
> as the kernel_taint script from checkbox, to the ubuntu_boot test unless
> the test has been thoroughly tested on all -main kernels of all
> archs/series. That is not too much to ask, the assumption is the
> derivatives will also work properly if in fact the -main kernels pass.
> Not to mention it's just another thing to add onto the endless pile of
> work to monitor if/when the kernel_taint script gets updated.  We have
> already seen cases where the kernel_taint script fails on s390x, which
> means, it should be fixed, NOT HINTED. (That is my opinion only)  =)

Speaking of this, there are two fixes that we might want to import later:
1. bin/kernel_taint_test.py will not fail on ZFS related modules,
which we expect to be there lp: #1908129
https://git.launchpad.net/plainbox-provider-checkbox/commit/bin/kernel_taint_test.py?id=918b6050dbe0e0a148713a08919a26fc4e7f3b05
2. bin/kernel_taint_test.py: return 0 for passed cases, 1 for others
https://git.launchpad.net/plainbox-provider-checkbox/commit/bin/kernel_taint_test.py?id=11f1532e586f3014c6476c4b4f188bda61cf9a95

The first one is open to discussion, as I have found that the cause is
the leftover zfsutils-linux package installed by zfs-related tests:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1892124

We can either add cleanup() to those tests, or remove that package in
manual_cleanup() in CKCT. The latter is easier.

>
> 4.) ubuntu_boot should not be hinted at all. You're gating the kernels
> on this. If there is a problem with ubuntu_boot, it *should* be fixed.
I have no preference here. But if you want me to nitpick on hinting,
I would say no bug should be hinted unless we can confirm the
error message is the same; otherwise just checking the test name can
be deceiving, as a test can fail in so many ways.

>
> 5.) We can probably get rid of anything related to CENTOS, The kernel
> teams branch of Autotest and The Kernel teams branch of
> Autotest-Client-Test is all customized for Debian. Anything related to
> centos does not work. This is the reality of it. And that's really bad.
Just curious, why is this really bad? Because we can't test CentOS?
Do we need to support testing on CentOS?

Thanks
Sam

>
> -Sean
>
> On 8/3/21 7:30 AM, Po-Hsu Lin wrote:
> > BugLink: https://bugs.launchpad.net/bugs/1937276
> >
> > Checking error from syslog works for freshly provisioned systems, but
> > with the manually provisioned systems since the log is not guaranteed
> > to be the boot log for the current session, it can be contaminated by
> > other tests and trigger false-positives.
> >
> > Use dmesg collected by the autotest framework for this instead.
> >
> > Signed-off-by: Po-Hsu Lin <po-hsu.lin at canonical.com>
> > ---
> >   ubuntu_boot/ubuntu_boot.py | 21 +++++++++++++++------
> >   1 file changed, 15 insertions(+), 6 deletions(-)
> >
> > diff --git a/ubuntu_boot/ubuntu_boot.py b/ubuntu_boot/ubuntu_boot.py
> > index 8782818f..7d7799b2 100644
> > --- a/ubuntu_boot/ubuntu_boot.py
> > +++ b/ubuntu_boot/ubuntu_boot.py
> > @@ -14,15 +14,24 @@ class ubuntu_boot(test.test):
> >
> >       def log_check(self):
> >           '''Test for checking error patterns in log files'''
> > -        '''Centos Specific Boot Test Checks'''
> > -        os_dist = platform.linux_distribution()[0].split(' ')[0]
> > +        '''Please run this on a freshly rebooted / provisioned system'''
> >
> > -        # dmesg will be cleared out in autotest with dmesg -c before the test starts
> > -        # Let's check for /var/log/syslog instead
> > -        if os_dist == 'CentOS':
> > -            logfile = '/var/log/messages'
> > +        # dmesg before the test will be compressed and cleared with dmesg -c
> > +        # the log will be stored in autotest/client/results/default/sysinfo/dmesg.gz
> > +        dmesg_gz = os.path.join(os.environ['AUTODIR'], 'results/default/sysinfo/dmesg.gz')
> > +        if os.path.exists(dmesg_gz):
> > +            logfile = '/tmp/dmesg-ubuntu-boot'
> > +            cmd = 'gzip -dk {} -c > {}'.format(dmesg_gz, logfile)
> > +            utils.system(cmd, ignore_status=True)
> >           else:
> > +            # Fallback to syslog, which works for newly deployed node but not ideal for
> > +            # manually provisioned SUTs as the content is not just for the current session
> >               logfile = '/var/log/syslog'
> > +            # Centos Specific Boot Test Checks
> > +            os_dist = platform.linux_distribution()[0].split(' ')[0]
> > +            if os_dist == 'CentOS':
> > +                logfile = '/var/log/messages'
> > +
> >           patterns = [
> >               'kernel: \[ *\d+\.\d+\] BUG:.*',
> >               'kernel: \[ *\d+\.\d+\] Oops:.*',
>


