[PATCH 1/1][autotest-client-tests] UBUNTU: SAUCE: ubuntu_dgx_mofed_build: create build sanity test for MOFED DKMS on DGX
Taihsiang Ho (tai271828)
taihsiang.ho at canonical.com
Fri Oct 22 14:00:00 UTC 2021
Hi Sam,
Thanks for the verification and comments. Regarding your questions:
1. Any specific reason for using different shebang /bin/sh and /usr/bin/env?
The reason is to keep check-mofed-modules.sh "as-is" as much as possible.
The script was developed earlier and we have run it manually for several
SRU cycles, so it has proven stable and reliable. When I integrated it
into autotest-client-tests, I preferred to leave it untouched so that the
automation behaves the same as our manual runs. For the new test job
wrapper "ubuntu_dgx_mofed_build.sh" I chose "#!/usr/bin/env bash", which
locates bash through PATH and is therefore more portable across
distributions.
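As a side note, the wrapper does depend on bash-only features (for example
"set -o pipefail", BASH_SOURCE and pushd/popd), while
check-mofed-modules.sh appears to stick to POSIX sh syntax. The sketch
below is an illustration only and not part of the patch; the two function
names are hypothetical. It shows what pipefail changes about a pipeline's
exit status.

```shell
#!/usr/bin/env bash
# Illustration only: how `pipefail` changes a pipeline's exit status.

# Without pipefail, the pipeline status is that of the last command (`true`).
status_without_pipefail() {
    bash -c 'false | true; echo $?'
}

# With pipefail, a failure in any stage of the pipeline is reported.
status_with_pipefail() {
    bash -c 'set -o pipefail; false | true; echo $?'
}

status_without_pipefail   # prints 0
status_with_pipefail      # prints 1
```

This is why the wrapper needs bash rather than a plain /bin/sh, which is
dash on Ubuntu.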
2. Is the missing knem.ko expected?
Yes, it is expected; it is a known issue. knem.ko is expected to be
present when you deploy "exotic-skunk", but it is missing because the knem
DKMS module failed to build in the ephemeral MAAS environment during
deployment. The most likely cause is a bug in the knem package's own build
script, which does not use the right headers to build the module. We have
escalated the issue upstream and are still working on a fix for the build
script. You can confirm the issue by searching the MAAS deployment log for
"knem", and then re-install/re-build knem manually (via apt install) once
the system is deployed and ready to use.
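For reference, a minimal sketch of the manual check described above. The
helper names (has_dkms_module, count_log_mentions) are hypothetical and not
part of the patch. The log file is passed as a parameter because its
location depends on how the MAAS deployment log was collected; the default
module directory is the one check-mofed-modules.sh scans.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a DKMS-built module file exists.
# The second argument is optional and defaults to the directory that
# check-mofed-modules.sh lists.
has_dkms_module() {
    module="$1"
    dkms_dir="${2:-/lib/modules/$(uname -r)/updates/dkms}"
    [ -f "${dkms_dir}/${module}" ]
}

# Hypothetical helper: print how many lines of a log file mention a
# pattern (case-insensitive). grep exits non-zero when the count is 0,
# so `|| true` keeps this usable under `set -e`.
count_log_mentions() {
    pattern="$1"
    logfile="$2"
    grep -ci -- "$pattern" "$logfile" || true
}
```

For example, "has_dkms_module knem.ko || echo 'knem.ko is missing'" reports
the current state, and running "dkms status" after re-installing knem
should show whether the module was built for the running kernel.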
Kind regards,
Tai
On Fri, Oct 22, 2021 at 12:41 PM Po-Hsu Lin <po-hsu.lin at canonical.com>
wrote:
> Hi Tai,
> please find inline comments.
>
> On Tue, Oct 19, 2021 at 8:02 PM Taihsiang Ho (tai271828)
> <taihsiang.ho at canonical.com> wrote:
> >
> > The goal of the test is to detect when a kernel SRU contains changes
> > that would cause the MOFED DKMS packages to fail to build. The test will
> > check if MOFED DKMS are built and loaded correctly on target platforms
> > currently. This test could help us to catch silent failure of DKMS build
> > which we have seen at least once.
> >
> > The MOFED versions selected are limited to the versions recommended for
> > use with DGX systems.
> >
> > The test environment is mostly prepared by MAAS via a customized curtin
> > preseed. For tasks like driver installation, software installation
> > highly associated with driver or required reboot, are setup by the
> > preseed. The rest of tasks are completed by the autotest framework, and
> > defined in the corresponding testing job.
> >
> > Signed-off-by: Taihsiang Ho (tai271828) <taihsiang.ho at canonical.com>
> > ---
> > ubuntu_dgx_mofed_build/check-mofed-modules.sh | 35 ++++++++++++++++
> > ubuntu_dgx_mofed_build/control | 12 ++++++
> > .../4.9-2.2.4.0-bionic.lst | 34 ++++++++++++++++
> > .../5.4-1.0.3.0-focal.lst | 21 ++++++++++
> > .../ubuntu_dgx_mofed_build.py | 32 +++++++++++++++
> > .../ubuntu_dgx_mofed_build.sh | 40 +++++++++++++++++++
> > 6 files changed, 174 insertions(+)
> > create mode 100755 ubuntu_dgx_mofed_build/check-mofed-modules.sh
> > create mode 100644 ubuntu_dgx_mofed_build/control
> > create mode 100644
> ubuntu_dgx_mofed_build/expected-mofed-modules/4.9-2.2.4.0-bionic.lst
> > create mode 100644
> ubuntu_dgx_mofed_build/expected-mofed-modules/5.4-1.0.3.0-focal.lst
> > create mode 100644 ubuntu_dgx_mofed_build/ubuntu_dgx_mofed_build.py
> > create mode 100755 ubuntu_dgx_mofed_build/ubuntu_dgx_mofed_build.sh
> >
> > diff --git a/ubuntu_dgx_mofed_build/check-mofed-modules.sh
> b/ubuntu_dgx_mofed_build/check-mofed-modules.sh
> > new file mode 100755
> > index 00000000..80a29769
> > --- /dev/null
> > +++ b/ubuntu_dgx_mofed_build/check-mofed-modules.sh
> > @@ -0,0 +1,35 @@
> > +#!/bin/sh
>
> Just curious, any specific reason for using /bin/sh here and
> /usr/bin/env bash in another script?
>
> > +
> > +set -e
> > +
> > +# We store a list of expected modules for each Ubuntu release/MOFED
> version
> > +# pair. This scheme currently does not expect modules lists to differ
> between
> > +# GA and HWE kernels - which is fine for now because we are only
> testing GA
> > +# kernels.
> > +printf "INFO: Detecting Ubuntu release version..."
> > +release="$(lsb_release -cs)"
> > +printf " %s\n" $release
> > +printf "INFO: Detecting MOFED driver version..."
> > +mofedver="$(dpkg-query --showformat='${Version}' --show
> mlnx-ofed-kernel-only)"
> > +printf " %s\n" ${mofedver}
> > +printf "INFO: Detecting Kernel version... %s\n" "$(uname -r)"
> > +actual="$(mktemp)"
> > +expected="$(pwd)/expected-mofed-modules/${mofedver}-${release}.lst"
> > +
> > +# This test assumes that the only modules installed here are the MOFED
> ones.
> > +# If other DKMS packages are installed that will throw it off.
> > +echo "INFO: Scanning for available MOFED kernel modules..."
> > +ls /lib/modules/$(uname -r)/updates/dkms | sort > ${actual}
> > +
> > +if [ ! -f ${expected} ]; then
> > + echo "ERROR: No expected modules list available for MOFED $mofedver
> on $release" 1>&2
> > + exit 1
> > +fi
> > +
> > +if diff -u ${expected} ${actual}; then
> > + echo "INFO: Success: Actual module list matches expected module
> list."
> > + exit 0
> > +fi
> > +
> > +echo "ERROR: Actual modules list does not match expected modules list"
> 1>&2
> > +exit 1
> > diff --git a/ubuntu_dgx_mofed_build/control
> b/ubuntu_dgx_mofed_build/control
> > new file mode 100644
> > index 00000000..6291775b
> > --- /dev/null
> > +++ b/ubuntu_dgx_mofed_build/control
> > @@ -0,0 +1,12 @@
> > +AUTHOR = 'Taihsiang Ho <taihsiang.ho at canonical.com>'
> > +TIME = 'SHORT'
> > +NAME = 'DGX MOFED build verification test'
> > +TEST_TYPE = 'client'
> > +TEST_CLASS = 'General'
> > +TEST_CATEGORY = 'Smoke'
> > +
> > +DOC = """
> > +Perform testing of Mellanox device
> > +"""
> > +
> > +job.run_test_detail('ubuntu_dgx_mofed_build', test_name='dkms',
> tag='dkms', timeout=60)
> > diff --git
> a/ubuntu_dgx_mofed_build/expected-mofed-modules/4.9-2.2.4.0-bionic.lst
> b/ubuntu_dgx_mofed_build/expected-mofed-modules/4.9-2.2.4.0-bionic.lst
> > new file mode 100644
> > index 00000000..6b53ab1c
> > --- /dev/null
> > +++
> b/ubuntu_dgx_mofed_build/expected-mofed-modules/4.9-2.2.4.0-bionic.lst
> > @@ -0,0 +1,34 @@
> > +ib_cm.ko
> > +ib_core.ko
> > +ib_ipoib.ko
> > +ib_iser.ko
> > +ib_isert.ko
> > +ib_srp.ko
> > +ib_ucm.ko
> > +ib_umad.ko
> > +ib_uverbs.ko
> > +iw_cm.ko
> > +knem.ko
>
> I just gave this a try on a DGX-1 server "exotic-skunk"; this test
> fails on a freshly deployed Focal 5.4.0-89-generic:
> ---
> /home/ubuntu/autotest/client/tests/ubuntu_dgx_mofed_build/expected-mofed-modules/5.4-1.0.3.0-focal.lst
> 2021-10-22 10:16:47.367588316 +0000
> +++ /tmp/tmp.gbcEkItzqV 2021-10-22 10:17:12.338232968 +0000
> @@ -8,7 +8,6 @@
> ib_umad.ko
> ib_uverbs.ko
> iw_cm.ko
> -knem.ko
> mlx5_core.ko
> mlx5_ib.ko
> mlx_compat.ko
>
> Is this expected?
> Thanks
> Sam
>
> > +mdev.ko
> > +mlx4_core.ko
> > +mlx4_en.ko
> > +mlx4_ib.ko
> > +mlx5_core.ko
> > +mlx5_fpga_tools.ko
> > +mlx5_ib.ko
> > +mlx_compat.ko
> > +mlxfw.ko
> > +mst_pci.ko
> > +mst_pciconf.ko
> > +rdma_cm.ko
> > +rdma_rxe.ko
> > +rdma_ucm.ko
> > +rpcrdma.ko
> > +rshim.ko
> > +rshim_net.ko
> > +rshim_pcie.ko
> > +rshim_pcie_lf.ko
> > +rshim_usb.ko
> > +scsi_transport_srp.ko
> > +svcrdma.ko
> > +xprtrdma.ko
> > diff --git
> a/ubuntu_dgx_mofed_build/expected-mofed-modules/5.4-1.0.3.0-focal.lst
> b/ubuntu_dgx_mofed_build/expected-mofed-modules/5.4-1.0.3.0-focal.lst
> > new file mode 100644
> > index 00000000..d56fc65e
> > --- /dev/null
> > +++ b/ubuntu_dgx_mofed_build/expected-mofed-modules/5.4-1.0.3.0-focal.lst
> > @@ -0,0 +1,21 @@
> > +auxiliary.ko
> > +ib_cm.ko
> > +ib_core.ko
> > +ib_ipoib.ko
> > +ib_iser.ko
> > +ib_isert.ko
> > +ib_srp.ko
> > +ib_umad.ko
> > +ib_uverbs.ko
> > +iw_cm.ko
> > +knem.ko
> > +mlx5_core.ko
> > +mlx5_ib.ko
> > +mlx_compat.ko
> > +mlxdevm.ko
> > +mlxfw.ko
> > +mst_pci.ko
> > +mst_pciconf.ko
> > +rdma_cm.ko
> > +rdma_ucm.ko
> > +scsi_transport_srp.ko
> > diff --git a/ubuntu_dgx_mofed_build/ubuntu_dgx_mofed_build.py
> b/ubuntu_dgx_mofed_build/ubuntu_dgx_mofed_build.py
> > new file mode 100644
> > index 00000000..76c0effa
> > --- /dev/null
> > +++ b/ubuntu_dgx_mofed_build/ubuntu_dgx_mofed_build.py
> > @@ -0,0 +1,32 @@
> > +import os
> > +from autotest.client import test, utils
> > +
> > +p_dir = os.path.dirname(os.path.abspath(__file__))
> > +sh_executable = os.path.join(p_dir, "ubuntu_dgx_mofed_build.sh")
> > +
> > +
> > +class ubuntu_dgx_mofed_build(test.test):
> > + version = 1
> > +
> > + def initialize(self):
> > + pass
> > +
> > + def setup(self):
> > + cmd = "{} setup".format(sh_executable)
> > + utils.system(cmd)
> > +
> > + def compare_kernel_modules(self):
> > + cmd = "{} test".format(sh_executable)
> > + utils.system(cmd)
> > +
> > + def run_once(self, test_name):
> > + if test_name == "dkms":
> > + self.compare_kernel_modules()
> > +
> > + print("")
> > + print("{} has run.".format(test_name))
> > +
> > + print("")
> > +
> > + def postprocess_iteration(self):
> > + pass
> > diff --git a/ubuntu_dgx_mofed_build/ubuntu_dgx_mofed_build.sh
> b/ubuntu_dgx_mofed_build/ubuntu_dgx_mofed_build.sh
> > new file mode 100755
> > index 00000000..adf9db9c
> > --- /dev/null
> > +++ b/ubuntu_dgx_mofed_build/ubuntu_dgx_mofed_build.sh
> > @@ -0,0 +1,40 @@
> > +#!/usr/bin/env bash
> > +#
> > +# perform mlnx testing and corresponding pre-setup.
> > +#
> > +
> > +set -eo pipefail
> > +
> > +setup() {
> > + # pre-setup testing environment and necessary tools
> > + # currently there is nothing practically but will be used possibly
> in the future.
> > + echo "begin to pre-setup mlnx testing"
> > +}
> > +
> > +run_test() {
> > + exe_dir=$(dirname "${BASH_SOURCE[0]}")
> > + pushd ${exe_dir}
> > + ./check-mofed-modules.sh
> > + popd
> > +}
> > +
> > +case $1 in
> > + setup)
> > + echo ""
> > + echo "On setting up necessary test environment..."
> > + echo ""
> > + setup
> > + echo ""
> > + echo "Setting up necessary test environment..."
> > + echo ""
> > + ;;
> > + test)
> > + echo ""
> > + echo "On running test..."
> > + echo ""
> > + run_test
> > + echo ""
> > + echo "Running test..."
> > + echo ""
> > + ;;
> > +esac
> > --
> > 2.33.0
> >
> >
> > --
> > kernel-team mailing list
> > kernel-team at lists.ubuntu.com
> > https://lists.ubuntu.com/mailman/listinfo/kernel-team
>