[PATCH 1/1][autotest-client-tests] UBUNTU: SAUCE: ubuntu_dgx_mofed_build: create build sanity test for MOFED DKMS on DGX

Po-Hsu Lin po-hsu.lin at canonical.com
Fri Oct 22 10:42:05 UTC 2021


Hi Tai,
please find inline comments.

On Tue, Oct 19, 2021 at 8:02 PM Taihsiang Ho (tai271828)
<taihsiang.ho at canonical.com> wrote:
>
> The goal of the test is to detect when a kernel SRU contains changes
> that would cause the MOFED DKMS packages to fail to build. The test
> checks whether the MOFED DKMS modules are built and installed correctly
> on the target platforms. This helps us catch silent DKMS build
> failures, which we have seen at least once.
>
> The MOFED versions selected are limited to the versions recommended for
> use with DGX systems.
>
> The test environment is mostly prepared by MAAS via a customized curtin
> preseed. Tasks such as driver installation, installation of software
> closely tied to the driver, and anything that requires a reboot are
> handled by the preseed. The remaining tasks are completed by the
> autotest framework and defined in the corresponding test job.
>
> Signed-off-by: Taihsiang Ho (tai271828) <taihsiang.ho at canonical.com>
> ---
>  ubuntu_dgx_mofed_build/check-mofed-modules.sh | 35 ++++++++++++++++
>  ubuntu_dgx_mofed_build/control                | 12 ++++++
>  .../4.9-2.2.4.0-bionic.lst                    | 34 ++++++++++++++++
>  .../5.4-1.0.3.0-focal.lst                     | 21 ++++++++++
>  .../ubuntu_dgx_mofed_build.py                 | 32 +++++++++++++++
>  .../ubuntu_dgx_mofed_build.sh                 | 40 +++++++++++++++++++
>  6 files changed, 174 insertions(+)
>  create mode 100755 ubuntu_dgx_mofed_build/check-mofed-modules.sh
>  create mode 100644 ubuntu_dgx_mofed_build/control
>  create mode 100644 ubuntu_dgx_mofed_build/expected-mofed-modules/4.9-2.2.4.0-bionic.lst
>  create mode 100644 ubuntu_dgx_mofed_build/expected-mofed-modules/5.4-1.0.3.0-focal.lst
>  create mode 100644 ubuntu_dgx_mofed_build/ubuntu_dgx_mofed_build.py
>  create mode 100755 ubuntu_dgx_mofed_build/ubuntu_dgx_mofed_build.sh
>
> diff --git a/ubuntu_dgx_mofed_build/check-mofed-modules.sh b/ubuntu_dgx_mofed_build/check-mofed-modules.sh
> new file mode 100755
> index 00000000..80a29769
> --- /dev/null
> +++ b/ubuntu_dgx_mofed_build/check-mofed-modules.sh
> @@ -0,0 +1,35 @@
> +#!/bin/sh

Just curious, any specific reason for using /bin/sh here and
/usr/bin/env bash in another script?

> +
> +set -e
> +
> +# We store a list of expected modules for each Ubuntu release/MOFED version
> +# pair. This scheme currently does not expect module lists to differ between
> +# GA and HWE kernels - which is fine for now because we are only testing GA
> +# kernels.
> +printf "INFO: Detecting Ubuntu release version..."
> +release="$(lsb_release -cs)"
> +printf " %s\n" "$release"
> +printf "INFO: Detecting MOFED driver version..."
> +mofedver="$(dpkg-query --showformat='${Version}' --show mlnx-ofed-kernel-only)"
> +printf " %s\n" "${mofedver}"
> +printf "INFO: Detecting Kernel version... %s\n" "$(uname -r)"
> +actual="$(mktemp)"
> +expected="$(pwd)/expected-mofed-modules/${mofedver}-${release}.lst"
> +
> +# This test assumes that the only modules installed here are the MOFED ones.
> +# If other DKMS packages are installed, the comparison will fail.
> +echo "INFO: Scanning for available MOFED kernel modules..."
> +ls "/lib/modules/$(uname -r)/updates/dkms" | sort > "${actual}"
> +
> +if [ ! -f "${expected}" ]; then
> +    echo "ERROR: No expected modules list available for MOFED $mofedver on $release" 1>&2
> +    exit 1
> +fi
> +
> +if diff -u "${expected}" "${actual}"; then
> +    echo "INFO: Success: Actual module list matches expected module list."
> +    exit 0
> +fi
> +
> +echo "ERROR: Actual modules list does not match expected modules list" 1>&2
> +exit 1
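For illustration only (this is not part of the patch), the logic of check-mofed-modules.sh can be sketched in Python as below. The function names are hypothetical. The sketch assumes the same layout the script uses: expected-mofed-modules/<mofedver>-<release>.lst next to the test, and the built modules under /lib/modules/<kernel>/updates/dkms. Unlike `diff -u`, it reports missing and unexpected modules separately.

```python
import os
import subprocess


def detect_versions():
    """Return (release, mofed_version, kernel_release) for the running host.

    Mirrors the shell script: `lsb_release -cs`, the installed version of
    the mlnx-ofed-kernel-only package, and `uname -r`.
    """
    release = subprocess.check_output(["lsb_release", "-cs"], text=True).strip()
    mofed_version = subprocess.check_output(
        ["dpkg-query", "--showformat=${Version}", "--show", "mlnx-ofed-kernel-only"],
        text=True,
    ).strip()
    return release, mofed_version, os.uname().release


def expected_list_path(base_dir, mofed_version, release):
    """Path of the expected-modules list for one MOFED version/release pair."""
    return os.path.join(base_dir, "expected-mofed-modules",
                        "{}-{}.lst".format(mofed_version, release))


def read_module_list(path):
    """Read a .lst file: one module file name per line, blank lines ignored."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}


def installed_dkms_modules(kernel_release, modules_root="/lib/modules"):
    """Return the entries under <modules_root>/<kernel_release>/updates/dkms."""
    return set(os.listdir(os.path.join(modules_root, kernel_release,
                                       "updates", "dkms")))


def compare_modules(expected, actual):
    """Return (missing, unexpected) as sorted lists of module file names."""
    return sorted(expected - actual), sorted(actual - expected)


def check(base_dir):
    """Return True if the installed DKMS modules match the expected list."""
    release, mofed_version, kernel = detect_versions()
    path = expected_list_path(base_dir, mofed_version, release)
    if not os.path.isfile(path):
        print("ERROR: No expected modules list for MOFED {} on {}".format(
            mofed_version, release))
        return False
    missing, unexpected = compare_modules(read_module_list(path),
                                          installed_dkms_modules(kernel))
    for name in missing:
        print("MISSING: " + name)
    for name in unexpected:
        print("UNEXPECTED: " + name)
    return not missing and not unexpected
```

Because the comparison uses sets, the .lst files would not need to be kept in `sort` order, which the `diff -u` approach silently relies on. Applied to the Focal result reported later in this thread, `compare_modules` would list knem.ko as missing and report nothing as unexpected.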
> diff --git a/ubuntu_dgx_mofed_build/control b/ubuntu_dgx_mofed_build/control
> new file mode 100644
> index 00000000..6291775b
> --- /dev/null
> +++ b/ubuntu_dgx_mofed_build/control
> @@ -0,0 +1,12 @@
> +AUTHOR = 'Taihsiang Ho <taihsiang.ho at canonical.com>'
> +TIME = 'SHORT'
> +NAME = 'DGX MOFED build verification test'
> +TEST_TYPE = 'client'
> +TEST_CLASS = 'General'
> +TEST_CATEGORY = 'Smoke'
> +
> +DOC = """
> +Check that the expected MOFED DKMS kernel modules are built for the running kernel
> +"""
> +
> +job.run_test_detail('ubuntu_dgx_mofed_build', test_name='dkms', tag='dkms', timeout=60)
> diff --git a/ubuntu_dgx_mofed_build/expected-mofed-modules/4.9-2.2.4.0-bionic.lst b/ubuntu_dgx_mofed_build/expected-mofed-modules/4.9-2.2.4.0-bionic.lst
> new file mode 100644
> index 00000000..6b53ab1c
> --- /dev/null
> +++ b/ubuntu_dgx_mofed_build/expected-mofed-modules/4.9-2.2.4.0-bionic.lst
> @@ -0,0 +1,34 @@
> +ib_cm.ko
> +ib_core.ko
> +ib_ipoib.ko
> +ib_iser.ko
> +ib_isert.ko
> +ib_srp.ko
> +ib_ucm.ko
> +ib_umad.ko
> +ib_uverbs.ko
> +iw_cm.ko
> +knem.ko

I just gave this a try on a DGX-1 server, "exotic-skunk". This test
fails on a freshly deployed Focal 5.4.0-89-generic kernel:
  --- /home/ubuntu/autotest/client/tests/ubuntu_dgx_mofed_build/expected-mofed-modules/5.4-1.0.3.0-focal.lst    2021-10-22 10:16:47.367588316 +0000
  +++ /tmp/tmp.gbcEkItzqV    2021-10-22 10:17:12.338232968 +0000
  @@ -8,7 +8,6 @@
   ib_umad.ko
   ib_uverbs.ko
   iw_cm.ko
  -knem.ko
   mlx5_core.ko
   mlx5_ib.ko
   mlx_compat.ko

Is this expected?
Thanks
Sam

> +mdev.ko
> +mlx4_core.ko
> +mlx4_en.ko
> +mlx4_ib.ko
> +mlx5_core.ko
> +mlx5_fpga_tools.ko
> +mlx5_ib.ko
> +mlx_compat.ko
> +mlxfw.ko
> +mst_pci.ko
> +mst_pciconf.ko
> +rdma_cm.ko
> +rdma_rxe.ko
> +rdma_ucm.ko
> +rpcrdma.ko
> +rshim.ko
> +rshim_net.ko
> +rshim_pcie.ko
> +rshim_pcie_lf.ko
> +rshim_usb.ko
> +scsi_transport_srp.ko
> +svcrdma.ko
> +xprtrdma.ko
> diff --git a/ubuntu_dgx_mofed_build/expected-mofed-modules/5.4-1.0.3.0-focal.lst b/ubuntu_dgx_mofed_build/expected-mofed-modules/5.4-1.0.3.0-focal.lst
> new file mode 100644
> index 00000000..d56fc65e
> --- /dev/null
> +++ b/ubuntu_dgx_mofed_build/expected-mofed-modules/5.4-1.0.3.0-focal.lst
> @@ -0,0 +1,21 @@
> +auxiliary.ko
> +ib_cm.ko
> +ib_core.ko
> +ib_ipoib.ko
> +ib_iser.ko
> +ib_isert.ko
> +ib_srp.ko
> +ib_umad.ko
> +ib_uverbs.ko
> +iw_cm.ko
> +knem.ko
> +mlx5_core.ko
> +mlx5_ib.ko
> +mlx_compat.ko
> +mlxdevm.ko
> +mlxfw.ko
> +mst_pci.ko
> +mst_pciconf.ko
> +rdma_cm.ko
> +rdma_ucm.ko
> +scsi_transport_srp.ko
> diff --git a/ubuntu_dgx_mofed_build/ubuntu_dgx_mofed_build.py b/ubuntu_dgx_mofed_build/ubuntu_dgx_mofed_build.py
> new file mode 100644
> index 00000000..76c0effa
> --- /dev/null
> +++ b/ubuntu_dgx_mofed_build/ubuntu_dgx_mofed_build.py
> @@ -0,0 +1,32 @@
> +import os
> +from autotest.client import test, utils
> +
> +p_dir = os.path.dirname(os.path.abspath(__file__))
> +sh_executable = os.path.join(p_dir, "ubuntu_dgx_mofed_build.sh")
> +
> +
> +class ubuntu_dgx_mofed_build(test.test):
> +    version = 1
> +
> +    def initialize(self):
> +        pass
> +
> +    def setup(self):
> +        cmd = "{} setup".format(sh_executable)
> +        utils.system(cmd)
> +
> +    def compare_kernel_modules(self):
> +        cmd = "{} test".format(sh_executable)
> +        utils.system(cmd)
> +
> +    def run_once(self, test_name):
> +        if test_name == "dkms":
> +            self.compare_kernel_modules()
> +
> +            print("")
> +            print("{} has run.".format(test_name))
> +
> +        print("")
> +
> +    def postprocess_iteration(self):
> +        pass
> diff --git a/ubuntu_dgx_mofed_build/ubuntu_dgx_mofed_build.sh b/ubuntu_dgx_mofed_build/ubuntu_dgx_mofed_build.sh
> new file mode 100755
> index 00000000..adf9db9c
> --- /dev/null
> +++ b/ubuntu_dgx_mofed_build/ubuntu_dgx_mofed_build.sh
> @@ -0,0 +1,40 @@
> +#!/usr/bin/env bash
> +#
> +# perform mlnx testing and corresponding pre-setup.
> +#
> +
> +set -eo pipefail
> +
> +setup() {
> +    # pre-setup testing environment and necessary tools
> +    # Currently a no-op; reserved for future use.
> +    echo "begin to pre-setup mlnx testing"
> +}
> +
> +run_test() {
> +    exe_dir=$(dirname "${BASH_SOURCE[0]}")
> +    pushd "${exe_dir}"
> +    ./check-mofed-modules.sh
> +    popd
> +}
> +
> +case $1 in
> +    setup)
> +        echo ""
> +        echo "Setting up the test environment..."
> +        echo ""
> +        setup
> +        echo ""
> +        echo "Test environment setup finished."
> +        echo ""
> +        ;;
> +    test)
> +        echo ""
> +        echo "Running the test..."
> +        echo ""
> +        run_test
> +        echo ""
> +        echo "Test run finished."
> +        echo ""
> +        ;;
> +esac
> --
> 2.33.0
>
>
> --
> kernel-team mailing list
> kernel-team at lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team


