[Bug 2131041] Re: [SRU] Incorrect Computation Result on Noble When Multiplying Complex-Valued NumPy Matrices (via OpenBLAS) on GH200 and GB200 machines (Neoverse V2 CPU)

Bryan Fraschetti 2131041 at bugs.launchpad.net
Tue Dec 16 21:57:02 UTC 2025


Note: the autopkgtest failures were transient (tempfails) and the tests are now passing

SRU verification (noble)
========================

For testing, I am using the agent "lubba" in testflinger as this machine
matches the affected architecture.

# Check installed openblas

ubuntu at lubba:~/numpy$ dpkg -l | grep libopenblas
ii  libopenblas-dev:arm64                0.3.26+ds-1                             arm64        Optimized BLAS (linear algebra) library (dev, meta)
ii  libopenblas-pthread-dev:arm64        0.3.26+ds-1                             arm64        Optimized BLAS (linear algebra) library (dev, pthread)
ii  libopenblas0:arm64                   0.3.26+ds-1                             arm64        Optimized BLAS (linear algebra) library (meta)
ii  libopenblas0-pthread:arm64           0.3.26+ds-1                             arm64        Optimized BLAS (linear algebra) library (shared lib, pthread)

# Check distro

ubuntu at lubba:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.04.3 LTS
Release:        24.04
Codename:       noble

# Check cpu info

ubuntu at lubba:~$ lscpu
Architecture:             aarch64
  CPU op-mode(s):         64-bit
  Byte Order:             Little Endian
CPU(s):                   72
  On-line CPU(s) list:    0-71
Vendor ID:                ARM
  Model name:             Neoverse-V2

# Install tools for Python virtual environments, and create one

ubuntu at lubba:~$ sudo apt install python3.12-venv -y
ubuntu at lubba:~$ python3 -m venv venv
ubuntu at lubba:~$ source venv/bin/activate

(venv) ubuntu at lubba:~$ python3
Python 3.12.3 (main, Nov  6 2025, 13:44:16) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> print(np.__version__)
1.26.4
>>> print(np.show_config())
...
    "blas": {
      "name": "openblas",
      "found": true,
      "version": "0.3.26",  # Linked against the system's 0.3.26
      "detection method": "pkgconfig",
      "include directory": "/usr/include/aarch64-linux-gnu/openblas-pthread/",
      "lib directory": "/usr/lib/aarch64-linux-gnu/openblas-pthread/",
      "openblas configuration": "USE_64BITINT=0 DYNAMIC_ARCH=1 DYNAMIC_OLDER=1 NO_CBLAS= NO_LAPACK= NO_LAPACKE=1 NO_AFFINITY=1 USE_OPENMP=0 generic MAX_THREADS=64",
      "pc file directory": "/usr/lib/aarch64-linux-gnu/pkgconfig"
    },
...
>>> a = np.array([2 +3j, 3], dtype=np.complex64)
>>> b = np.array([5, 6], dtype=np.complex64)
>>> result = np.dot(a, b)
>>> print(f"np.dot(a, b) = {result}")
np.dot(a, b) = (73+15j)

This result is incorrect; the expected value is (28+15j)

# Upgrade to proposed

ubuntu at lubba:~/numpy$ dpkg -l | grep openblas
ii  libopenblas-dev:arm64                0.3.26+ds-1ubuntu0.1                    arm64        Optimized BLAS (linear algebra) library (dev, meta)
ii  libopenblas-pthread-dev:arm64        0.3.26+ds-1ubuntu0.1                    arm64        Optimized BLAS (linear algebra) library (dev, pthread)
ii  libopenblas0:arm64                   0.3.26+ds-1ubuntu0.1                    arm64        Optimized BLAS (linear algebra) library (meta)
ii  libopenblas0-pthread:arm64           0.3.26+ds-1ubuntu0.1                    arm64        Optimized BLAS (linear algebra) library (shared lib, pthread)

(venv) ubuntu at lubba:~$ python3
>>> a = np.array([2 +3j, 3], dtype=np.complex64)
>>> b = np.array([5, 6], dtype=np.complex64)
>>> result = np.dot(a, b)
>>> print(f"np.dot(a, b) = {result}")
np.dot(a, b) = (28+15j)

This is the correct, expected result

Verification passed


** Tags removed: verification-needed verification-needed-noble
** Tags added: verification-done verification-done-noble

-- 
You received this bug notification because you are a member of Ubuntu
Sponsors, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/2131041

Title:
  [SRU] Incorrect Computation Result on Noble When Multiplying Complex-
  Valued NumPy Matrices (via OpenBLAS) on GH200 and GB200 machines
  (Neoverse V2 CPU)

Status in openblas package in Ubuntu:
  Fix Released
Status in openblas source package in Noble:
  Fix Committed

Bug description:
  [Impact]

  - When multiplying complex-valued matrices in NumPy using OpenBLAS
  compiled with DYNAMIC_ARCH=1 (as is done in the Noble deb) as the
  computation backend on machines with Neoverse V2 CPUs (e.g. Nvidia
  GH200 and GB200 machines), the real-valued component of the matrix
  product is not calculated correctly.

  - Anyone using NumPy / OpenBLAS to multiply matrices on Noble on GH200
  and GB200 machines could hit this bug. AI / ML workloads in particular
  may be affected, and the bug can compromise the numerical accuracy of
  their results.
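
  To check whether a given machine falls into the affected hardware
  class, something like the following sketch could be used. It parses
  /proc/cpuinfo-style text for the Arm implementer and part IDs. The
  part ID 0xd4f for Neoverse V2 is an assumption taken from the Linux
  kernel's arm64 CPU type definitions, and the function names are
  hypothetical, not part of any package mentioned here.

```python
# Assumption: implementer 0x41 is Arm Ltd and part 0xd4f is Neoverse V2,
# as listed in the Linux kernel's arm64 CPU type definitions.
ARM_IMPLEMENTER = 0x41
NEOVERSE_V2_PART = 0xD4F


def cpu_ids(cpuinfo_text):
    """Return the set of (implementer, part) pairs found in cpuinfo text."""
    ids = set()
    implementer = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        value = value.strip()
        if key == "CPU implementer":
            implementer = int(value, 16)
        elif key == "CPU part" and implementer is not None:
            ids.add((implementer, int(value, 16)))
    return ids


def has_neoverse_v2(cpuinfo_text):
    """True if any core in the cpuinfo text reports itself as Neoverse V2."""
    return (ARM_IMPLEMENTER, NEOVERSE_V2_PART) in cpu_ids(cpuinfo_text)
```

  On an arm64 Linux host, passing the contents of /proc/cpuinfo to
  has_neoverse_v2() should indicate whether the machine is in scope.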

  [RCA]

  - The root cause is that OpenBLAS is compiled with DYNAMIC_ARCH=1,
  which auto-detects the CPU and selects the SVE kernel path at
  runtime. The GB200 and GH200 use Neoverse V2 CPUs (Armv9-A), and
  this dynamic detection fails on that CPU: the existing deb has no
  dynamic support for the Neoverse V2, so the wrong instruction path
  is chosen. This was fixed upstream in [1].

  - The correct hardware detection was added in 0.3.27, while Noble is
  on 0.3.26. All currently supported releases newer than Noble ship
  versions later than 0.3.27, so nothing needs to be done for Plucky,
  Questing, or Resolute.

  - The issue can be worked around by setting the environment variable
  OPENBLAS_CORETYPE=ARMV8 before computation, but this disables the SVE
  optimizations, which reduces overall performance and prevents users
  from leveraging all of their hardware's features.
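
  For users who need the workaround from inside a Python program rather
  than from the shell, a minimal sketch follows. It assumes that
  OpenBLAS reads OPENBLAS_CORETYPE when the library is first loaded, so
  the variable has to be set before NumPy is imported. The helper name
  apply_openblas_workaround is hypothetical.

```python
import os


def apply_openblas_workaround(env=os.environ):
    """Request the generic ARMV8 kernels unless a core type is already set."""
    env.setdefault("OPENBLAS_CORETYPE", "ARMV8")
    return env["OPENBLAS_CORETYPE"]


# Assumption: this must run before the first `import numpy`, because the
# variable is only consulted when the OpenBLAS shared library is loaded.
apply_openblas_workaround()

try:
    import numpy as np  # imported only after the environment is prepared
except ImportError:
    np = None  # NumPy is not installed; nothing further to do in this sketch
```

  Using setdefault keeps any core type the user has already exported, so
  the sketch does not override a deliberate choice.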

  [Test Plan]

  To reproduce, run the following commands on Noble in a Python 3.12
  environment with NumPy 1.26.4 installed (these are the default
  versions on Noble).

  import numpy as np
  a = np.array([2 + 3j, 3], dtype=np.complex64)
  b = np.array([5, 6], dtype=np.complex64)
  result = np.dot(a, b)
  print(f"np.dot(a, b) = {result}")

  This produces the output:

  np.dot(a, b) = (73+15j)

  which is incorrect. The correct result is np.dot(a, b) = (28+15j).

  With the patched OpenBLAS package installed, the correct result must
  be produced to pass verification.
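
  The expected value can be cross-checked without going through BLAS at
  all. The sketch below computes the same unconjugated dot product with
  plain Python complex arithmetic and, if NumPy happens to be installed,
  compares np.dot against it. The helper names and the tolerance are
  assumptions made for illustration.

```python
def reference_dot(a, b):
    """Unconjugated dot product using plain Python complex arithmetic."""
    return sum(complex(x) * complex(y) for x, y in zip(a, b))


A = [2 + 3j, 3]
B = [5, 6]
EXPECTED = reference_dot(A, B)  # (2+3j)*5 + 3*6


def numpy_dot_matches(a, b, tol=1e-5):
    """Compare np.dot with the reference; return None if NumPy is missing."""
    try:
        import numpy as np
    except ImportError:
        return None
    got = np.dot(np.array(a, dtype=np.complex64),
                 np.array(b, dtype=np.complex64))
    return abs(complex(got) - reference_dot(a, b)) <= tol
```

  Here numpy_dot_matches(A, B) would be expected to return False with
  the unpatched package on affected hardware and True once the fix is
  installed.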

  [What can go wrong]

  - If the dynamic arch detection fails, for example because the CPU
  type cannot be determined on a Neoverse V2 machine, the fallback
  arch and SVE path would be chosen and this bug would still be hit.

  - Since Neoverse V2 is mapped to the existing Neoverse V1 kernels,
  performance on the Neoverse V2 hardware may not be completely optimal,
  but at least correctness will be guaranteed and the performance will
  be better than disabling SVE altogether.

  [Extra Info]

  - A customer has confirmed that this patch produces the correct
  computation in their testing environment and passes their QA test
  suite, while the package currently in -updates fails their QA due
  to the aforementioned computational errors.

  - A PPA demonstrating build success on amd64 and arm64 is at [2]

  [1] https://github.com/OpenMathLib/OpenBLAS/commit/aaf65210ccba0c53408c242a2e0f5ad5d798d532
  [2] https://launchpad.net/~bryanfraschetti/+archive/ubuntu/lws-openblas/+packages

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/openblas/+bug/2131041/+subscriptions



