[Bug 2131041] Re: [SRU] Incorrect Computation Result on Noble When Multiplying Complex-Valued NumPy Matrices (via OpenBLAS) on GH200 and GB200 machines (Neoverse V2 CPU)
Bryan Fraschetti
2131041 at bugs.launchpad.net
Tue Dec 16 21:57:02 UTC 2025
Note: the autopkgtest failures were temporary (tempfails) and the tests are now passing
SRU verification (noble)
========================
For testing, I am using the agent "lubba" in testflinger as this machine
matches the affected architecture.
# Check installed openblas
ubuntu at lubba:~/numpy$ dpkg -l | grep libopenblas
ii libopenblas-dev:arm64 0.3.26+ds-1 arm64 Optimized BLAS (linear algebra) library (dev, meta)
ii libopenblas-pthread-dev:arm64 0.3.26+ds-1 arm64 Optimized BLAS (linear algebra) library (dev, pthread)
ii libopenblas0:arm64 0.3.26+ds-1 arm64 Optimized BLAS (linear algebra) library (meta)
ii libopenblas0-pthread:arm64 0.3.26+ds-1 arm64 Optimized BLAS (linear algebra) library (shared lib, pthread)
# Check distro
ubuntu at lubba:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 24.04.3 LTS
Release: 24.04
Codename: noble
# Check cpu info
ubuntu at lubba:~$ lscpu
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 72
On-line CPU(s) list: 0-71
Vendor ID: ARM
Model name: Neoverse-V2
# Install tools for Python virtual environments, and create one
ubuntu at lubba:~$ sudo apt install python3.12-venv -y
ubuntu at lubba:~$ python3 -m venv venv
ubuntu at lubba:~$ source venv/bin/activate
(venv) ubuntu at lubba:~$ python3
Python 3.12.3 (main, Nov 6 2025, 13:44:16) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> print(np.__version__)
1.26.4
>>> print(np.show_config())
...
"blas": {
"name": "openblas",
"found": true,
"version": "0.3.26", # Linked against the system's 0.3.26
"detection method": "pkgconfig",
"include directory": "/usr/include/aarch64-linux-gnu/openblas-pthread/",
"lib directory": "/usr/lib/aarch64-linux-gnu/openblas-pthread/",
"openblas configuration": "USE_64BITINT=0 DYNAMIC_ARCH=1 DYNAMIC_OLDER=1 NO_CBLAS= NO_LAPACK= NO_LAPACKE=1 NO_AFFINITY=1 USE_OPENMP=0 generic MAX_THREADS=64",
"pc file directory": "/usr/lib/aarch64-linux-gnu/pkgconfig"
},
...
>>> a = np.array([2 +3j, 3], dtype=np.complex64)
>>> b = np.array([5, 6], dtype=np.complex64)
>>> result = np.dot(a, b)
>>> print(f"np.dot(a, b) = {result}")
np.dot(a, b) = (73+15j)
This is incorrect; the expected result is (28+15j)
# Upgrade to proposed
ubuntu at lubba:~/numpy$ dpkg -l | grep openblas
ii libopenblas-dev:arm64 0.3.26+ds-1ubuntu0.1 arm64 Optimized BLAS (linear algebra) library (dev, meta)
ii libopenblas-pthread-dev:arm64 0.3.26+ds-1ubuntu0.1 arm64 Optimized BLAS (linear algebra) library (dev, pthread)
ii libopenblas0:arm64 0.3.26+ds-1ubuntu0.1 arm64 Optimized BLAS (linear algebra) library (meta)
ii libopenblas0-pthread:arm64 0.3.26+ds-1ubuntu0.1 arm64 Optimized BLAS (linear algebra) library (shared lib, pthread)
(venv) ubuntu at lubba:~$ python3
>>> a = np.array([2 +3j, 3], dtype=np.complex64)
>>> b = np.array([5, 6], dtype=np.complex64)
>>> result = np.dot(a, b)
>>> print(f"np.dot(a, b) = {result}")
np.dot(a, b) = (28+15j)
This is the correct, expected result
Verification passed
** Tags removed: verification-needed verification-needed-noble
** Tags added: verification-done verification-done-noble
--
You received this bug notification because you are a member of Ubuntu
Sponsors, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/2131041
Title:
[SRU] Incorrect Computation Result on Noble When Multiplying Complex-
Valued NumPy Matrices (via OpenBLAS) on GH200 and GB200 machines
(Neoverse V2 CPU)
Status in openblas package in Ubuntu:
Fix Released
Status in openblas source package in Noble:
Fix Committed
Bug description:
[Impact]
- When multiplying complex-valued matrices in NumPy using OpenBLAS
compiled with DYNAMIC_ARCH=1 (as is done in the Noble deb) as the
computation backend on machines with Neoverse V2 CPUs (e.g. Nvidia
GH200 and GB200 machines), the real-valued component of the matrix
product is not calculated correctly.
- Anyone using NumPy / OpenBLAS to multiply matrices on Noble on GH200
and GB200 machines could hit this bug. AI / ML workloads in particular
may be affected, and the bug can compromise the computational accuracy
of their results.
[RCA]
- The root cause is that OpenBLAS is compiled with DYNAMIC_ARCH=1,
which auto-detects the CPU and determines the SVE kernel path at
runtime. The GB200 and GH200 use Neoverse V2 CPUs (Armv9-A), and the
existing deb has no dynamic-arch support for the Neoverse V2, so
detection fails on that CPU and the wrong instruction path is chosen.
This was fixed upstream in [1].
- The correct hardware detection was added in 0.3.27, while Noble is
on 0.3.26. All currently supported releases newer than Noble ship
versions newer than 0.3.27, so nothing needs to be done for Plucky,
Questing, or Resolute.
- The issue can be worked around by setting the environment variable
OPENBLAS_CORETYPE=ARMV8 before computation, but this disables the SVE
optimizations, which reduces overall performance and prevents users
from leveraging all of their hardware's features.
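As a rough sketch of the workaround above: as far as I understand, OpenBLAS reads OPENBLAS_CORETYPE when the library is loaded, so the variable has to be in the environment before NumPy is imported. One way to guarantee that is to run the computation in a child process. The helper name run_with_coretype is hypothetical and only for illustration.

```python
import os
import subprocess
import sys


def run_with_coretype(code: str, coretype: str = "ARMV8") -> str:
    """Run a Python snippet in a child process with OPENBLAS_CORETYPE set.

    Hypothetical helper: the child inherits the current environment plus
    the override, so any library it loads sees the variable from the start.
    """
    env = dict(os.environ, OPENBLAS_CORETYPE=coretype)
    done = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the snippet fails
    )
    return done.stdout.strip()


# Sanity check that the override reaches the child process.
print(run_with_coretype("import os; print(os.environ['OPENBLAS_CORETYPE'])"))
```

Passing the reproducer from the test plan as the code argument would run it with the generic ARMV8 kernels, at the cost of the SVE optimizations.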
[Test Plan]
To reproduce, run the following commands on Noble in a Python 3.12
environment with NumPy 1.26.4 installed (these are the default
versions on Noble).
import numpy as np
a = np.array([2 + 3j, 3], dtype=np.complex64)
b = np.array([5, 6], dtype=np.complex64)
result = np.dot(a, b)
print(f"np.dot(a, b) = {result}")
This produces the output:
np.dot(a, b) = (73+15j)
which is incorrect. The correct result is np.dot(a, b) = (28+15j).
With the patched OpenBLAS package installed, the correct result must
be produced to pass verification.
[What can go wrong]
- If the dynamic arch detection does not work (for example, if the CPU
type cannot be determined on a Neoverse V2 machine), the fallback arch
and SVE path would be chosen, and this bug would still be hit.
- Since Neoverse V2 is mapped to the existing Neoverse V1 kernels,
performance on the Neoverse V2 hardware may not be completely optimal,
but at least correctness will be guaranteed and the performance will
be better than disabling SVE altogether.
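To see which kernel set the dynamic detection actually picked, one option is to ask the library directly. The sketch below assumes the system library is loadable as libopenblas.so.0 (an assumption about the soname) and calls the openblas_get_corename function that OpenBLAS exports; it returns None if the library or symbol is not available. The wrapper name openblas_corename is hypothetical.

```python
import ctypes


def openblas_corename(soname="libopenblas.so.0"):
    """Return the core name OpenBLAS selected at runtime, or None.

    The default soname is an assumption; adjust it to the library that
    the NumPy build under test is actually linked against.
    """
    try:
        lib = ctypes.CDLL(soname)
        fn = lib.openblas_get_corename
    except (OSError, AttributeError):
        # Library not found, or it does not export the symbol.
        return None
    fn.restype = ctypes.c_char_p  # C signature: char *openblas_get_corename(void)
    fn.argtypes = []
    name = fn()
    return name.decode() if name else None


print(openblas_corename())
```

Note that this inspects the library found by soname, which is only meaningful when NumPy is linked against the same system OpenBLAS, as in the show_config() output above.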
[Extra Info]
- The customer has confirmed that this patch produces the correct
computation in their testing environment and passes their QA test
suite, while the package currently in -updates fails their QA due
to the aforementioned computational errors.
- A PPA demonstrating build success on amd64 and arm64 is at [2].
[1] https://github.com/OpenMathLib/OpenBLAS/commit/aaf65210ccba0c53408c242a2e0f5ad5d798d532
[2] https://launchpad.net/~bryanfraschetti/+archive/ubuntu/lws-openblas/+packages
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/openblas/+bug/2131041/+subscriptions