ACK/Cmt: [PATCH 0/2][autotest-client-tests] ubuntu_performance_fio: stabilize DGX2 FIO tests
Colin Ian King
colin.king at canonical.com
Fri Dec 4 16:19:54 UTC 2020
On 04/12/2020 15:59, Ian May wrote:
> DGX2 performance testing is currently producing unstable results with the FIO tests. Through trial
> and error and suggestions from Nvidia's DGX performance team, we found a set of tests
> that help stabilize the FIO numbers. Since the goal of the perf tests is to identify
> regressions, I propose we make the necessary adjustments for future DGX2 FIO perf tests.
>
> Since the FIO tests carry their test description in the file name, this involves adding a new
> file for each test. The new tests will use only two combinations of blk size and jobs:
> blk-128k, jobs-16
> blk-8k, jobs-64
> The ioengine has been changed from libaio to sync. This simplifies things, since we no longer
> have to isolate a stable iodepth, and it also mirrors Nvidia's tests. Another significant
> change is increasing the file size for each test. Currently we are using 2G for
> each test. By changing the file size to 32G and 8G with the respective blk size and jobs
> mentioned above, we see all tests complete within the 5% margin of error we were targeting.
>
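To illustrate the naming convention described above, here is a minimal sketch of how a test file name could be split back into its parameters. The function name parse_fio_test_name and the "pattern" key are hypothetical and are not taken from the patches; the sketch only assumes the comma-separated key-value layout visible in the file names in this thread.

```python
def parse_fio_test_name(filename):
    """Split a name like 'rd-0,wr-100,rand,blk-128k,jobs-16.fio' into fields.

    Hypothetical helper: assumes comma-separated 'key-value' fields plus a
    bare 'rand' or 'seq' token, as seen in the file names in this thread.
    """
    stem = filename[:-len(".fio")] if filename.endswith(".fio") else filename
    params = {}
    for field in stem.split(","):
        if field in ("rand", "seq"):
            # The access pattern has no key of its own in the file name.
            params["pattern"] = field
            continue
        key, _, value = field.partition("-")
        params[key] = value
    return params


if __name__ == "__main__":
    print(parse_fio_test_name("rd-0,wr-100,rand,blk-128k,jobs-16.fio"))
```

The same sketch also accepts the older names that carry an iodepth field, since it does not hard-code the set of keys.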
> The second patch changes the file size handling for the tests. Currently we use a globally
> defined file size and swap that into each test as we run. Since this new model uses multiple
> file sizes, the size is defined in the test itself and we pull the value out of the test for
> display output purposes. There is currently a check against the file size before we run any
> of the FIO tests. For simplicity and ease of changes, I propose setting the global file size
> to the size of the largest test (32G), so the initial check against available system memory will
> still be a valid check. This seemed safer than trying to move the check to in between tests
> and therefore having to introduce additional error handling.
>
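As a rough sketch of the file size handling described above, the following shows one way the per-test size could be pulled out of a job file and the largest value used as the global size. This is not the code from the patch: the helper names are hypothetical, and it assumes each job file has a plain "size=" line with an M, G or T suffix, interpreted in powers of 1024.

```python
import re

# Assumption: sizes are written like 'size=32G' and the suffixes are
# treated as powers of 1024, so results are in MiB.
_MIB_PER_UNIT = {"m": 1, "g": 1024, "t": 1024 * 1024}

_SIZE_RE = re.compile(r"^\s*size\s*=\s*(\d+)([mgt])\s*$",
                      re.IGNORECASE | re.MULTILINE)


def size_mb_from_job(job_text):
    """Return the first 'size=' value found in a job file, in MiB, or None."""
    match = _SIZE_RE.search(job_text)
    if match is None:
        return None
    return int(match.group(1)) * _MIB_PER_UNIT[match.group(2).lower()]


def global_file_size_mb(job_texts):
    """Return the largest per-test size, for use in a single up-front check."""
    sizes = [s for s in map(size_mb_from_job, job_texts) if s is not None]
    if not sizes:
        raise ValueError("no job file defines a size")
    return max(sizes)


if __name__ == "__main__":
    jobs = ["[global]\nioengine=sync\nsize=32G\n",
            "[global]\nioengine=sync\nsize=8G\n"]
    print(global_file_size_mb(jobs))
```

Taking the maximum keeps the single check before the run conservative: if the largest test passes it, the smaller tests do too.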
> Ian May (2):
> UBUNTU: SAUCE: ubuntu_performance_fio: Add new FIO tests and remove
> old
> UBUNTU: SAUCE: ubuntu_performance_fio: Change value of FIO global
> file_size_mb
>
> ubuntu_performance_fio/control | 29 +++++++------------
> ... => rd-0,wr-100,rand,blk-128k,jobs-16.fio} | 13 ++++-----
> ...io => rd-0,wr-100,rand,blk-8k,jobs-64.fio} | 13 ++++-----
> ...0,wr-100,rand,blk-8k,jobs-8,iodepth-32.fio | 21 --------------
> ...o => rd-0,wr-100,seq,blk-128k,jobs-16.fio} | 10 +++----
> ...0,wr-100,seq,blk-128k,jobs-8,iodepth-8.fio | 20 -------------
> ...fio => rd-0,wr-100,seq,blk-8k,jobs-64.fio} | 14 ++++-----
> ... => rd-100,wr-0,rand,blk-128k,jobs-16.fio} | 11 ++++---
> ...00,wr-0,rand,blk-128k,jobs-8,iodepth-8.fio | 21 --------------
> ...io => rd-100,wr-0,rand,blk-8k,jobs-64.fio} | 13 ++++-----
> ...100,wr-0,rand,blk-8k,jobs-8,iodepth-32.fio | 21 --------------
> ...o => rd-100,wr-0,seq,blk-128k,jobs-16.fio} | 10 +++----
> ...100,wr-0,seq,blk-128k,jobs-4,iodepth-8.fio | 20 -------------
> ...fio => rd-100,wr-0,seq,blk-8k,jobs-64.fio} | 14 ++++-----
> ... => rd-75,wr-25,rand,blk-128k,jobs-16.fio} | 15 ++++++----
> ...5,wr-25,rand,blk-8k,jobs-16,iodepth-32.fio | 27 -----------------
> ...75,wr-25,rand,blk-8k,jobs-4,iodepth-32.fio | 27 -----------------
> ...io => rd-75,wr-25,rand,blk-8k,jobs-64.fio} | 15 ++++++----
> ...75,wr-25,rand,blk-8k,jobs-8,iodepth-32.fio | 27 -----------------
> .../ubuntu_performance_fio.py | 6 ++--
> 20 files changed, 78 insertions(+), 269 deletions(-)
> rename ubuntu_performance_fio/{rd-0,wr-100,rand,blk-8k,jobs-16,iodepth-32.fio => rd-0,wr-100,rand,blk-128k,jobs-16.fio} (59%)
> rename ubuntu_performance_fio/{rd-0,wr-100,rand,blk-8k,jobs-4,iodepth-32.fio => rd-0,wr-100,rand,blk-8k,jobs-64.fio} (57%)
> delete mode 100644 ubuntu_performance_fio/rd-0,wr-100,rand,blk-8k,jobs-8,iodepth-32.fio
> rename ubuntu_performance_fio/{rd-0,wr-100,seq,blk-128k,jobs-16,iodepth-8.fio => rd-0,wr-100,seq,blk-128k,jobs-16.fio} (65%)
> delete mode 100644 ubuntu_performance_fio/rd-0,wr-100,seq,blk-128k,jobs-8,iodepth-8.fio
> rename ubuntu_performance_fio/{rd-0,wr-100,seq,blk-128k,jobs-4,iodepth-8.fio => rd-0,wr-100,seq,blk-8k,jobs-64.fio} (58%)
> rename ubuntu_performance_fio/{rd-100,wr-0,rand,blk-128k,jobs-16,iodepth-8.fio => rd-100,wr-0,rand,blk-128k,jobs-16.fio} (61%)
> delete mode 100644 ubuntu_performance_fio/rd-100,wr-0,rand,blk-128k,jobs-8,iodepth-8.fio
> rename ubuntu_performance_fio/{rd-100,wr-0,rand,blk-8k,jobs-4,iodepth-32.fio => rd-100,wr-0,rand,blk-8k,jobs-64.fio} (57%)
> delete mode 100644 ubuntu_performance_fio/rd-100,wr-0,rand,blk-8k,jobs-8,iodepth-32.fio
> rename ubuntu_performance_fio/{rd-100,wr-0,seq,blk-128k,jobs-16,iodepth-8.fio => rd-100,wr-0,seq,blk-128k,jobs-16.fio} (64%)
> delete mode 100644 ubuntu_performance_fio/rd-100,wr-0,seq,blk-128k,jobs-4,iodepth-8.fio
> rename ubuntu_performance_fio/{rd-100,wr-0,seq,blk-128k,jobs-8,iodepth-8.fio => rd-100,wr-0,seq,blk-8k,jobs-64.fio} (58%)
> rename ubuntu_performance_fio/{rd-100,wr-0,rand,blk-128k,jobs-4,iodepth-8.fio => rd-75,wr-25,rand,blk-128k,jobs-16.fio} (50%)
> delete mode 100644 ubuntu_performance_fio/rd-75,wr-25,rand,blk-8k,jobs-16,iodepth-32.fio
> delete mode 100644 ubuntu_performance_fio/rd-75,wr-25,rand,blk-8k,jobs-4,iodepth-32.fio
> rename ubuntu_performance_fio/{rd-100,wr-0,rand,blk-8k,jobs-16,iodepth-32.fio => rd-75,wr-25,rand,blk-8k,jobs-64.fio} (51%)
> delete mode 100644 ubuntu_performance_fio/rd-75,wr-25,rand,blk-8k,jobs-8,iodepth-32.fio
>
We also use these tests for generic performance testing, so changing
them basically means we reset the stats and start again from a new
baseline. However, the redeeming feature of the changes here is that
the new tests produce reliable results, and the previous ones also had
a lot of jitter on the test system we use for the generic tests.
So, yes, I'm OK with these, I just need to re-work the back-end database
that stores the older data and add some new shiny grafana graphs for the
new test cases.
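For reference, one way to express the "within 5%" stability target when comparing repeated runs against a new baseline is sketched below. The thread does not say how the margin is computed, so this assumes it means the largest deviation of any run from the mean, relative to the mean; the function names are hypothetical.

```python
from statistics import mean


def max_relative_deviation(samples):
    """Largest distance of any sample from the mean, as a fraction of the mean."""
    avg = mean(samples)
    return max(abs(s - avg) for s in samples) / avg


def is_stable(samples, margin=0.05):
    """True if every run lies within `margin` (default 5%) of the mean."""
    return max_relative_deviation(samples) <= margin


if __name__ == "__main__":
    print(is_stable([100, 102, 98]))   # runs within 2% of the mean
    print(is_stable([100, 120, 80]))   # runs up to 20% from the mean
```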
Colin