ACK/Cmt: [PATCH 0/2][autotest-client-tests] ubuntu_performance_fio: stabilize DGX2 FIO tests

Colin Ian King colin.king at canonical.com
Fri Dec 4 16:19:54 UTC 2020




On 04/12/2020 15:59, Ian May wrote:
> DGX2 performance testing is currently producing unstable results with the FIO tests.
> Through trial and error, and with suggestions from Nvidia's DGX performance team, we found
> a set of tests that helps stabilize the FIO numbers.  Since the goal of the perf tests is
> to identify regressions, I propose we make the necessary adjustments for future DGX2 FIO
> perf tests.
> 
> Since the FIO tests carry their test description in the file name, this involves adding a
> new file for each test.  The new tests use only two combinations of block size and job count:
>   blk-128k, jobs-16
>   blk-8k, jobs-64
> The ioengine has been changed from libaio to sync; this simplifies things, since there is no
> longer a need to isolate a stable iodepth, and it also mirrors Nvidia's tests.  Another
> significant change is increasing the file size used by each test.  Currently we use 2G for
> each test.  By changing the file size to 32G and 8G for the respective block size and job
> combinations above, all tests complete within the 5% margin of error we were targeting.
> 
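The cover letter does not include the new job files themselves, so purely as an illustration,
a job such as rd-0,wr-100,seq,blk-128k,jobs-16.fio could look roughly like the sketch below.
The ioengine, block size, job count and file size come from the description above; direct=1,
group_reporting and the section layout are my assumptions:

    ; sketch only -- not the actual job file from the patch
    [global]
    ioengine=sync       ; changed from libaio, per the cover letter
    direct=1            ; assumption: bypass the page cache
    bs=128k             ; block size encoded in the file name
    numjobs=16          ; job count encoded in the file name
    size=32g            ; per-test file size proposed above
    group_reporting

    [seq-write]
    rw=write            ; rd-0,wr-100,seq => 100% sequential writes
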
> The second patch changes the file size handling for the tests.  Currently we use a globally
> defined file size and substitute it into each test as we run.  Since the new model uses
> multiple file sizes, the size is defined in the test itself and we pull the value out of the
> test for display purposes.  There is currently a check against the file size before we run
> any of the FIO tests.  For simplicity, I propose setting the global file size to the largest
> test size (32G), so the initial check against available system memory remains valid.  This
> seemed safer than trying to move the check in between tests and thereby having to introduce
> additional error handling.
> 
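As a rough sketch only (not the actual ubuntu_performance_fio.py change), pulling the
per-test size back out of a job file for display and doing a single up-front memory check
against the largest size could look something like this; the helper names and the exact form
of the check are assumptions:

    import re

    FILE_SIZE_MB = 32 * 1024   # global worst case: the largest per-test size (32G)

    def job_file_size_mb(path):
        """Extract the size= value from a .fio job file, in MB, for display."""
        with open(path) as f:
            for line in f:
                m = re.match(r'\s*size\s*=\s*(\d+)\s*([kmg])', line, re.IGNORECASE)
                if m:
                    value, unit = int(m.group(1)), m.group(2).lower()
                    scale = {'k': 1.0 / 1024, 'm': 1.0, 'g': 1024.0}[unit]
                    return int(value * scale)
        return None

    def enough_memory_for_tests(free_mb):
        """One initial check against the largest test size, as proposed above."""
        return free_mb >= FILE_SIZE_MB
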
> Ian May (2):
>   UBUNTU: SAUCE: ubuntu_performance_fio: Add new FIO tests and remove
>     old
>   UBUNTU: SAUCE: ubuntu_performance_fio: Change value of FIO global
>     file_size_mb
> 
>  ubuntu_performance_fio/control                | 29 +++++++------------
>  ... => rd-0,wr-100,rand,blk-128k,jobs-16.fio} | 13 ++++-----
>  ...io => rd-0,wr-100,rand,blk-8k,jobs-64.fio} | 13 ++++-----
>  ...0,wr-100,rand,blk-8k,jobs-8,iodepth-32.fio | 21 --------------
>  ...o => rd-0,wr-100,seq,blk-128k,jobs-16.fio} | 10 +++----
>  ...0,wr-100,seq,blk-128k,jobs-8,iodepth-8.fio | 20 -------------
>  ...fio => rd-0,wr-100,seq,blk-8k,jobs-64.fio} | 14 ++++-----
>  ... => rd-100,wr-0,rand,blk-128k,jobs-16.fio} | 11 ++++---
>  ...00,wr-0,rand,blk-128k,jobs-8,iodepth-8.fio | 21 --------------
>  ...io => rd-100,wr-0,rand,blk-8k,jobs-64.fio} | 13 ++++-----
>  ...100,wr-0,rand,blk-8k,jobs-8,iodepth-32.fio | 21 --------------
>  ...o => rd-100,wr-0,seq,blk-128k,jobs-16.fio} | 10 +++----
>  ...100,wr-0,seq,blk-128k,jobs-4,iodepth-8.fio | 20 -------------
>  ...fio => rd-100,wr-0,seq,blk-8k,jobs-64.fio} | 14 ++++-----
>  ... => rd-75,wr-25,rand,blk-128k,jobs-16.fio} | 15 ++++++----
>  ...5,wr-25,rand,blk-8k,jobs-16,iodepth-32.fio | 27 -----------------
>  ...75,wr-25,rand,blk-8k,jobs-4,iodepth-32.fio | 27 -----------------
>  ...io => rd-75,wr-25,rand,blk-8k,jobs-64.fio} | 15 ++++++----
>  ...75,wr-25,rand,blk-8k,jobs-8,iodepth-32.fio | 27 -----------------
>  .../ubuntu_performance_fio.py                 |  6 ++--
>  20 files changed, 78 insertions(+), 269 deletions(-)
>  rename ubuntu_performance_fio/{rd-0,wr-100,rand,blk-8k,jobs-16,iodepth-32.fio => rd-0,wr-100,rand,blk-128k,jobs-16.fio} (59%)
>  rename ubuntu_performance_fio/{rd-0,wr-100,rand,blk-8k,jobs-4,iodepth-32.fio => rd-0,wr-100,rand,blk-8k,jobs-64.fio} (57%)
>  delete mode 100644 ubuntu_performance_fio/rd-0,wr-100,rand,blk-8k,jobs-8,iodepth-32.fio
>  rename ubuntu_performance_fio/{rd-0,wr-100,seq,blk-128k,jobs-16,iodepth-8.fio => rd-0,wr-100,seq,blk-128k,jobs-16.fio} (65%)
>  delete mode 100644 ubuntu_performance_fio/rd-0,wr-100,seq,blk-128k,jobs-8,iodepth-8.fio
>  rename ubuntu_performance_fio/{rd-0,wr-100,seq,blk-128k,jobs-4,iodepth-8.fio => rd-0,wr-100,seq,blk-8k,jobs-64.fio} (58%)
>  rename ubuntu_performance_fio/{rd-100,wr-0,rand,blk-128k,jobs-16,iodepth-8.fio => rd-100,wr-0,rand,blk-128k,jobs-16.fio} (61%)
>  delete mode 100644 ubuntu_performance_fio/rd-100,wr-0,rand,blk-128k,jobs-8,iodepth-8.fio
>  rename ubuntu_performance_fio/{rd-100,wr-0,rand,blk-8k,jobs-4,iodepth-32.fio => rd-100,wr-0,rand,blk-8k,jobs-64.fio} (57%)
>  delete mode 100644 ubuntu_performance_fio/rd-100,wr-0,rand,blk-8k,jobs-8,iodepth-32.fio
>  rename ubuntu_performance_fio/{rd-100,wr-0,seq,blk-128k,jobs-16,iodepth-8.fio => rd-100,wr-0,seq,blk-128k,jobs-16.fio} (64%)
>  delete mode 100644 ubuntu_performance_fio/rd-100,wr-0,seq,blk-128k,jobs-4,iodepth-8.fio
>  rename ubuntu_performance_fio/{rd-100,wr-0,seq,blk-128k,jobs-8,iodepth-8.fio => rd-100,wr-0,seq,blk-8k,jobs-64.fio} (58%)
>  rename ubuntu_performance_fio/{rd-100,wr-0,rand,blk-128k,jobs-4,iodepth-8.fio => rd-75,wr-25,rand,blk-128k,jobs-16.fio} (50%)
>  delete mode 100644 ubuntu_performance_fio/rd-75,wr-25,rand,blk-8k,jobs-16,iodepth-32.fio
>  delete mode 100644 ubuntu_performance_fio/rd-75,wr-25,rand,blk-8k,jobs-4,iodepth-32.fio
>  rename ubuntu_performance_fio/{rd-100,wr-0,rand,blk-8k,jobs-16,iodepth-32.fio => rd-75,wr-25,rand,blk-8k,jobs-64.fio} (51%)
>  delete mode 100644 ubuntu_performance_fio/rd-75,wr-25,rand,blk-8k,jobs-8,iodepth-32.fio
> 

We also use these tests for generic performance testing, so changing
them basically means we reset the stats from scratch and start again
from a new baseline.  However, the redeeming feature of these changes
is that they produce reliable results, whereas the previous tests also
had a lot of jitter on the test system we use for the generic tests.

So, yes, I'm OK with these; I just need to rework the back-end database
that stores the older data and add some shiny new Grafana graphs for the
new test cases.

Colin


