[Bug 1988240] Re: Performance regression with memcpy on Intel CPU

Heitor Alves de Siqueira 1988240 at bugs.launchpad.net
Wed Aug 31 13:23:20 UTC 2022


Thanks for the report, Shantanu!

Have you confirmed whether this is indeed related to the changes from
bug 1928508? I've looked into upstream changes to
__x86_shared_non_temporal_threshold, and there were no fixes or
regression reports after the ones we've backported to Ubuntu Focal. At
the time this change was introduced, no regressions on other platforms
were reported upstream or in Ubuntu, so I wonder if we missed your
test case.

Would you be able to double-check whether that patch is responsible?
Have you seen different performance behavior in recent glibc versions,
or other distros with the same glibc version? One could also use
different tunable values for __x86_shared_non_temporal_threshold like
below:

$ GLIBC_TUNABLES=glibc.cpu.x86_non_temporal_threshold=$((1024*1024*3*4))

** Changed in: glibc (Ubuntu)
       Status: New => Incomplete

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to glibc in Ubuntu.
https://bugs.launchpad.net/bugs/1988240

Title:
  Performance regression with memcpy on Intel CPU

Status in glibc package in Ubuntu:
  Incomplete

Bug description:
  # lsb_release -rd
  Description:	Ubuntu 20.04.4 LTS
  Release:	20.04

  Reporting a performance regression in libc6-dev==2.31-0ubuntu9.9 when
  upgrading from 9.7.

  Regression was observed on Intel Xeon(R) Gold 6248 CPU @ 2.50GHz
  (Cascade Lake)

  We're seeing a 3x slowdown on e.g. the following tiny program and similar slowdowns on important workloads:
  ```
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <time.h>

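  int main(void) {
      size_t SIZE = (1 << 20);
      char *src = malloc(SIZE);
      char *dst = malloc(SIZE);
      if (src == NULL || dst == NULL) {
          return 1;
      }

      /* Fill both buffers so every page is touched before timing. */
      for(size_t i = 0; i < SIZE; ++i) {
          src[i] = rand() % 256;
          dst[i] = rand() % 256;
      }
      /* Time 10000 copies of 1 MiB (CPU time via clock()). */
      clock_t start = clock();
      for(int i = 0; i < 10000; ++i) {
          memcpy(dst, src, SIZE);
      }
      clock_t end = clock();
      printf("%f\n", (double) (end - start)/CLOCKS_PER_SEC);
      free(src);
      free(dst);
      return 0;
  }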
  ```

  Probably due to changes resulting from
  https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1928508

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1988240/+subscriptions