[Bug 1638695] Re: Python 2.7.12 performance regression
Jorge Niedbalski
1638695 at bugs.launchpad.net
Fri Jan 13 23:41:11 UTC 2017
Hello,
I have been working to track down the origin of the performance penalty
exposed by this bug.
All the tests I am performing are made on top of a locally compiled Python 2.7.12 (from upstream sources, without applying any Ubuntu patches),
built with two different GCC versions, 5.3.1 (current) and 4.8.0, both coming from the Ubuntu archives.
I can see important performance differences, as I mentioned in my previous comments (check the full comparison stats), just by
switching the GCC version. I decided to focus my investigation on the pickle module, since it seems to be the most affected one,
being approximately 1.17x slower between the two GCC versions.
Due to the amount of changes introduced between 4.8.0 and 5.3.1, I decided not to pursue a bisection of the changes
to identify an offending commit yet, until we can identify which optimization or change
at compile time is causing the regression and focus our investigation on that specific area.
My understanding is that the performance penalty caused by the compiler might be related
to two factors: an important change in the linked libc, or an optimization made by the compiler in the resulting object.
Since the resulting objects are linked against the same glibc version (2.23), I will not consider that factor as part of the analysis;
instead I will focus on analyzing the performance of the objects generated by each compiler.
Following this approach, I ran the pyperformance suite under a valgrind session, excluding all modules except the pickle module
and using the default suppressions to avoid missing any reference in the Python runtime, with the following arguments:
valgrind --tool=callgrind --instr-atstart=no --trace-children=yes
venv/cpython2.7-6ed9b6df9cd4/bin/python -m performance run --python
/usr/local/bin/python2.7 -b pickle --inside-venv
I ran this process multiple times with both GCC 4.8.0 and 5.3.1 to produce a large set of callgrind files to analyze. Those callgrind files contain the full execution tree,
including all the relocations, jumps, and calls into libc and the Python runtime itself, and of course the time spent per function and the number of calls made to it.
I cleaned up the resulting callgrind files, removing the files smaller than 100k and the ones that were not loading the cPickle
extension (https://pastebin.canonical.com/175951/).
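The cleanup step can be sketched roughly as follows (the file glob and the callgrind-keep directory are illustrative names, not the exact commands from the paste):

```shell
# Keep only callgrind output files that are at least 100k and that
# reference the cPickle extension; discard everything else.
mkdir -p callgrind-keep
for f in callgrind.out.*; do
    [ -e "$f" ] || continue          # no profiles present: nothing to do
    if [ "$(stat -c%s "$f")" -ge 102400 ] && grep -q cPickle "$f"; then
        cp "$f" callgrind-keep/
    fi
done
```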
Over that set of files I executed callgrind_annotate to generate per-function stats ordered by the exclusive cost of each function.
Then, with this script (http://paste.ubuntu.com/23795048/), I summed all the costs per function for each GCC version (4.8 and 5.3.1) and calculated the variance in cost between them.
The resulting file contains tuples in the following format:
function name - gcc 4.8 cost - gcc 5.3.1 cost - relative variance
As an example:
/home/ubuntu/python/cpython/Objects/tupleobject.c:tupleiter_dealloc 258068.000000 445009.000000 (variance: 0.724387)
/home/ubuntu/python/cpython/Objects/object.c:try_3way_compare 984860.000000 1676351.000000 (variance: 0.702121)
/home/ubuntu/python/cpython/Python/marshal.c:r_object 183524.000000 27742.000000 (variance: -0.848837)
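The comparison step can be sketched as follows (the helper names are mine and the pasted script is not reproduced here; the costs are taken from the examples above):

```python
# Sum exclusive costs per function for each compiler build, then compute
# the relative change ("variance") between the two builds.
from collections import defaultdict

def aggregate(rows):
    """Sum exclusive costs per function across many annotate files."""
    totals = defaultdict(float)
    for func, cost in rows:
        totals[func] += cost
    return totals

def variance(cost_48, cost_53):
    """Relative cost change between the GCC 4.8 and GCC 5.3.1 builds."""
    return (cost_53 - cost_48) / cost_48

# A positive variance means the GCC 5.3.1 build spent more time there.
gcc48 = aggregate([("tupleobject.c:tupleiter_dealloc", 258068.0)])
gcc53 = aggregate([("tupleobject.c:tupleiter_dealloc", 445009.0)])
for func, cost in sorted(gcc48.items()):
    print("%s %f %f (variance: %f)"
          % (func, cost, gcc53[func], variance(cost, gcc53[func])))
```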
The full results, sorted by variance in descending order, can be found here:
http://paste.ubuntu.com/23795023/
Now that we have these results, we can move forward by comparing the generated code for the functions with the largest variance
and tracking which optimization done by GCC might be altering the resulting objects.
I will update this case after further investigation.
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to python2.7 in Ubuntu.
https://bugs.launchpad.net/bugs/1638695
Title:
Python 2.7.12 performance regression
Status in python2.7 package in Ubuntu:
Confirmed
Bug description:
I work on the OpenStack-Ansible project and we've noticed that testing
jobs on 16.04 take quite a bit longer to complete than on 14.04. They
complete within an hour on 14.04 but they normally take 90 minutes or
more on 16.04. We use the same version of Ansible with both versions
of Ubuntu.
After more digging, I tested python performance (using the
'performance' module) on 14.04 (2.7.6) and on 16.04 (2.7.12). There
is a significant performance difference between each version of
python. That is detailed in a spreadsheet[0].
I began using perf to dig into the differences when running the python
performance module and when using Ansible playbooks. CPU migrations
(as measured by perf) are doubled in Ubuntu 16.04 when running the
same python workloads.
I tried changing some of the kernel.sched sysctl configurables but they
had very little effect on the results.
I compiled python 2.7.12 from source on 14.04 and found the
performance to be unchanged there. I'm not entirely sure where the
problem might be now.
We also have a bug open in OpenStack-Ansible[1] that provides
additional detail. Thanks in advance for any help you can provide!
[0] https://docs.google.com/spreadsheets/d/18MmptS_DAd1YP3OhHWQqLYVA9spC3xLt4PS3STI6tds/edit?usp=sharing
[1] https://bugs.launchpad.net/openstack-ansible/+bug/1637494
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/python2.7/+bug/1638695/+subscriptions
More information about the foundations-bugs
mailing list