[RFC] Performance test framework

Daniel van Vugt daniel.van.vugt at canonical.com
Mon Jun 22 02:04:30 UTC 2015


+1
Go nuts.

In the past, the only point of contention was that people (myself 
included) made mistakes in measurement. So we need to be aware that 
non-trivial tests can have just as many bugs as the system under test, 
and keep an eye out for them. It would be nice to build some automation 
that everyone trusts...

We also need to back up theoretical and numerical improvements with 
visual observation. Last year some optimisations were proposed whose 
test automation reported "lower latency", but which actually produced 
higher visual latency when observed closely by eye.

One related example of a non-obvious side-effect to be aware of is 
input resampling vs no input resampling. If you turn resampling off 
then latency at that input stage is "eliminated", but if input events 
then arrive faster than the display refresh rate you flood the buffer 
queues, picking up 32ms of extra visual latency (and it grows further 
over time). So turning off input resampling is a net increase in visual 
latency, for example...
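
To put a rough number on that, here is a back-of-the-envelope sketch 
(my illustrative figures and names, not Mir code), assuming a 60Hz 
display and two buffers queued ahead of the display:

    # Illustrative sketch only, not Mir code: each buffer queued ahead
    # of the display adds one frame of visual latency. At 60Hz a frame
    # is shown every ~16.7ms, so two queued buffers add ~33ms.
    REFRESH_HZ = 60
    FRAME_MS = 1000.0 / REFRESH_HZ    # ~16.7ms per displayed frame
    QUEUED_BUFFERS = 2                # assumed queue depth

    extra_latency_ms = QUEUED_BUFFERS * FRAME_MS
    print("extra visual latency: %.1f ms" % extra_latency_ms)  # ~33.3

That lands in the same ballpark as the 32ms figure above, and if the 
queue keeps growing, so does the latency.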



On 19/06/15 18:22, Alexandros Frantzis wrote:
> Hello all,
>
> there have recently been a few occasions where we wanted to experiment with
> performance improvements, but lacked a way to easily measure their effect.
> There have also been a few occasions where different implementations of the
> (supposedly) same performance test came up with different results. In light of
> these issues, it would be helpful to have a common performance test framework,
> to make it easy to create and share performance test scenarios.
>
> As our codebase is C++, it would be natural to provide such a framework
> in C++, offering maximum flexibility and customizability.
>
> On the other hand, sharing C++ code is not convenient, especially if we want
> more casual users to be able to try out the scenarios. As an alternative, we
> could provide this test framework as a python library, possibly backed by a
> C++ glue library as needed (note that for some scenarios a glue layer is not
> needed, since we could use our demo servers and clients, or write new ones).
>
> Here are some pros and cons of the solutions (pros of one are cons of the other
> in this case):
>
> C++:
>   + Easy integration with code base
>   + Customizability of server/client behavior
>
> python (possibly backed by a C++ glue layer):
>   + Easy to write
>   + Convenience in sharing test scenarios
>   + Plethora of existing libraries we need (e.g. python libs for babeltrace,
>     uinput, statistics)
>
> This is how I imagine a performance test script could look in Python (the C++
> version wouldn't be very different in terms of abstractions):
>
>      import time
>      from mir import PerformanceTest, Server, Client, Input
>
>      host = Server(reports=["input","compositor"])
>      nested = Server(host=host)
>      client = Client(server=nested)
>
>      test = PerformanceTest([host,nested,client])
>      test.start()
>
>      # Send an input event every 50ms
>      input = Input()
>      for i in range(100):
>          input.inject(...)
>          time.sleep(0.050)
>
>      test.stop()
>
>      trace = test.babeltrace()
>      ... process trace and print results ...
>
> [Note: This example is for illustration only. The specifics of the
> performance framework API are not the main point of this post. We will have a
> chance to discuss them further in the future, after the high-level decisions
> have been made.]
>
> The main drawback of the script-based approach (vs a pure C++ approach) is
> that it's not clear how to provide custom behavior for servers and, more
> importantly, clients.
>
> If we find that our performance tests need only a small collection of
> behaviors, we could make them available as configuration options:
>
>      Client(behavior="swap_on_input")
>
> Alternatively, we could provide more fine-grained customization points that
> run python code (which will of course increase the complexity of the glue
> layer):
>
>      def on_input_received(...): do stuff
>      Client(on_input_received=on_input_received)
>
> So, what I would like to hear from you is:
>
> 1. Your general preference for python vs C++ for the test scripts
> 2. Any particular performance tests that you would like to see implemented,
>     so we can get a first idea of what kinds of custom behaviors we may need
> 3. Any other comments, of course!
>
> Thank you,
> Alexandros
>


