Artificial performance limitation

Daniel van Vugt daniel.van.vugt at canonical.com
Fri Nov 14 08:06:20 UTC 2014


No visible benefit because I'm testing on my Intel desktop, and even 
after offloading the responses to the IPC pool the compositor thread is 
still crippled by the Intel deferred batching problem. I have a fix for 
that too; I'm just trying to make it less ugly.
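
In case it helps others following along, here's a minimal sketch of the 
kind of workaround in play, under the assumption that the stall comes 
from the driver deferring batch submission until the next frame. It is 
not the actual fix mentioned above, and the function name is made up.

#include <GLES2/gl2.h>

// Hypothetical illustration: ask the driver to submit its queued batch
// at the end of each composited frame, instead of letting submission
// slip into the next frame.
void flush_deferred_batch()
{
    // glFlush() submits queued commands without blocking for completion
    // (glFinish() would too, but it stalls until the GPU finishes).
    glFlush();
}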

So now I know for sure how to fix the desktop performance. As for the 
netbook, I think that requires at least both fixes. Maybe more...


On 14/11/14 09:51, Daniel van Vugt wrote:
> Yes agreed; that's actually what I was saying in the previous email and
> I prototyped it yesterday.
>
> No visible benefit yet, so I'm still trying to find out why...
>
> P.S. Surprisingly, io_service is over-simplified and does not lend itself
> to dynamic thread pooling, because there's no single entry point you can
> hook for job submission, which you need in order to guarantee enough
> threads are allocated. The best you can do is observe after the fact that
> jobs are contending and react in a delayed fashion :(
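>
> To make that delayed-reaction idea concrete, a rough standalone sketch
> (illustrative names only, not Mir code): funnel every submission through
> one post() wrapper so queued jobs can be counted, and grow the pool only
> once contention is already visible.
>
> #include <boost/asio.hpp>
> #include <atomic>
> #include <mutex>
> #include <thread>
> #include <vector>
>
> class ScalingPool
> {
> public:
>     explicit ScalingPool(std::size_t initial_threads)
>     {
>         for (std::size_t i = 0; i != initial_threads; ++i)
>             add_thread();
>     }
>
>     ~ScalingPool()
>     {
>         service.stop();
>         for (auto& t : threads) t.join();
>     }
>
>     template<typename Job>
>     void post(Job job)
>     {
>         // This wrapper is the single entry point asio itself lacks; any
>         // direct io_service::post() elsewhere would bypass the count.
>         if (pending.fetch_add(1) + 1 > thread_count())
>             add_thread();  // reactive: we only grow after jobs back up
>         service.post([this, job]() mutable
>         {
>             job();
>             pending.fetch_sub(1);
>         });
>     }
>
> private:
>     std::size_t thread_count()
>     {
>         std::lock_guard<std::mutex> lock{m};
>         return threads.size();
>     }
>
>     void add_thread()
>     {
>         std::lock_guard<std::mutex> lock{m};
>         threads.emplace_back([this] { service.run(); });
>     }
>
>     boost::asio::io_service service;
>     boost::asio::io_service::work work{service};
>     std::atomic<std::size_t> pending{0};
>     std::mutex m;
>     std::vector<std::thread> threads;
> };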
>
>
> On 13/11/14 22:48, Alan Griffiths wrote:
>> I've not looked through your evidence, but you seem to be overlooking an
>> option:
>>
>> Push the response logic to the BasicConnector::io_service - that uses
>> epoll() to farm work out to the IPC thread pool (which, despite its
>> default size of 1, still exists).
>>
>> Yes, there's a bit of wiring things together needed to get appropriate
>> access in SessionMediator::exchange_buffer(), but not much - it's all
>> frontend.
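>>
>> As a standalone illustration of that hand-off (real Mir signatures
>> differ; these names are invented), the handler simply posts the
>> response work to the connector's io_service and returns:
>>
>> #include <boost/asio.hpp>
>> #include <functional>
>>
>> // The caller's thread returns immediately; whichever IPC-pool thread
>> // the io_service's epoll loop dispatches to sends the response.
>> void handle_exchange_buffer(
>>     boost::asio::io_service& ipc_service,
>>     std::function<void()> const& send_response)
>> {
>>     ipc_service.post(send_response);
>> }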
>>
>> On 13/11/14 04:19, Daniel van Vugt wrote:
>>> I think the cleanest solution would be a return to using the "IPC
>>> thread pool" (like in the old days). Presently we use it for receiving
>>> requests, but no longer for sending responses (except when frame
>>> dropping is enabled).
>>>
>>> The challenge is only to ensure we don't reintroduce the problems we
>>> used to have with it being fixed in size. Some intelligent scaling is
>>> required.
>>>
>>> Although longer term, a thread pool might not be required at all if we
>>> can come up with a reliably fast asynchronous select() implementation
>>> to replace the asio sockets/io_service.
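>>>
>>> For what it's worth, a bare-bones sketch of such a loop (purely
>>> illustrative, built directly on epoll(7), with error handling
>>> omitted):
>>>
>>> #include <sys/epoll.h>
>>>
>>> // Watch one fd and dispatch readiness events with no asio layer in
>>> // between. A real replacement would manage many client sockets and
>>> // handle errors; this only shows the shape of the loop.
>>> void run_event_loop(int client_fd, void (*on_ready)(int fd))
>>> {
>>>     int epfd = epoll_create1(0);
>>>     epoll_event ev{};
>>>     ev.events = EPOLLIN;
>>>     ev.data.fd = client_fd;
>>>     epoll_ctl(epfd, EPOLL_CTL_ADD, client_fd, &ev);
>>>
>>>     epoll_event ready[16];
>>>     for (;;)
>>>     {
>>>         int n = epoll_wait(epfd, ready, 16, -1);
>>>         for (int i = 0; i < n; ++i)
>>>             on_ready(ready[i].data.fd);
>>>     }
>>> }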
>>>
>>>
>>> On 13/11/14 11:58, Daniel van Vugt wrote:
>>>> Pretty pictures of a stuttering server attached.
>>>>
>>>> Notice the CompositingFunctor, where ~vector destruction is consuming
>>>> 30% of the time, due to ~TemporaryCompositorBuffer taking 27% of it.
>>>>
>>>> I think we need to move the final buffer release/response logic out of
>>>> the compositor thread one way or another, because blocking the
>>>> compositor thread is a very visible problem.
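>>>>
>>>> One way such a move could look (a self-contained sketch with made-up
>>>> names; Mir's real buffer types are not shown): the compositor thread
>>>> only enqueues the expensive release, and a worker thread drains it.
>>>>
>>>> #include <condition_variable>
>>>> #include <deque>
>>>> #include <functional>
>>>> #include <mutex>
>>>> #include <thread>
>>>>
>>>> class ReleaseQueue
>>>> {
>>>> public:
>>>>     ReleaseQueue() : worker{[this] { drain(); }} {}
>>>>
>>>>     ~ReleaseQueue()
>>>>     {
>>>>         {
>>>>             std::lock_guard<std::mutex> lock{m};
>>>>             done = true;
>>>>         }
>>>>         cv.notify_one();
>>>>         worker.join();
>>>>     }
>>>>
>>>>     // Called from the compositor thread: O(1), never blocks on the
>>>>     // buffer release itself.
>>>>     void defer(std::function<void()> release)
>>>>     {
>>>>         {
>>>>             std::lock_guard<std::mutex> lock{m};
>>>>             q.push_back(std::move(release));
>>>>         }
>>>>         cv.notify_one();
>>>>     }
>>>>
>>>> private:
>>>>     void drain()
>>>>     {
>>>>         std::unique_lock<std::mutex> lock{m};
>>>>         while (!done || !q.empty())
>>>>         {
>>>>             cv.wait(lock, [this] { return done || !q.empty(); });
>>>>             while (!q.empty())
>>>>             {
>>>>                 auto release = std::move(q.front());
>>>>                 q.pop_front();
>>>>                 lock.unlock();
>>>>                 release();  // the slow part now runs off the hot path
>>>>                 lock.lock();
>>>>             }
>>>>         }
>>>>     }
>>>>
>>>>     std::mutex m;
>>>>     std::condition_variable cv;
>>>>     std::deque<std::function<void()>> q;
>>>>     bool done = false;
>>>>     std::thread worker;
>>>> };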
>>>>
>>>>
>>>> On 03/11/14 08:36, Daniel van Vugt wrote:
>>>>> This bug and its sister bug are driving me crazy:
>>>>> https://bugs.launchpad.net/mir/+bug/1388490
>>>>> https://bugs.launchpad.net/mir/+bug/1377872
>>>>>
>>>>> I can see in both cases that the frame rate ceiling is arbitrary and
>>>>> temporary. It has nothing at all to do with the power of the host. And
>>>>> the workaround that fixes the problem in both cases is a bit crazy.
>>>>>
>>>>> I've so far investigated theories around:
>>>>>     * buffer lifetimes
>>>>>     * Mesa/intel GL batching bugs
>>>>>     * lock contention (locks held too long)
>>>>>     * power management modes
>>>>>     * Mir socket/protocol bottlenecks
>>>>>
>>>>> All the theories have some merit, but I am not making progress. If you
>>>>> are (un)lucky enough to be able to reproduce either bug then please join
>>>>> in. More data points are required.
>>>>>
>>>>> - Daniel
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>


