Improving next_buffer() rpc

Alberto Aguirre alberto.aguirre at canonical.com
Wed Jul 9 17:53:41 UTC 2014


I think RAOF's point is accurate: with the current approach (triple
buffered by default), BufferQueue has 2 buffers for clients and 1 for the
compositor, so rpc next_buffer(SurfaceId) returns (Buffer) will return
immediately, as BufferQueue will have a free buffer to give out.
However, Daniel's point, I believe, is that the client should also be able
to pipeline the rpc wait with its rendering, which I agree should be a
goal.

I have a slight practical preference for #2/#6, as we are mostly dealing
with EGL glue code now. It doesn't let you pipeline the RPC wait for the
next buffer, but it's no worse than what we currently have.

However, long-term I would favor a solution where the "free" signal is
just an fd/fence that the client can check as it tries to acquire a free
buffer on its side (the client would initially get all buffer references
so it can maintain the pool); if none are free, it would just epoll/wait
on the fence until one becomes free.
That way the server can signal free buffers asynchronously, which
potentially pipelines with client rendering and does not generate an rpc
from server to client.
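
A minimal sketch of what that client-side acquire path could look like
(everything below is hypothetical plumbing, not existing Mir code; it only
illustrates the wait-on-fence idea):

  #include <poll.h>
  #include <unistd.h>
  #include <vector>

  // Hypothetical client-side pool entry: each buffer carries a "release
  // fence" fd that the server signals once it is done with the buffer.
  struct PooledBuffer
  {
      int release_fence_fd;   // -1 means already known to be free
      // ... buffer handle, stride, size, etc ...
  };

  // Return a buffer whose fence has signalled; if none has, poll all
  // pending fences until one does. No rpc to the server is needed.
  PooledBuffer* acquire_free_buffer(std::vector<PooledBuffer>& pool)
  {
      for (;;)
      {
          std::vector<pollfd> fds;
          for (auto& b : pool)
          {
              if (b.release_fence_fd < 0)
                  return &b;                        // nothing pending: free
              fds.push_back({b.release_fence_fd, POLLIN, 0});
          }

          poll(fds.data(), fds.size(), -1);         // wait for any fence

          for (size_t i = 0; i != pool.size(); ++i)
          {
              if (fds[i].revents & POLLIN)
              {
                  close(pool[i].release_fence_fd);  // fence consumed
                  pool[i].release_fence_fd = -1;
                  return &pool[i];
              }
          }
      }
  }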

--Alberto




On Wed, Jul 9, 2014 at 11:29 AM, Kevin DuBois <kevin.dubois at canonical.com>
wrote:

> Alberto pointed out a gap in my suggestion #2/#6, which is that the client
> wouldn't have a way of getting ownership of the additional buffers. So
> maybe that idea (#2/#6) should become:
>
> rpc exchange_buffer(Buffer) returns (Buffer)
> rpc allocate_additional_buffer(Void) returns (Buffer)
> rpc remove_additional_buffer(Buffer) returns (Void)
>
> This still would have the submission happen on the exchange call, but
> would make additional buffer allocations more explicit than in the #5
> idea.
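>
> A rough client-side sketch of how those three calls might be used (the
> rpc_* helpers and types are hypothetical stand-ins for the generated
> protocol stubs, not real Mir client API):
>
>   struct Buffer { int id; int fence_fd; };
>
>   Buffer rpc_exchange_buffer(Buffer filled);    // submit filled, get a free one
>   Buffer rpc_allocate_additional_buffer();      // grow the client-owned pool
>   void   rpc_remove_additional_buffer(Buffer);  // shrink it again
>   void   render_into(Buffer&);
>
>   void client_render_loop(Buffer current, bool const& running)
>   {
>       // The client explicitly asks to own one extra buffer.
>       Buffer spare = rpc_allocate_additional_buffer();
>
>       while (running)
>       {
>           render_into(current);
>           // Submission still happens on the exchange call itself.
>           Buffer returned = rpc_exchange_buffer(current);
>           current = spare;    // keep using the extra buffer we own
>           spare = returned;   // the replacement becomes the next spare
>       }
>
>       rpc_remove_additional_buffer(spare);  // back to default ownership
>   }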
>
>
> On Wed, Jul 9, 2014 at 12:04 PM, Kevin DuBois <kevin.dubois at canonical.com>
> wrote:
>
>> Attempting to steer the convo more towards the near term (kgunn's
>> "First"), I'm just trying to make sure that the protocol change is somewhat
>> forward-looking without wading into changing the buffer
>> distribution/swapping system too much.
>>
>> Trying to distil the potential directions suggested a bit more:
>>
>> #1 what we currently have:
>> rpc next_buffer(SurfaceId) returns (Buffer)
>>
>> #2 what I've proposed:
>> rpc exchange_buffer(Buffer) returns (Buffer)
>>
>> #3 "push buffers", the server pushes out buffers to the clients
>>
>> #4 multiple buffers owned by the client/client allocated buffers
>>
>> Now,
>> #3 and #4 are interesting and potentially beneficial, but outside the
>> scope of what I want to accomplish [1] with the optimization I'm aiming
>> for. It's good to future-proof the protocol for when/if we decide to
>> experiment.
>>
>> #3 strikes me as something I'd want to avoid (especially in the near
>> term), because when you're pushing buffers to the client, the server is
>> 'driving' the activity of the client processes, which seems strange. Now,
>> I'd be fine with the client /making a request/ and then getting the buffer
>> back asynchronously, but I couldn't think of an advantage to this.
>>
>> I think #4 could be accommodated by having idea #5:
>> #5
>> rpc next_buffer(SurfaceId) returns (Buffer);
>> rpc submit_buffer(Buffer) returns (Void);
>>
>> or supplementing idea #2 to make idea #6 (in the future)
>> rpc exchange_buffer(Buffer) returns (Buffer)
>> rpc request_client_buffer_count(int) returns (Void)
>>
>> Effectively, #5 removes the implicit release of the client buffer, making
>> the submission of the filled buffer explicit. This seems a bit less nice to
>> me than #2/#6 from an RAII mindset, so I think I still prefer #2 with a
>> future expansion to #6 for multiple client buffers.
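>>
>> For contrast, a hypothetical client flow for #5, where acquiring and
>> submitting are decoupled (which is what would let a client hold several
>> buffers at once, accommodating #4); again, none of these helpers are
>> real Mir API:
>>
>>   struct SurfaceId { int value; };
>>   struct Buffer { int id; int fence_fd; };
>>
>>   Buffer rpc_next_buffer(SurfaceId);   // acquire a free buffer
>>   void   rpc_submit_buffer(Buffer);    // hand the filled buffer back
>>   void   render_into(Buffer&);
>>
>>   void frame(SurfaceId surface)
>>   {
>>       Buffer b = rpc_next_buffer(surface);  // ownership moves to the client
>>       render_into(b);
>>       rpc_submit_buffer(b);                 // ownership moves back, explicitly
>>   }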
>>
>> So I guess #2/#6 or #5 are fine for my immediate purposes; both seem
>> acceptable at the current time and have some forward-thinking in them.
>> I'm interested whether others share my slight preference for #2/#6.
>>
>> [1]
>> My aim is to let the Android platform's clients delay waiting for the
>> GPU to finish with the buffer. It lets the client pass the fence and the
>> uncompleted buffer back to the server without waiting, and requires that
>> the fences are waited upon in the composition pass. About a year or so
>> ago, doing something similar with internal clients in the first landing
>> of unity, we saw some noticeable performance improvements that make us
>> think this would be useful in the USC/unity8 setup we have now.
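>>
>> A rough illustration of that idea, using the existing
>> EGL_ANDROID_native_fence_sync extension (the extension calls are real;
>> how the resulting fd is attached to the submitted buffer is the
>> hypothetical part this protocol change would add):
>>
>>   #include <EGL/egl.h>
>>   #include <EGL/eglext.h>
>>   #include <GLES2/gl2.h>
>>
>>   // Returns a sync-fence fd for the rendering submitted so far, without
>>   // blocking the client. The server waits on it in the composition pass.
>>   int fence_fd_for_current_rendering(EGLDisplay dpy)
>>   {
>>       auto create_sync = (PFNEGLCREATESYNCKHRPROC)
>>           eglGetProcAddress("eglCreateSyncKHR");
>>       auto destroy_sync = (PFNEGLDESTROYSYNCKHRPROC)
>>           eglGetProcAddress("eglDestroySyncKHR");
>>       auto dup_fence = (PFNEGLDUPNATIVEFENCEFDANDROIDPROC)
>>           eglGetProcAddress("eglDupNativeFenceFDANDROID");
>>
>>       // Insert a fence into the GL command stream instead of waiting.
>>       EGLSyncKHR sync =
>>           create_sync(dpy, EGL_SYNC_NATIVE_FENCE_ANDROID, nullptr);
>>       glFlush();  // the fence must be flushed before its fd can be dup'd
>>
>>       int fd = dup_fence(dpy, sync);  // fd to send with exchange_buffer()
>>       destroy_sync(dpy, sync);        // the dup'd fd remains valid
>>       return fd;
>>   }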
>>
>>
>> On Wed, Jul 9, 2014 at 11:39 AM, Kevin Gunn <kevin.gunn at canonical.com>
>> wrote:
>>
>>> First
>>> Not sure we're still on topic necessarily wrt changing from ids to fds;
>>> do we need to conflate that with the double/triple buffering topic?
>>> Let's answer this first...
>>>
>>> Second
>>> While we're at it :) triple buffering isn't always a win. In the case of
>>> small, frequent renders (for example, an "8x8 pixel square following my
>>> finger") you'll potentially have 2 extra buffers that each need their
>>> 16ms of fame on the screen while in the queue, 1 at the session server
>>> and 1 at the system server, which can look a little laggy. I'm willing
>>> to say in the same breath, though, that this may be lunatic fringe. The
>>> win for triple buffering is likely the more common case: spiky render
>>> times (14+ms) amongst more normal render times (9-12ms).
>>> +1 on giving empty buffers back to the clients to allow them to have a
>>> "queue" of empty buffers at their disposal (I'm not sure whether RAOF or
>>> duflu is correct in that it's "synchronously waiting for a round trip
>>> every swap"... can we already have an empty buffer queue on the client
>>> side?)
>>>
>>>
>>> On Wed, Jul 9, 2014 at 4:35 AM, Daniel van Vugt <
>>> daniel.van.vugt at canonical.com> wrote:
>>>
>>>> Forgive me for rambling on but I just had an important realisation...
>>>>
>>>> Our current desire to get back to double buffering is only because the
>>>> Mir protocol is synchronously waiting for a round trip every swap, and
>>>> somehow I thought that the buffer queue length affected time spent in the
>>>> ready_to_composite state. Now I'm not so sure that's true.
>>>>
>>>> If we changed the protocol to implement parallelism, then in theory,
>>>> keeping triple buffering with a fancy zero-latency swap buffers should
>>>> perform better than the current protocol that has to wait for a round trip.
>>>>
>>>> I cannot remember why I thought the length of the buffer queue affected
>>>> the time from client-rendering to server-compositing. Perhaps we really do
>>>> need to keep triple-buffering always-on so that the performance gain of a
>>>> zero-latency client swap-buffers can be achieved...
>>>>
>>>> In summary, I'm back to thinking any protocol change from next_buffer()
>>>> needs to support parallelism and not be so synchronous.
>>>>
>>>> - Daniel
>>>>
>>>>
>>>>
>>>> On 09/07/14 16:08, Daniel van Vugt wrote:
>>>>
>>>>> Oops. I keep forgetting that the new BufferQueue doesn't allow the
>>>>> compositor to own fewer than one buffer, so there would no longer be
>>>>> any benefit to double-buffered clients from a more concurrent
>>>>> protocol :(
>>>>>
>>>>> Maybe Kevin's suggestion is just fine then, so long as the server is
>>>>> able to figure out the surface(Id) from the Buffer struct.
>>>>>
>>>>>
>>>>> On 09/07/14 15:41, Daniel van Vugt wrote:
>>>>>
>>>>>> Note that we're working on making double-buffering the default again
>>>>>> and triple the exception. In that case fixing LP: #1253868 may seem
>>>>>> pointless, but it is surprisingly still relevant, because a fully
>>>>>> parallelized design would significantly speed up double buffering
>>>>>> too: client swap buffers would no longer have to wait for a round
>>>>>> trip before returning and would instead be almost instant.
>>>>>>
>>>>>>
>>>>>> On 09/07/14 10:00, Daniel van Vugt wrote:
>>>>>>
>>>>>>> Sounds better to just pass buffers around although I'm not keen on
>>>>>>> any change that doesn't make progress on the performance bottleneck
>>>>>>> LP: #1253868. The bottleneck is the swapping/exchanging approach
>>>>>>> which limits the client to holding only one buffer, so I don't think
>>>>>>> it's a good idea for new designs to still have that problem.
>>>>>>>
>>>>>>> In order to improve parallelism per LP: #1253868 you'd really have
>>>>>>> to receive new buffers as soon as they're free, which means getting
>>>>>>> them as MirEvents. Then you only need an RPC function to release
>>>>>>> them back to the server:
>>>>>>>
>>>>>>>     rpc release_buffer(Buffer) returns (Void);
>>>>>>>
>>>>>>> Keep in mind the inter-process communication is the bottleneck here.
>>>>>>> If you allow a context switch between the server and client then
>>>>>>> that's half to one millisecond (see mirping) per RPC round trip.
>>>>>>> More than double that for nested servers and you see the protocol
>>>>>>> delay could be a significant factor. So I think any protocol
>>>>>>> enhancement should have parallelism designed in.
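>>>>>>>
>>>>>>> A very rough sketch of that event-driven shape (hypothetical helpers
>>>>>>> only; release_buffer mirrors the rpc above, and the buffer-arrival
>>>>>>> callback stands in for a MirEvent handler):
>>>>>>>
>>>>>>>     #include <queue>
>>>>>>>
>>>>>>>     struct Buffer { int id; };
>>>>>>>
>>>>>>>     void rpc_release_buffer(Buffer);  // no reply to block on
>>>>>>>     void render_into(Buffer&);
>>>>>>>
>>>>>>>     std::queue<Buffer> free_buffers;  // fed by buffer-arrival events
>>>>>>>
>>>>>>>     // Called whenever the server pushes a newly-free buffer.
>>>>>>>     void on_buffer_available(Buffer b) { free_buffers.push(b); }
>>>>>>>
>>>>>>>     void render_one_frame()
>>>>>>>     {
>>>>>>>         Buffer b = free_buffers.front();  // assumes one has arrived
>>>>>>>         free_buffers.pop();
>>>>>>>         render_into(b);
>>>>>>>         rpc_release_buffer(b);            // hand back, no round trip
>>>>>>>     }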
>>>>>>>
>>>>>>> I also think we need to be careful about not landing any protocol
>>>>>>> changes to RTM candidate series' 0.4-0.5, so the foundation for RTM
>>>>>>> is maximally mature (albeit not yet optimal).
>>>>>>>
>>>>>>> - Daniel
>>>>>>>
>>>>>>>
>>>>>>> On 08/07/14 21:10, Kevin DuBois wrote:
>>>>>>>
>>>>>>>> Hello mir team,
>>>>>>>>
>>>>>>>> In order to get the next buffer for the client, we currently have:
>>>>>>>>
>>>>>>>> rpc next_buffer(SurfaceId) returns (Buffer);
>>>>>>>>
>>>>>>>> which is problematic for me in working on [1], because this
>>>>>>>> implicitly releases the buffer from the client side, whereas in
>>>>>>>> working on that performance improvement I have to send an fd back
>>>>>>>> to the server. So I was thinking of adding an rpc method more like:
>>>>>>>>
>>>>>>>> rpc exchange_buffer(Buffer) returns (Buffer);
>>>>>>>>
>>>>>>>> This would be sufficient to pass the fence fd back, and the buffer
>>>>>>>> id in the Buffer protocol message would be sufficient for the
>>>>>>>> server to figure out which surface has sent back its buffer (given
>>>>>>>> the global buffer ids we're using).
>>>>>>>>
>>>>>>>> This does not address the problem noted in
>>>>>>>> https://bugs.launchpad.net/mir/+bug/1253868
>>>>>>>> but I think that might be better addressed by having an
>>>>>>>> exchange-type rpc call (explicit or implicit) and
>>>>>>>> negotiating/increasing how many buffers the client owns some other
>>>>>>>> way.
>>>>>>>>
>>>>>>>> This seems like something that could have diverse opinions, so I'm
>>>>>>>> hoping to get some input on the protocol change here first.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Kevin
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://blueprints.launchpad.net/ubuntu/+spec/client-1410-mir-performance
>>>>>>>>
>>>>>>>>
>>>>>>>> item:
>>>>>>>> "[kdub] fencing improvements for clients add the ipc plumbing"
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>
>
>

