Improving next_buffer() rpc

Christopher James Halse Rogers chris at
Thu Jul 10 04:46:20 UTC 2014

On Thu, Jul 10, 2014 at 3:53 AM, Alberto Aguirre 
<alberto.aguirre at> wrote:
> I think RAOF's point is accurate: with the current approach (triple 
> buffered by default), BufferQueue has 2 buffers for clients and 1 for 
> the compositor, so rpc next_buffer(SurfaceId) returns (Buffer) will 
> return immediately, as BufferQueue will have a free buffer to give out.
> However, I believe Daniel's point is that the client should also be 
> able to pipeline the RPC wait with its rendering, which I agree should 
> be a goal.
> I have a slight practical preference for #2/#6, as we are mostly 
> dealing with EGL glue code now. It doesn't let you pipeline the RPC 
> wait for the next buffer, but it's no worse than what we currently 
> have.
> However, long-term I would favor a solution where the "free" signal 
> could just be an fd/fence that the client checks as it tries to 
> acquire a free buffer on its side (the client would initially get all 
> the buffer references so it can maintain the pool). If none are free, 
> it would just epoll/wait on the fence until one is.
> That way the server can signal free buffers asynchronously, which 
> potentially pipelines with client rendering and does not generate an 
> RPC from server to client.

Oooh, yeah. That's very nice.

On Mesa, at least, we should be able to get these fences for free, 
thanks to Maarten's dma-buf fence work. Android seems to have a 
similar mechanism, so this looks gorgeous.
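A minimal C++ sketch of the acquire path Alberto describes, with a 
plain pipe fd standing in for a real dma-buf/sync fence; `fence_ready` 
and its signature are invented for illustration, not Mir API:

```cpp
// Client-side acquire: wait for a fence fd to signal "buffer free".
// In the real design the fd would come from a dma-buf fence (Mesa) or
// an Android sync fence; here any readable fd stands in for it.
#include <poll.h>
#include <unistd.h>
#include <cassert>

// Hypothetical helper: returns true once the fence fd is readable
// (i.e. the server has signalled a free buffer), blocking for up to
// timeout_ms milliseconds (-1 = wait forever).
bool fence_ready(int fence_fd, int timeout_ms)
{
    pollfd pfd{fence_fd, POLLIN, 0};
    int rc = poll(&pfd, 1, timeout_ms);
    return rc == 1 && (pfd.revents & POLLIN);
}
```

The appeal is that the server signals asynchronously just by making the 
fd readable; the client blocks only here, inside its own swap path, and 
only when no buffer is already free.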

If this is what we want to end up at then any RPC call which exchanges 
a Buffer isn't going to be a part of The Glorious Future™.

I don't think The Glorious Future™ is going to arrive particularly 
soon, so I'm not averse to Kevin implementing a transient buffer 
exchange RPC call.

> --Alberto
> On Wed, Jul 9, 2014 at 11:29 AM, Kevin DuBois 
> <kevin.dubois at> wrote:
>> Alberto pointed out a gap in my suggestion #2/#6, which is that the 
>> client wouldn't have a way of getting ownership of the additional 
>> buffers. So maybe that idea (#2/#6) should become:
>> rpc exchange_buffer(Buffer) returns (Buffer)
>> rpc allocate_additional_buffer(Void) returns (Buffer)
>> rpc remove_additional_buffer(Buffer) returns (Void)
>> This still would have the submission happen on the exchange call, 
>> but would make additional buffer allocation more explicit than in 
>> the #5 idea.
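Client-side, the ownership bookkeeping for that trio of calls could be 
sketched as below. `ClientBufferPool` and the int buffer handles are 
invented for illustration; the RPCs themselves are assumed to happen 
around these calls.

```cpp
// Hypothetical client-side bookkeeping for the #2/#6 scheme. Buffers
// are plain int handles; each method corresponds to one RPC.
#include <cassert>
#include <cstddef>
#include <vector>

class ClientBufferPool
{
public:
    // allocate_additional_buffer reply: server grants an extra buffer.
    void add(int buffer) { free_buffers.push_back(buffer); }

    // Take a free buffer to render into; returns -1 if none are free.
    int acquire()
    {
        if (free_buffers.empty()) return -1;
        int b = free_buffers.back();
        free_buffers.pop_back();
        return b;
    }

    // exchange_buffer reply: the filled buffer went to the server, and
    // the buffer in the reply becomes free on the client again.
    void exchange_reply(int buffer_from_reply) { add(buffer_from_reply); }

    // remove_additional_buffer: give a free buffer back permanently.
    bool remove(int buffer)
    {
        for (auto it = free_buffers.begin(); it != free_buffers.end(); ++it)
            if (*it == buffer) { free_buffers.erase(it); return true; }
        return false;
    }

    std::size_t free_count() const { return free_buffers.size(); }

private:
    std::vector<int> free_buffers;
};
```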
>> On Wed, Jul 9, 2014 at 12:04 PM, Kevin DuBois 
>> <kevin.dubois at> wrote:
>>> Attempting to steer the convo more towards the near term (kgunn's 
>>> "First"), I'm just trying to make sure that the protocol change is 
>>> somewhat forward-looking without wading into changing the buffer 
>>> distribution/swapping system too much.
>>> Trying to distil the potential directions suggested a bit more:
>>> #1 what we currently have:
>>> rpc next_buffer(SurfaceId) returns (Buffer)
>>> #2 what I've proposed:
>>> rpc exchange_buffer(Buffer) returns (Buffer)
>>> #3 "push buffers", the server pushes out buffers to the clients
>>> #4 multiple buffers owned by the client/client allocated buffers
>>> Now, #3 and #4 are interesting and potentially beneficial, but 
>>> outside the scope of what I want to accomplish [1] with the 
>>> optimization I'm aiming for. It's good to future-proof the protocol 
>>> for when/if we decide to experiment, though.
>>> #3 strikes me as something I'd want to avoid (especially in the 
>>> near term), because when you're pushing buffers to the client, the 
>>> server is 'driving' the activity of the client processes, which 
>>> seems strange. Now, I'd be fine with the client /making a request/ 
>>> and then getting the buffer back asynchronously, but I couldn't 
>>> think of an advantage to this.
>>> I think #4 could be accommodated by having idea #5:
>>> #5
>>> rpc next_buffer(SurfaceId) returns (Buffer);
>>> rpc submit_buffer(Buffer) returns (Void);
>>> or supplementing idea #2 to make idea #6 (in the future)
>>> rpc exchange_buffer(Buffer) returns (Buffer)
>>> rpc request_client_buffer_count(int) returns (Void)
>>> Effectively, #5 removes the implicit release of the client buffer, 
>>> making the submission of the filled buffer explicit. This seems a 
>>> bit less nice to me than #2/#6 from an RAII mindset, so I think I 
>>> still prefer #2 with a future expansion to #6 for multiple client 
>>> buffers.
>>> So I guess #2/#6 or #5 are fine for my immediate purposes; both 
>>> seem acceptable at the current time and have some forward thinking 
>>> in them. I'm interested whether others share my slight preference 
>>> for #2/#6.
>>> [1]
>>> My aim is to let the android platform's clients delay waiting for 
>>> the GPU to finish with the buffer. It lets the client pass the 
>>> fence and the uncompleted buffer back to the server without 
>>> waiting, and requires that the fences be waited upon in the 
>>> composition pass. About a year ago, doing something similar with 
>>> internal clients in the first landing of unity, we saw some 
>>> noticeable performance improvements, which makes us think this 
>>> would be useful in the USC/unity8 setup we have now.
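The ordering [1] aims for could be sketched like this: the client 
attaches a not-yet-signalled fence to its submission instead of waiting 
for the GPU, and the wait moves into the compositor's pass. The fence 
is again faked with a plain fd, and the struct and function names are 
invented:

```cpp
// Sketch of fence pass-through: no GPU wait on the client side; the
// wait happens in the composition pass, just before the buffer is used.
#include <poll.h>
#include <unistd.h>
#include <cassert>

// Hypothetical submission carrying a fence fd alongside the buffer id.
struct BufferSubmission
{
    int buffer_id;
    int fence_fd;   // becomes readable once the GPU finishes rendering
};

// Client side: no glFinish here; just package buffer + fence and send.
BufferSubmission submit_without_waiting(int buffer_id, int fence_fd)
{
    return BufferSubmission{buffer_id, fence_fd};
}

// Compositor side: block on the fence only when the buffer is about to
// be composited. Returns false if the fence did not signal in time.
bool wait_before_compositing(BufferSubmission const& s, int timeout_ms)
{
    pollfd pfd{s.fence_fd, POLLIN, 0};
    return poll(&pfd, 1, timeout_ms) == 1 && (pfd.revents & POLLIN);
}
```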
>>> On Wed, Jul 9, 2014 at 11:39 AM, Kevin Gunn 
>>> <kevin.gunn at> wrote:
>>>> First:
>>>> Not sure we're still on topic wrt changing from ids to fds. Do we 
>>>> need to conflate that with the double/triple buffering topic?
>>>> Let's answer this first...
>>>> Second:
>>>> While we're at it :) triple buffering isn't always a win. In the 
>>>> case of small, frequent renders (for example, an "8x8 pixel square 
>>>> following my finger") you'll potentially have 2 extra buffers in 
>>>> the queue that each need their 16ms of fame on the screen: 1 at 
>>>> the session server, 1 at the system server. That can look a little 
>>>> laggy. I'm willing to say in the same breath, though, that this 
>>>> may be lunatic fringe. The win for triple buffering is likely the 
>>>> more common case: spiky render times (14+ms) amongst more normal 
>>>> render times (9-12ms).
>>>> +1 on giving empty buffers back to the clients to allow them to 
>>>> have a "queue" of empty buffers at their disposal. (I'm not sure 
>>>> whether RAOF or duflu is correct in saying it's "synchronously 
>>>> waiting for a round trip every swap"... can we already have an 
>>>> empty buffer queue on the client side?)
>>>> On Wed, Jul 9, 2014 at 4:35 AM, Daniel van Vugt 
>>>> <daniel.van.vugt at> wrote:
>>>>> Forgive me for rambling on, but I just had an important 
>>>>> realisation...
>>>>> Our current desire to get back to double buffering exists only 
>>>>> because the Mir protocol synchronously waits for a round trip 
>>>>> every swap, and somehow I thought that the buffer queue length 
>>>>> affected the time spent in the ready_to_composite state. Now I'm 
>>>>> not so sure that's true.
>>>>> If we changed the protocol to implement parallelism, then in 
>>>>> theory, keeping triple buffering with a fancy zero-latency swap 
>>>>> buffers should perform better than the current protocol that has 
>>>>> to wait for a round trip.
>>>>> I cannot remember why I thought the length of the buffer queue 
>>>>> affected the time from client-rendering to server-compositing. 
>>>>> Perhaps we really do need to keep triple-buffering always-on so 
>>>>> that the performance gain of a zero-latency client swap-buffers 
>>>>> can be achieved...
>>>>> In summary, I'm back to thinking any protocol change from 
>>>>> next_buffer() needs to support parallelism and not be so 
>>>>> synchronous.
>>>>> - Daniel
>>>>> On 09/07/14 16:08, Daniel van Vugt wrote:
>>>>>> Oops. I keep forgetting that the new BufferQueue disallows the 
>>>>>> compositor from owning fewer than one buffer, so there would no 
>>>>>> longer be any benefit to double-buffered clients from a more 
>>>>>> concurrent protocol :(
>>>>>> Maybe Kevin's suggestion is just fine then, so long as the 
>>>>>> server is able to figure out the surface (Id) from the Buffer 
>>>>>> struct.
>>>>>> On 09/07/14 15:41, Daniel van Vugt wrote:
>>>>>>> Note that we're working on making double-buffering the default 
>>>>>>> again and triple the exception. In that case fixing LP: #1253868 
>>>>>>> may seem pointless, but it is surprisingly still relevant, 
>>>>>>> because a fully parallelized design would significantly speed up 
>>>>>>> double buffering too... client swap buffers would no longer have 
>>>>>>> to wait for a round trip before returning and would instead be 
>>>>>>> almost instant.
>>>>>>> On 09/07/14 10:00, Daniel van Vugt wrote:
>>>>>>>> Sounds better to just pass buffers around, although I'm not 
>>>>>>>> keen on any change that doesn't make progress on the 
>>>>>>>> performance bottleneck LP: #1253868. The bottleneck is the 
>>>>>>>> swapping/exchanging approach, which limits the client to 
>>>>>>>> holding only one buffer, so I don't think it's a good idea for 
>>>>>>>> new designs to still have that problem.
>>>>>>>> In order to improve parallelism per LP: #1253868 you'd really 
>>>>>>>> have to
>>>>>>>> receive new buffers as soon as they're free, which means 
>>>>>>>> getting them as
>>>>>>>> MirEvents. Then you only need an RPC function to release them 
>>>>>>>> back to
>>>>>>>> the server:
>>>>>>>>     rpc release_buffer(Buffer) returns (Void);
>>>>>>>> Keep in mind that the inter-process communication is the 
>>>>>>>> bottleneck here. If you allow a context switch between the 
>>>>>>>> server and client, then that's half to one millisecond (see 
>>>>>>>> mirping) per RPC round trip. More than double that for nested 
>>>>>>>> servers, and you can see the protocol delay could be a 
>>>>>>>> significant factor. So I think any protocol enhancement should 
>>>>>>>> have parallelism designed in.
>>>>>>>> I also think we need to be careful about not landing any 
>>>>>>>> protocol changes to the RTM candidate series 0.4-0.5, so that 
>>>>>>>> the foundation for RTM is maximally mature (albeit not yet 
>>>>>>>> optimal).
>>>>>>>> - Daniel
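Client-side, Daniel's "buffers as MirEvents" variant amounts to a small 
queue: free buffers arrive asynchronously on the event thread, and 
swap-buffers becomes a local pop that blocks only when the queue is 
empty. A sketch, with invented names and int handles standing in for 
Buffer:

```cpp
// Hypothetical client-side queue for event-driven buffer delivery.
// The event thread pushes; the render loop pops. In the steady state
// pop() returns immediately, so swap-buffers costs no round trip.
#include <condition_variable>
#include <deque>
#include <mutex>

class FreeBufferQueue
{
public:
    // Called from the event thread when a new-buffer event arrives.
    void push(int buffer)
    {
        {
            std::lock_guard<std::mutex> lock(mutex);
            buffers.push_back(buffer);
        }
        cv.notify_one();
    }

    // Called from the render loop; blocks only if no buffer is free.
    int pop()
    {
        std::unique_lock<std::mutex> lock(mutex);
        cv.wait(lock, [this]{ return !buffers.empty(); });
        int b = buffers.front();
        buffers.pop_front();
        return b;
    }

private:
    std::mutex mutex;
    std::condition_variable cv;
    std::deque<int> buffers;
};
```

The only RPC left on the hot path would be the release_buffer(Buffer) 
call, which needs no reply and so never stalls the client.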
>>>>>>>> On 08/07/14 21:10, Kevin DuBois wrote:
>>>>>>>>> Hello mir team,
>>>>>>>>> In order to get the next buffer for the client, we currently 
>>>>>>>>> have:
>>>>>>>>> rpc next_buffer(SurfaceId) returns (Buffer);
>>>>>>>>> which is problematic for me in working on [1], because it 
>>>>>>>>> implicitly releases the buffer from the client side, whereas 
>>>>>>>>> for that performance improvement I have to send an fd back to 
>>>>>>>>> the server. So I was thinking of adding an rpc method more 
>>>>>>>>> like:
>>>>>>>>> rpc exchange_buffer(Buffer) returns (Buffer);
>>>>>>>>> This would be sufficient to pass the fence fd back, and the 
>>>>>>>>> buffer id in the Buffer protocol message would be sufficient 
>>>>>>>>> for the server to figure out which surface has sent back its 
>>>>>>>>> buffer (given the global buffer ids we're using).
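Server-side, that lookup could be as simple as a map keyed on the 
global buffer id; `BufferRegistry` is invented for illustration and 
`SurfaceId` is just an int alias here:

```cpp
// Hypothetical server-side registry: because buffer ids are global,
// the Buffer message alone identifies the owning surface, so
// exchange_buffer(Buffer) needs no SurfaceId parameter.
#include <cassert>
#include <unordered_map>

using SurfaceId = int;

class BufferRegistry
{
public:
    void register_buffer(int buffer_id, SurfaceId surface)
    {
        owner[buffer_id] = surface;
    }

    // Returns the owning surface, or -1 if the buffer id is unknown.
    SurfaceId surface_for(int buffer_id) const
    {
        auto it = owner.find(buffer_id);
        return it == owner.end() ? -1 : it->second;
    }

private:
    std::unordered_map<int, SurfaceId> owner;
};
```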
>>>>>>>>> This does not address the problem noted in:
>>>>>>>>> but I think that might be better addressed by having an 
>>>>>>>>> exchange-type rpc call (explicit or implicit) and 
>>>>>>>>> negotiating/increasing how many buffers the client owns some 
>>>>>>>>> other way.
>>>>>>>>> This seems like something that could have diverse opinions, 
>>>>>>>>> so I'm
>>>>>>>>> hoping to get some input on the protocol change here first.
>>>>>>>>> Thanks!
>>>>>>>>> Kevin
>>>>>>>>> [1]
>>>>>>>>> item:
>>>>>>>>> "[kdub] fencing improvements for clients add the ipc plumbing"
>>>>> --
>>>>> Mir-devel mailing list
>>>>> Mir-devel at
>>>>> Modify settings or unsubscribe at: 
