Improving next_buffer() rpc

Daniel van Vugt daniel.van.vugt at canonical.com
Thu Jul 10 02:23:00 UTC 2014


Critically, I think we need to avoid any RPC call that exchanges or
swaps buffers in a single round trip. That is exactly the problem: it
prevents the client from ever owning more than one buffer, so
performance bug 1253868 would remain unsolved:
     https://bugs.launchpad.net/mir/+bug/1253868

Buffers need to arrive at the client asynchronously, probably inside a 
MirEvent. And then you only need a single message to send them back:
     rpc release_buffer(Buffer) returns (Void);
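
To make that concrete, here's a rough sketch of how the .proto side
could look. The event message, its name and its fields are illustrative
only (not what's in the current protocol file); Buffer and Void are the
existing protocol messages:

     // Hypothetical server-to-client event carrying a newly free buffer,
     // so the client never has to ask for one.
     message BufferAvailableEvent {
         optional int32 surface_id = 1;  // which surface it belongs to
         optional Buffer buffer = 2;     // the existing Buffer message
     }

     // The single remaining buffer rpc. The empty reply means the client
     // has nothing to wait for before it continues rendering.
     rpc release_buffer(Buffer) returns (Void);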

- Daniel


On 10/07/14 00:29, Kevin DuBois wrote:
> Alberto pointed out a gap in my suggestion #2/#6, which is that the
> client wouldn't have a way of getting ownership of the additional
> buffers. So maybe that idea (#2/#6) should become:
>
> rpc exchange_buffer(Buffer) returns (Buffer)
> rpc allocate_additional_buffer(Void) returns (Buffer)
> rpc remove_additional_buffer(Buffer) returns (Void)
>
> This would still have the submission happen on the exchange call, but
> it makes additional buffer allocation more explicit than the #5 idea.
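>
> Written out as a sketch (the service wrapper and the comments are only
> illustrative), the split between the hot path and the ownership calls
> would be:
>
> // Sketch only: names are illustrative; Buffer and Void already exist.
> service DisplayServer {
>     // Hot path: submit the filled buffer, get the next free one back.
>     rpc exchange_buffer(Buffer) returns (Buffer);
>
>     // Outside the render loop: grow or shrink the set of buffers the
>     // client owns.
>     rpc allocate_additional_buffer(Void) returns (Buffer);
>     rpc remove_additional_buffer(Buffer) returns (Void);
> }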
>
>
> On Wed, Jul 9, 2014 at 12:04 PM, Kevin DuBois
> <kevin.dubois at canonical.com> wrote:
>
>     Attempting to steer the convo more towards the near term (kgunn's
>     "First"), I'm just trying to make sure that the protocol change is
>     somewhat forward-looking without wading into changing the buffer
>     distribution/swapping system too much.
>
>     Trying to distil the potential directions suggested a bit more:
>
>     #1 what we currently have:
>     rpc next_buffer(SurfaceId) returns (Buffer)
>
>     #2 what I've proposed:
>     rpc exchange_buffer(Buffer) returns (Buffer)
>
>     #3 "push buffers", the server pushes out buffers to the clients
>
>     #4 multiple buffers owned by the client/client allocated buffers
>
>     Now, #3 and #4 are interesting and potentially beneficial, but
>     outside the scope of what I want to accomplish [1] with the
>     optimization I'm aiming for. It's still good to future-proof the
>     protocol for when/if we decide to experiment.
>
>     #3 strikes me as something I'd want to avoid (especially in the near
>     term), because when you're pushing buffers to the client, the server
>     is 'driving' the activity of the client processes, which seems
>     strange. Now, I'd be fine with the client /making a request/ and
>     then getting the buffer back asynchronously, but I couldn't think
>     of an advantage to this.
>
>     I think #4 could be accommodated by having idea #5:
>     #5
>     rpc next_buffer(SurfaceId) returns (Buffer);
>     rpc submit_buffer(Buffer) returns (Void);
>
>     or supplementing idea #2 to make idea #6 (in the future)
>     rpc exchange_buffer(Buffer) returns (Buffer)
>     rpc request_client_buffer_count(int) returns (Void)
>
>     Effectively, #5 removes the implicit release of the client buffer,
>     making the submission of the filled buffer explicit. This seems a
>     bit less nice to me than #2/#6 from an RAII mindset, so I think I
>     still prefer #2 with a future expansion to #6 for multiple client
>     buffers.
>
>     So I guess #2/#6 or #5 are fine for my immediate purposes; both
>     seem acceptable at the current time and have some forward-thinking
>     in them. I'm interested whether others share my slight preference
>     for #2/#6.
>
>     [1]
>     My aim is to let the Android platform's clients delay waiting for
>     the GPU to finish with the buffer. It lets the client pass the
>     fence and the uncompleted buffer back to the server without
>     waiting, and requires that the fences are waited upon in the
>     composition pass. About a year or so ago, doing something similar
>     with internal clients in the first landing of unity, we saw some
>     noticeable performance improvements that make us think this would
>     be useful in the USC/unity8 setup we have now.
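>
>     For the fencing part, all the protocol really needs is room in the
>     Buffer message for the fence fd(s) alongside the buffer id. A
>     sketch only (field names and numbers are illustrative, not the
>     real protocol definitions):
>
>     // Sketch only: illustrative fields, not the actual Buffer message.
>     message Buffer {
>         // globally unique id; enough for the server to find the surface
>         optional int32 buffer_id = 1;
>         // native buffer handle fds
>         repeated sint32 fd = 2;
>         // unsignalled fence fds the compositor must wait on before use
>         repeated sint32 fence_fd = 3;
>     }
>
>     rpc exchange_buffer(Buffer) returns (Buffer);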
>
>
>     On Wed, Jul 9, 2014 at 11:39 AM, Kevin Gunn
>     <kevin.gunn at canonical.com> wrote:
>
>         First
>         I'm not sure we're still on topic wrt changing from ids to
>         fds. Do we need to conflate that with the double/triple
>         buffering topic? Let's answer this first...
>
>         Second
>         While we're at it :) triple buffering isn't always a win. In
>         the case of small, frequent renders (for example, an "8x8
>         pixel square following my finger") you'll potentially have 2
>         extra buffers queued, each needing its 16ms of fame on the
>         screen: 1 at the session server, 1 at the system server. That
>         can look a little laggy (up to ~32ms of added latency at
>         60Hz). I'm willing to say in the same breath, though, that
>         this may be lunatic fringe. The win for triple buffering is
>         likely the more common case: spiky render times (14+ms)
>         amongst more normal render times (9-12ms).
>         +1 on giving empty buffers back to the clients to allow them
>         to have a "queue" of empty buffers at their disposal. (I'm not
>         sure whether RAOF or duflu is correct that it's "synchronously
>         waiting for a round trip every swap"... can we already have an
>         empty buffer queue on the client side?)
>
>
>         On Wed, Jul 9, 2014 at 4:35 AM, Daniel van Vugt
>         <daniel.van.vugt at canonical.com> wrote:
>
>             Forgive me for rambling on but I just had an important
>             realisation...
>
>             Our current desire to get back to double buffering is only
>             because the Mir protocol is synchronously waiting for a
>             round trip every swap, and somehow I thought that the buffer
>             queue length affected time spent in the ready_to_composite
>             state. Now I'm not so sure that's true.
>
>             If we changed the protocol to implement parallelism, then in
>             theory, keeping triple buffering with a fancy zero-latency
>             swap buffers should perform better than the current protocol
>             that has to wait for a round trip.
>
>             I cannot remember why I thought the length of the buffer
>             queue affected the time from client-rendering to
>             server-compositing. Perhaps we really do need to keep
>             triple-buffering always-on so that the performance gain of a
>             zero-latency client swap-buffers can be achieved...
>
>             In summary, I'm back to thinking any protocol change from
>             next_buffer() needs to support parallelism and not be so
>             synchronous.
>
>             - Daniel
>
>
>
>             On 09/07/14 16:08, Daniel van Vugt wrote:
>
>                 Oops. I keep forgetting that the new BufferQueue
>                 disallows the compositor from owning fewer than one
>                 buffer, so there would no longer be any benefit to
>                 double-buffered clients from a more concurrent
>                 protocol :(
>
>                 Maybe Kevin's suggestion is just fine then, so long as
>                 the server is able to figure out the surface(Id) from
>                 the Buffer struct.
>
>
>                 On 09/07/14 15:41, Daniel van Vugt wrote:
>
>                     Note that we're working on making double-buffering
>                     the default again and triple the exception. In
>                     that case fixing LP: #1253868 may seem pointless,
>                     but it is surprisingly still relevant, because a
>                     fully parallelized design would significantly
>                     speed up double buffering too: client swap buffers
>                     would no longer have to wait for a round trip
>                     before returning and would instead be almost
>                     instant.
>
>
>                     On 09/07/14 10:00, Daniel van Vugt wrote:
>
>                         Sounds better to just pass buffers around,
>                         although I'm not keen on any change that
>                         doesn't make progress on the performance
>                         bottleneck LP: #1253868. The bottleneck is the
>                         swapping/exchanging approach, which limits the
>                         client to holding only one buffer, so I don't
>                         think it's a good idea for new designs to
>                         still have that problem.
>
>                         In order to improve parallelism per LP:
>                         #1253868 you'd really have to receive new
>                         buffers as soon as they're free, which means
>                         getting them as MirEvents. Then you only need
>                         an RPC function to release them back to the
>                         server:
>
>                              rpc release_buffer(Buffer) returns (Void);
>
>                         Keep in mind the inter-process communication
>                         is the bottleneck here. If you allow a context
>                         switch between the server and client then
>                         that's half to one millisecond (see mirping)
>                         per RPC round trip. More than double that for
>                         nested servers and you can see the protocol
>                         delay could be a significant factor (one to
>                         two milliseconds out of a 16.6ms frame budget
>                         at 60Hz). So I think any protocol enhancement
>                         should have parallelism designed in.
>
>                         I also think we need to be careful not to land
>                         any protocol changes in the RTM candidate
>                         series 0.4-0.5, so the foundation for RTM is
>                         maximally mature (albeit not yet optimal).
>
>                         - Daniel
>
>
>                         On 08/07/14 21:10, Kevin DuBois wrote:
>
>                             Hello mir team,
>
>                             In order to get the next buffer for the
>                             client, we currently have:
>
>                             rpc next_buffer(SurfaceId) returns (Buffer);
>
>                             which is problematic for me in working
>                             on [1] because it implicitly releases the
>                             buffer from the client side, whereas for
>                             that performance improvement I have to
>                             send an fd back to the server. So I was
>                             thinking of adding an rpc method more
>                             like:
>
>                             rpc exchange_buffer(Buffer) returns (Buffer);
>
>                             This would be sufficient to pass the fd
>                             fence back, and the buffer id in the
>                             Buffer protocol message would be enough
>                             for the server to figure out which
>                             surface has sent back its buffer (given
>                             the global buffer ids we're using).
>
>                             This does not address the problem noted
>                             in:
>                             https://bugs.launchpad.net/mir/+bug/1253868
>                             but I think that might be better
>                             addressed by having an exchange-type rpc
>                             call (explicit or implicit) and
>                             negotiating/increasing how many buffers
>                             the client owns some other way.
>
>                             This seems like something that could have
>                             diverse opinions, so I'm hoping to get
>                             some input on the protocol change here
>                             first.
>
>                             Thanks!
>                             Kevin
>
>                             [1]
>                             https://blueprints.launchpad.net/ubuntu/+spec/client-1410-mir-performance
>
>
>                             item:
>                             "[kdub] fencing improvements for clients add
>                             the ipc plumbing"
>
>
>
>
>
>


