API Bulk Operations

Thu May 30 08:11:40 UTC 2013

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Agree with John's points.

On 30/05/13 02:12, Dimiter Naydenov wrote:
> Hi guys,
> 
> I decided to summarize what we discussed lately about API bulk
> operations support.
> 
> 1. Why do we need this?
> 
> We'll need bulk operations mostly for clients like the GUI or CLI
> commands. A good example is "juju status": having to issue one or more
> requests per entity is a bad design and won't be usable with large
> environments. Currently we get all machines, then for each one we get
> its status, whether its agent is alive, its constraints, etc. Then we
> get all services and each unit, along with their status, etc. That is a
> lot of round-trips, and for a large environment (100 or more machines,
> services, units, relations) it will take a very long time.
> 
> Currently all agents are written on the assumption that they either
> operate on single entities (e.g. machine agent, machiner) or get a list
> of entities and operate on each one separately (e.g. provisioner,
> firewaller). Each state call, translated to an equivalent API call,
> requires a round-trip. This is inefficient, and the agents can be
> refactored to handle entities in bulk operations, like "get all
> machines", "get the status of all these machines" and "get the
> constraints for these machines", as 3 separate operations.
> 

The core issue is that we must, by design, minimise the number of round trips
between client and server, and eliminate the O(n) increase in the number of API
calls that would occur as the number of entities increases.
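To make the O(n) point concrete, here is a minimal Go sketch that counts round trips for a per-entity loop versus a single bulk call. The fakeServer type and its Status/StatusBulk methods are hypothetical stand-ins, not juju-core APIs; a real call would cross the websocket, which the sketch models only as a counter.

```go
package main

import "fmt"

// fakeServer is a hypothetical stand-in for the API server.
// Each method call counts as one client/server round trip.
type fakeServer struct {
	roundTrips int
	status     map[string]string
}

// Status answers for a single entity: one round trip per entity.
func (s *fakeServer) Status(id string) string {
	s.roundTrips++
	return s.status[id]
}

// StatusBulk answers for many entities in one round trip.
// Results are returned in the same order as ids.
func (s *fakeServer) StatusBulk(ids []string) []string {
	s.roundTrips++
	out := make([]string, len(ids))
	for i, id := range ids {
		out[i] = s.status[id]
	}
	return out
}

// newServer builds a server holding n machines, all "started".
func newServer(n int) (*fakeServer, []string) {
	s := &fakeServer{status: make(map[string]string)}
	ids := make([]string, n)
	for i := range ids {
		ids[i] = fmt.Sprintf("machine-%d", i)
		s.status[ids[i]] = "started"
	}
	return s, ids
}

// perEntityRoundTrips asks for each machine's status separately.
func perEntityRoundTrips(n int) int {
	s, ids := newServer(n)
	for _, id := range ids {
		s.Status(id)
	}
	return s.roundTrips
}

// bulkRoundTrips asks for every machine's status in one call.
func bulkRoundTrips(n int) int {
	s, ids := newServer(n)
	s.StatusBulk(ids)
	return s.roundTrips
}

func main() {
	for _, n := range []int{10, 100, 1000} {
		fmt.Printf("n=%d per-entity=%d bulk=%d\n",
			n, perEntityRoundTrips(n), bulkRoundTrips(n))
	}
}
```

Per-entity round trips grow linearly with n while the bulk call stays at one. With k attributes per entity (status, agent liveness, constraints) the gap becomes k*n versus k.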

> 2. Implementing support for bulk operations
> 
> There are ways to support bulk operations on the server side without
> changing the client side of the API. Instead of having entity-level
> entry points, like Unit, which need an id (name) and proxy state.Unit
> operations, we can have top-level "services", like Units, which proxy
> state.Unit operations in bulk, taking a list of ids (names) for each
> operation. The client side will still have entity-level entry points,
> like Unit, with more or less the same interface as state.Unit, but
> internally they will call the server-side Units "service", passing
> their own id (name), effectively operating on a single entity.
>

We need to ensure we adopt an SOA where services are decomposed along the lines
of logically grouped, coarse-grained business operations (defined as verbs).
This might result in services along the lines of MachineService, UnitService,
etc., but then again it doesn't have to, and in practice it often doesn't.

Bikeshed warning: I'm -1 on naming the services Units, Machines, etc.

Another common design principle, and one usually critical for scalability:
services should be stateless by design. I realise that doesn't match what we
have done to date, but I wanted to mention it because I fear for our system's
scalability if we don't adopt that approach.
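A minimal Go sketch of the approach in point 2, combined with the stateless-service principle: a server-side bulk service that takes a list of names per call and keeps no per-client state, plus a client-side Unit facade that calls it with a single name. UnitStore, UnitService, StatusResult and Unit are hypothetical names (with the bikeshed above in mind), and the facade calls the service directly rather than over the websocket.

```go
package main

import (
	"errors"
	"fmt"
)

// UnitStore is a hypothetical view of the backing state.
type UnitStore interface {
	UnitStatus(name string) (string, error)
}

// mapStore is an in-memory UnitStore used only for illustration.
type mapStore map[string]string

func (m mapStore) UnitStatus(name string) (string, error) {
	s, ok := m[name]
	if !ok {
		return "", fmt.Errorf("unit %q not found", name)
	}
	return s, nil
}

// StatusResult carries the outcome for one requested unit.
type StatusResult struct {
	Status string
	Error  error // error for this unit only; nil on success
}

// UnitService is stateless: every call carries the names it needs, and
// the service holds nothing between calls except its store handle.
type UnitService struct {
	store UnitStore
}

// Status is the bulk, server-side operation: one call, many units,
// results in the same order as names.
func (s UnitService) Status(names []string) []StatusResult {
	results := make([]StatusResult, len(names))
	for i, name := range names {
		results[i].Status, results[i].Error = s.store.UnitStatus(name)
	}
	return results
}

// Unit is the client-side, entity-level facade. It keeps a
// state.Unit-like method set but delegates to the bulk service
// with a one-element list.
type Unit struct {
	name    string
	service UnitService
}

func (u Unit) Status() (string, error) {
	results := u.service.Status([]string{u.name})
	if len(results) != 1 {
		return "", errors.New("expected exactly one result")
	}
	return results[0].Status, results[0].Error
}

func main() {
	svc := UnitService{store: mapStore{"wordpress/0": "started"}}
	fmt.Println(Unit{name: "wordpress/0", service: svc}.Status())
	fmt.Println(Unit{name: "mysql/0", service: svc}.Status())
}
```

Callers that only care about one entity keep a familiar method set, while anything that needs many entities can call the bulk method directly.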

> 3. What about error handling with bulk operations?
> 
> This is a non-trivial issue, so it needs to be defined. Consider the
> following operations:
> 
> A. Get([id1, id2, ...]) -> []Result, error
> B. Set([[id1, params1], [id2, params2], ...]) -> []error
> C. Create([params1, params2, ...]) -> []Result, error
> 
> A. Covers most read operations, like Machines.Get(ids),
> Machines.Status(ids), etc. We might get an error from state while
> getting some or all entities. We might also have a more serious issue
> with the operation itself (e.g. the state connection dropped at the
> beginning or halfway through the operation). In the first case the
> final error is nil, because the operation succeeded, if only partially,
> and returned results. In the second case there's a non-nil error and
> no results. The results themselves have to be defined like this:
> 
> type Result struct {
> 	Error error // error for this entity only; nil on success
> 	// Other fields
> }
>
> Each result can contain an error for this specific result.
> The results have to be in the same order as the passed ids, so the
> client can quickly look up the result by index.
> 
> B. Covers most update operations: "do this on a bunch of things and
> tell me if each one was successful". Again, the returned errors match
> the passed id/params pairs in both order and count.
>
> C. Covers a few operations, like AddMachine, AddUnit and
> unit.AssignToMachine. Same semantics as A: Result contains both a
> potential error and any other fields as needed; the error is non-nil
> only when the bulk operation cannot be performed as a whole.
> 

Error handling needs thought, agreed. The required semantics generally depend on
the type of operation and on whether we decide that, at the application API
level, transactionality should apply.
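The A and B semantics from point 3 can be sketched in Go as follows, assuming a hypothetical in-memory backend; Machine, backend, SetArg and SetSeries are illustrative names, not juju-core APIs. Get returns a whole-operation error only when nothing can be done (modelled here as a lost state connection) and otherwise returns one Result per id, in order, each with its own error. SetSeries follows shape B; since that shape returns only []error, the sketch assumes a lost connection is reported in every entry.

```go
package main

import (
	"errors"
	"fmt"
)

type Machine struct {
	Id     string
	Series string
}

// Result pairs a per-entity error with the entity's data.
type Result struct {
	Error   error    // error for this entity only; nil on success
	Machine *Machine // valid only when Error is nil
}

var ErrConnectionLost = errors.New("state connection lost")

// backend is a hypothetical in-memory stand-in for state.
type backend struct {
	connected bool
	machines  map[string]Machine
}

// Get implements shape A: results[i] answers ids[i].
func (b *backend) Get(ids []string) ([]Result, error) {
	if !b.connected {
		return nil, ErrConnectionLost // whole operation failed: no results
	}
	results := make([]Result, len(ids))
	for i, id := range ids {
		m, ok := b.machines[id]
		if !ok {
			results[i].Error = fmt.Errorf("machine %q not found", id)
			continue
		}
		results[i].Machine = &m
	}
	return results, nil // nil even when some entries carry errors
}

// SetArg is one (id, params) pair for shape B.
type SetArg struct {
	Id     string
	Series string
}

// SetSeries implements shape B: errs[i] answers args[i].
// Each entry is applied independently (no transaction).
func (b *backend) SetSeries(args []SetArg) []error {
	errs := make([]error, len(args))
	if !b.connected {
		// Assumption: shape B has no whole-operation error, so report per entry.
		for i := range errs {
			errs[i] = ErrConnectionLost
		}
		return errs
	}
	for i, a := range args {
		m, ok := b.machines[a.Id]
		if !ok {
			errs[i] = fmt.Errorf("machine %q not found", a.Id)
			continue
		}
		m.Series = a.Series
		b.machines[a.Id] = m
	}
	return errs
}

func main() {
	b := &backend{connected: true, machines: map[string]Machine{
		"0": {Id: "0", Series: "precise"},
	}}
	results, err := b.Get([]string{"0", "42"})
	fmt.Println("operation error:", err)
	for i, r := range results {
		if r.Error != nil {
			fmt.Println(i, "error:", r.Error)
			continue
		}
		fmt.Println(i, r.Machine.Series)
	}
	b.connected = false
	_, err = b.Get([]string{"0"})
	fmt.Println("operation error:", err)
}
```

Shape C would reuse Get's signature with params in place of ids. Whether any of these should be transactional is still the open question above; the sketch applies each entry independently.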

> 4. If the agents are not using bulk operations, why should we care?
> 
> Because supporting bulk operations on the API server-side requires
> changes to the protocol exposed over the websocket, and even though we
> can mask this for juju-core by keeping the client-side API the same,
> we can't for other potential clients, like the GUI.
> So we need to think about how we are going to implement it now, even
> if we won't need it immediately.
>

Now is the time to get this right, even if there's a little up-front pain. It
will be much worse trying to fix a broken system later.

> 5. Can we have both bulk and single operations in the API?
> 
> I don't see why all the operations *have* to be bulk. There are plenty
> of widely used APIs like that whose users are happy; OpenStack and AWS
> come to mind. But having bulk operations in the first place brings up
> all the aforementioned issues, which need to be resolved.
> 

It depends. Most operations can usually be designed to be bulk, and there's
usually no good argument for making them single.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iJwEAQECAAYFAlGnCbYACgkQCJ79BCOJFcZ6igQAp8u3KQ48k2e2tx4moPQGKSu5
OCmzuZx5zANFB4I4eXoqI8G+v24aegOarIwJFOPPP8giUOdzSQDQDSr/lfJw6W7n
F0VozV4RtbkhMxxmjXpbUIkmBkPYABxash/bmHl8n6clpXc65PUeJ3z8lOuAoa+Y
l+N0heReFVOEpkBu0gU=
=na/b
-----END PGP SIGNATURE-----