Using subdocument _id fields for multi-environment support

Menno Smits menno.smits at canonical.com
Wed Oct 1 04:25:07 UTC 2014


Team Onyx has been busy preparing for multi-environment state server
support. One piece of this is updating almost all of Juju's collections to
include the environment UUID in document identifiers so that data for
multiple environments can co-exist in the same collection even when they
otherwise have the same identifier (machine id, service name, unit name, etc.).

Based on discussions on juju-dev a while back[1] we have started doing
this by prepending the environment UUID to the _id field and adding extra
fields which provide the environment UUID and old _id value separately for
easier querying and handling.

So far, services and units have been migrated. Where previously a service
document looked like this:

    type serviceDoc struct {
         Name          string `bson:"_id"`
         Series        string
         ...

it now looks like this:

    type serviceDoc struct {
         DocID         string `bson:"_id"`       // "<env uuid>:wordpress"
         Name          string `bson:"name"`      // "wordpress"
         EnvUUID       string `bson:"env-uuid"`  // "<env uuid>"
         Series        string
         ...

Unit documents have undergone a similar transformation.

This approach works but has a few downsides:

   - it's possible for the local id ("Name" in this case) and EnvUUID
   fields to become out of sync with the corresponding values that make up the
   _id. If that ever happens very bad things could occur.
   - it somewhat unnecessarily increases the document size, requiring that
   we effectively store some values twice
   - it requires slightly awkward transformations between UUID prefixed and
   unprefixed IDs throughout the code
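
To make the first and third points concrete, here is a minimal sketch of the
kind of helpers the prefixed approach needs. The names docID, splitDocID and
consistent are made up for illustration and are not the actual Juju code; the
UUID is a placeholder, and the sketch assumes neither the UUID nor the local
id contains a ":".

```go
package main

import (
	"fmt"
	"strings"
)

// docID builds the prefixed _id value from an environment UUID and a local id.
// (Hypothetical helper, not taken from the Juju source.)
func docID(envUUID, localID string) string {
	return envUUID + ":" + localID
}

// splitDocID reverses docID by splitting on the first ":".
// ok is false when the id carries no prefix at all.
func splitDocID(id string) (envUUID, localID string, ok bool) {
	i := strings.Index(id, ":")
	if i < 0 {
		return "", "", false
	}
	return id[:i], id[i+1:], true
}

// serviceDoc mirrors the prefixed layout shown above, other fields omitted.
type serviceDoc struct {
	DocID   string `bson:"_id"`
	Name    string `bson:"name"`
	EnvUUID string `bson:"env-uuid"`
}

// consistent reports whether the redundant fields still agree with DocID.
func (d serviceDoc) consistent() bool {
	return d.DocID == docID(d.EnvUUID, d.Name)
}

func main() {
	const uuid = "00000000-0000-4000-8000-000000000000" // placeholder UUID

	doc := serviceDoc{DocID: docID(uuid, "wordpress"), Name: "wordpress", EnvUUID: uuid}
	fmt.Println(doc.consistent()) // fields agree with the _id

	// Nothing stops a careless update from changing one copy only.
	doc.Name = "mysql"
	fmt.Println(doc.consistent()) // fields no longer agree with the _id
}
```

Every place that writes a document has to keep the three fields in step, and
every place that reads an id has to know whether it holds the prefixed or the
local form.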

MongoDB allows the _id field to be a subdocument so Tim asked me to
experiment with this to see if it might be a cleaner way to approach the
multi-environment conversion before we update any more collections. The
code for these experiments can be found here:
https://gist.github.com/mjs/2959bb3e90a8d4e7db50 (I've included the output
as a comment on the gist).

What I've found suggests that using a subdocument for the _id is a better
way forward. This approach means that each field value is only stored once
so there's no chance of the document key being out of sync with other
fields and there's no unnecessary redundancy in the amount of data being
stored. The fields in the _id subdocument are easy to access individually
and can be queried separately if required. It is also possible to create
indexes on specific fields in the _id subdocument if necessary for
performance reasons.

Using this approach, a service document would end up looking something like
this:

    type serviceDoc struct {
         ID            serviceId `bson:"_id"`
         Series        string
         ...
    }

    type serviceId struct {
         EnvUUID string `bson:"env-uuid"`
         Name    string
    }
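
As a rough, database-free illustration of the composite key idea, the sketch
below uses serviceId as a Go map key so that two environments can hold a
service with the same name, and builds a query document that selects on one
field of the _id using MongoDB's dot notation. newStore and envFilter are
hypothetical names and the in-memory map only stands in for a collection; the
"_id.env-uuid" key assumes the bson tag shown above.

```go
package main

import "fmt"

// serviceId is the composite key; each value is stored exactly once.
type serviceId struct {
	EnvUUID string `bson:"env-uuid"`
	Name    string
}

// serviceDoc is keyed by serviceId, other fields omitted.
type serviceDoc struct {
	ID     serviceId `bson:"_id"`
	Series string
}

// newStore indexes documents by their composite id, standing in for a
// collection with a unique _id. (Hypothetical helper for illustration.)
func newStore(docs ...serviceDoc) map[serviceId]serviceDoc {
	store := make(map[serviceId]serviceDoc, len(docs))
	for _, d := range docs {
		store[d.ID] = d
	}
	return store
}

// envFilter returns a query document matching every document that belongs
// to one environment, using dot notation to reach into the _id subdocument.
func envFilter(envUUID string) map[string]interface{} {
	return map[string]interface{}{"_id.env-uuid": envUUID}
}

func main() {
	store := newStore(
		serviceDoc{ID: serviceId{"env-a", "wordpress"}, Series: "trusty"},
		serviceDoc{ID: serviceId{"env-b", "wordpress"}, Series: "precise"},
	)

	// Same service name, different environments: both documents survive.
	fmt.Println(len(store))
	fmt.Println(store[serviceId{"env-b", "wordpress"}].Series)

	// A filter like this could be passed to a collection query.
	fmt.Println(envFilter("env-a"))
}
```

One thing to keep in mind when looking up a document by its whole _id:
MongoDB compares subdocuments field by field, in order, so the key should be
built from an ordered value (a struct such as serviceId, or bson.D in mgo)
rather than an unordered map.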

There was some concern in the original email thread about whether
subdocument style _id fields would work with sharding. My research and
experiments suggest that there is no issue here. There are a few types of
indexes that can't be used with sharding, primarily "multikey" indexes, but
I can't see us using these for _id values. A multikey index is used by
MongoDB when a field used as part of an index is an array - it's highly
unlikely that we're going to use arrays in _id fields.

Hashed indexes are a good basis for well-balanced shards according to the
MongoDB docs so I wanted to be sure that it's OK to create a hashed index
for subdocument style fields. It turns out there's no issue here (see
TestHashedIndex in the gist).

Using subdocuments for _id fields is not going to prevent us from using
MongoDB's sharding features in the future if we need to.

Apart from having to rework the changes already made to the services and
units collections[2], I don't see any downsides to this approach. Can
anyone think of something I might be overlooking?

- Menno


[1] - subject was "RFC: mongo "_id" fields in the multi-environment juju
server world"

[2] - this work will have to be done before 1.21 has a stable release
because the units and services changes have already landed.