Using subdocument _id fields for multi-environment support

Wed Oct 1 13:04:25 UTC 2014

I'm very keen on this. Thanks Menno (and Tim); unless anyone comes up
with substantial objections, let's go with this.

Cheers
William

On Wed, Oct 1, 2014 at 6:25 AM, Menno Smits <menno.smits at canonical.com> wrote:
> Team Onyx has been busy preparing for multi-environment state server
> support. One piece of this is updating almost all of Juju's collections to
> include the environment UUID in document identifiers so that data for
> multiple environments can co-exist in the same collection even when they
> otherwise have the same identifier (machine id, service name, unit name etc).
>
> Based on discussions on juju-dev a while back[1] we have started doing
> this by prepending the environment UUID to the _id field and adding extra
> fields which provide the environment UUID and old _id value separately for
> easier querying and handling.
>
> So far, services and units have been migrated. Where previously a service
> document looked like this:
>
>     type serviceDoc struct {
>          Name          string `bson:"_id"`
>          Series        string
>          ...
>
> it now looks like this:
>
>     type serviceDoc struct {
>          DocID         string `bson:"_id"`       // "<env uuid>:wordpress"
>          Name          string `bson:"name"`      // "wordpress"
>          EnvUUID       string `bson:"env-uuid"`  // "<env uuid>"
>          Series        string
>          ...
>
> Unit documents have undergone a similar transformation.
>
> This approach works but has a few downsides:
>
> - it's possible for the local id ("Name" in this case) and EnvUUID fields
>   to become out of sync with the corresponding values that make up the
>   _id. If that ever happens very bad things could occur.
> - it somewhat unnecessarily increases the document size, requiring that we
>   effectively store some values twice
> - it requires slightly awkward transformations between UUID prefixed and
>   unprefixed IDs throughout the code
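
The prefixing transformations and the out-of-sync risk listed above can be
sketched with the standard library alone. This is only an illustration: the
helper names (docID, splitDocID, consistent) and the sample values are
hypothetical and not taken from the Juju code base, and the ":" separator is
assumed from the "<env uuid>:..." form shown in the struct comments.

```go
package main

import (
	"fmt"
	"strings"
)

// serviceDoc follows the prefixed layout quoted above: the _id repeats
// values that are also stored in Name and EnvUUID.
type serviceDoc struct {
	DocID   string `bson:"_id"`
	Name    string `bson:"name"`
	EnvUUID string `bson:"env-uuid"`
}

// docID builds a UUID-prefixed _id from an environment UUID and a local id.
// (Hypothetical helper; the ":" separator is an assumption.)
func docID(envUUID, localID string) string {
	return envUUID + ":" + localID
}

// splitDocID reverses docID. It splits on the first ":" only, so a local id
// may itself contain ":". ok is false when there is no separator at all.
func splitDocID(id string) (envUUID, localID string, ok bool) {
	i := strings.Index(id, ":")
	if i < 0 {
		return "", "", false
	}
	return id[:i], id[i+1:], true
}

// consistent reports whether the duplicated fields still agree with the _id.
func (d serviceDoc) consistent() bool {
	return d.DocID == docID(d.EnvUUID, d.Name)
}

func main() {
	good := serviceDoc{DocID: docID("env-1234", "wordpress"), Name: "wordpress", EnvUUID: "env-1234"}
	// bad simulates a document whose Name drifted away from its _id.
	bad := serviceDoc{DocID: docID("env-1234", "wordpress"), Name: "mysql", EnvUUID: "env-1234"}

	env, name, ok := splitDocID(good.DocID)
	fmt.Println(env, name, ok)
	fmt.Println(good.consistent(), bad.consistent())
}
```

Every place that reads or writes such a document needs helpers like these,
and nothing in the database itself stops the "bad" document from being
stored.
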
>
> MongoDB allows the _id field to be a subdocument so Tim asked me to
> experiment with this to see if it might be a cleaner way to approach the
> multi-environment conversion before we update any more collections. The code
> for these experiments can be found here:
> https://gist.github.com/mjs/2959bb3e90a8d4e7db50 (I've included the output
> as a comment on the gist).
>
> What I've found suggests that using a subdocument for the _id is a better
> way forward. This approach means that each field value is only stored once
> so there's no chance of the document key being out of sync with other fields
> and there's no unnecessary redundancy in the amount of data being stored.
> The fields in the _id subdocument are easy to access individually and can be
> queried separately if required. It is also possible to create indexes on
> specific fields in the _id subdocument if necessary for performance reasons.
>
> Using this approach, a service document would end up looking something like
> this:
>
>     type serviceDoc struct {
>          ID            serviceId `bson:"_id"`
>          Series        string
>          ...
>     }
>
>     type serviceId struct {
>          EnvUUID string `bson:"env-uuid"`
>          Name    string
>     }
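
To make the resulting document shape concrete, here is a rough sketch that
renders such a document, and a filter on one field of the _id subdocument,
as JSON. It is an approximation under stated assumptions: JSON tags stand in
for the bson tags so that only the standard library is needed, the "name"
and "series" keys, the encode and envFilter helpers, and the sample values
are all hypothetical, and no database is contacted. The dotted key in the
filter is MongoDB's usual dot notation for reaching into a subdocument.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serviceID mirrors the proposed _id subdocument. JSON tags are used here
// in place of bson tags purely so the sketch runs without a MongoDB driver.
type serviceID struct {
	EnvUUID string `json:"env-uuid"`
	Name    string `json:"name"`
}

// serviceDoc stores the environment UUID and name once, inside the _id.
type serviceDoc struct {
	ID     serviceID `json:"_id"`
	Series string    `json:"series"`
}

// encode renders v as compact JSON and panics on failure, which is
// acceptable for a sketch but not for production code.
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

// envFilter builds a query document matching every document that belongs
// to one environment by addressing a single field of the _id subdocument.
func envFilter(envUUID string) map[string]string {
	return map[string]string{"_id.env-uuid": envUUID}
}

func main() {
	doc := serviceDoc{
		ID:     serviceID{EnvUUID: "env-1234", Name: "wordpress"},
		Series: "trusty",
	}
	fmt.Println(encode(doc))
	fmt.Println(encode(envFilter("env-1234")))
}
```

Because the struct is always encoded with its fields in declaration order,
every _id built from serviceID has the same field order, which matters when
matching on a whole subdocument rather than on individual fields.
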
>
> There was some concern in the original email thread about whether
> subdocument style _id fields would work with sharding. My research and
> experiments suggest that there is no issue here. There are a few types of
> indexes that can't be used with sharding, primarily "multikey" indexes, but
> I can't see us using these for _id values. A multikey index is used by
> MongoDB when a field used as part of an index is an array - it's highly
> unlikely that we're going to use arrays in _id fields.
>
> Hashed indexes are a good basis for well-balanced shards according to the
> MongoDB docs so I wanted to be sure that it's OK to create a hashed index
> for subdocument style fields. It turns out there's no issue here (see
> TestHashedIndex in the gist).
>
> Using subdocuments for _id fields is not going to prevent us from using
> MongoDB's sharding features in the future if we need to.
>
> Apart from having to rework the changes already made to the services and
> units collections[2], I don't see any downsides to this approach. Can anyone
> think of something I might be overlooking?
>
> - Menno
>
>
> [1] - subject was "RFC: mongo "_id" fields in the multi-environment juju
> server world"
>
> [2] - this work will have to be done before 1.21 has a stable release
> because the units and services changes have already landed.