Automatic multi-environment collection handling

Menno Smits menno.smits at canonical.com
Thu Dec 18 03:28:00 UTC 2014


I've recently landed several big changes which automate the handling of
multi-environment concerns when accessing Juju's collections. These both
simplify DB queries and updates and reduce the risk of unintended data
leakage between environments. Although in most cases you won't even know
that anything has changed, it's worth understanding what's been done.

*Collections*

MongoDB queries against collections which contain data for multiple
environments are now automatically modified to ensure they return records
for only the environment tied to the State being queried. Queries against
collections which do not contain data for multiple environments pass
through untouched.

Some examples ...

machines.FindId("2")
    becomes
machines.FindId("<uuid>:2")

machines.Find(bson.D{{"series", "trusty"}})
    becomes
machines.Find(bson.D{{"series", "trusty"}, {"env-uuid", "<uuid>"}})

machines.Find(bson.D{{"_id", "4"}})
    becomes
machines.Find(bson.D{{"_id", "<uuid>:4"}, {"env-uuid", "<uuid>"}})

Where "<uuid>" is the environment UUID of the State instance the collection
was obtained from (using getCollection()).
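
To make the rewriting rules above concrete, here is a minimal sketch of the
idea. It is not the code that landed in Juju: DocElem and D are stand-ins
for bson.DocElem and bson.D so the example runs without a MongoDB driver,
and mungeQuery is a hypothetical name.

```go
package main

import "fmt"

// DocElem and D are stand-ins for mgo's bson.DocElem and bson.D.
type DocElem struct {
	Name  string
	Value interface{}
}

type D []DocElem

// mungeQuery is a hypothetical helper, not Juju's actual implementation.
// It prefixes string _id values with the environment UUID and appends an
// env-uuid selector. Non-string _id values (such as operator documents)
// are passed through unchanged.
func mungeQuery(envUUID string, query D) D {
	out := make(D, 0, len(query)+1)
	for _, elem := range query {
		if elem.Name == "_id" {
			if id, ok := elem.Value.(string); ok {
				elem.Value = envUUID + ":" + id
			}
		}
		out = append(out, elem)
	}
	return append(out, DocElem{"env-uuid", envUUID})
}

func main() {
	fmt.Println(mungeQuery("<uuid>", D{{"series", "trusty"}}))
	fmt.Println(mungeQuery("<uuid>", D{{"_id", "4"}}))
}
```

The sketch builds a new selector rather than modifying the caller's slice,
which is one way to keep the rewrite invisible to calling code.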

The Remove, RemoveId and RemoveAll methods on collections also have similar
handling and the collection Count method returns only the number of records
in the collection for a single environment.

The main benefit of this is that you don't need to remember to wrap ids in
State.docID() calls or remember to add the "env-uuid" field to queries. In
fact, I recommend you leave them out to reduce noise from code that does DB
queries.

There are some limited cases where you might really need to query across
multiple environments or don't want the automatic munging in place for some
reason. For these scenarios you can get hold of a *mgo.Collection by
calling State.getRawCollection(). This is currently only being used by a
few database migration steps.

Note that query selectors using MongoDB operators with the _id field will
be left untouched. In these cases you need to know that there's a UUID
prefix on the _id and handle it yourself. For example, to query all the
machines with ids starting with "4" you might consider doing:

machines.Find(bson.D{{"_id", bson.M{"$regex": "^4.*"}}})
    which is transformed to:
machines.Find(bson.D{{"_id", bson.M{"$regex": "^4.*"}}, {"env-uuid", "<uuid>"}})

Note how the _id selector is left alone but the env-uuid selector is still
added. It's left up to the developer to account for the environment UUID
prefix in the _id regex (the regex above won't work as is because every _id
now starts with "<uuid>:").


*Transactions*

Transaction operations are now also automatically modified to account for
multi-environment collections.

For example:

st.runTransaction([]txn.Op{{
    C: machinesC,
    Id: "1",
    Remove: true,
}, {
    C: machinesC,
    Id: "2",
    Insert: bson.D{
        {"series", "trusty"},
    },
}, {
    C: machinesC,
    Id: "3",
    Insert: &machineDoc{
        Series: "trusty",
    },
}, {
    C: otherC,
    Id: "foo",
    Insert: bson.D{},
}})

    automatically becomes:

st.runTransaction([]txn.Op{{
    C: machinesC,
    Id: "<uuid>:1",
    Remove: true,
}, {
    C: machinesC,
    Id: "<uuid>:2",
    Insert: bson.D{
        {"_id", "<uuid>:2"},
        {"env-uuid", "<uuid>"},
        {"series", "trusty"},
    },
}, {
    C: machinesC,
    Id: "<uuid>:3",
    Insert: &machineDoc{
        DocID: "<uuid>:3",
        EnvUUID: "<uuid>",
        Series: "trusty",
    },
}, {
    C: otherC,
    Id: "foo",
    Insert: bson.D{},
}})

Note how the environment UUID is prefixed onto the ids of operations on
multi-environment collections. Also see how the _id and env-uuid fields on
documents defined using bson.D or structs (bson.M is supported too) are
automatically populated. A panic will occur if you provide the environment
UUID but it doesn't match what was expected as this indicates a likely bug.
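
The rules for a bson.D insert document can be sketched roughly as follows.
This is an illustration, not the landed code: Op, D and DocElem are
stand-ins for txn.Op, bson.D and bson.DocElem, mungeOp and the multiEnv map
are hypothetical, and replacing a caller-supplied _id is an assumption of
the sketch.

```go
package main

import "fmt"

// DocElem and D are stand-ins for mgo's bson.DocElem and bson.D.
type DocElem struct {
	Name  string
	Value interface{}
}

type D []DocElem

// Op is a stand-in carrying a subset of txn.Op's fields.
type Op struct {
	C      string
	Id     string
	Insert D
}

// mungeOp is a hypothetical helper. For multi-environment collections it
// prefixes the op Id and populates _id and env-uuid on the insert
// document. It panics if the document already carries a different
// env-uuid. Ops on other collections are returned unchanged.
func mungeOp(envUUID string, multiEnv map[string]bool, op Op) Op {
	if !multiEnv[op.C] {
		return op
	}
	op.Id = envUUID + ":" + op.Id
	if op.Insert == nil {
		return op
	}
	doc := D{{"_id", op.Id}, {"env-uuid", envUUID}}
	for _, elem := range op.Insert {
		switch elem.Name {
		case "_id":
			continue // replaced by the prefixed id (an assumption)
		case "env-uuid":
			if elem.Value != envUUID {
				panic(fmt.Sprintf("env-uuid is %v, expected %q", elem.Value, envUUID))
			}
			continue
		}
		doc = append(doc, elem)
	}
	op.Insert = doc
	return op
}

func main() {
	multiEnv := map[string]bool{"machines": true}
	fmt.Println(mungeOp("<uuid>", multiEnv, Op{C: "machines", Id: "2", Insert: D{{"series", "trusty"}}}))
	fmt.Println(mungeOp("<uuid>", multiEnv, Op{C: "other", Id: "foo", Insert: D{}}))
}
```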

Any document updates are made in place so that the caller sees them once
the transaction completes. This makes it safe for the caller to reuse a
document struct used with a transaction operation for further work - the
struct will match what was written to the DB. Note that if a struct is
passed by value and needs updating, a panic will occur. This won't normally
be a problem as we tend to use pointers to document structs with
transaction operations, and the panic is a helpful indication that the
document provided isn't multi-environment safe.
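
The pointer requirement follows from how in-place updates of a struct work
in Go. Here is a minimal sketch using reflection; setTaggedField is a
hypothetical helper, the bson tags on machineDoc are assumed, and unlike
the behaviour described above the sketch panics on any non-pointer
argument rather than only when an update is actually needed.

```go
package main

import (
	"fmt"
	"reflect"
)

// machineDoc mirrors the struct used in the example; the tags are assumed.
type machineDoc struct {
	DocID   string `bson:"_id"`
	EnvUUID string `bson:"env-uuid"`
	Series  string `bson:"series"`
}

// setTaggedField is a hypothetical helper. It sets the string field whose
// bson tag equals name. A struct passed by value cannot be updated in
// place, so anything other than a pointer to a struct causes a panic.
// Tag options such as ",omitempty" are not handled in this sketch.
func setTaggedField(doc interface{}, name, value string) {
	v := reflect.ValueOf(doc)
	if v.Kind() != reflect.Ptr || v.Elem().Kind() != reflect.Struct {
		panic(fmt.Sprintf("cannot update %T in place: need a pointer to a struct", doc))
	}
	v = v.Elem()
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		if t.Field(i).Tag.Get("bson") == name {
			v.Field(i).SetString(value)
			return
		}
	}
}

func main() {
	doc := &machineDoc{Series: "trusty"}
	setTaggedField(doc, "_id", "<uuid>:3")
	setTaggedField(doc, "env-uuid", "<uuid>")
	// The caller's struct now carries the populated fields.
	fmt.Printf("%+v\n", *doc)
}
```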

Note that only the Id and Insert fields of txn.Op are touched. The Update
and Assert fields are left alone.

In some cases you may need to run a transaction without invoking automatic
multi-environment munging. State now has rawTxnRunner() and
runRawTransaction() methods for the rare situations where this is
necessary. Please use these sparingly.

*Performance*

With the extra work being done to implement automatic multi-environment
support, the performance impact was a concern. I have compared multiple
runs of the state package unit tests with and without these changes, and
the difference is lost in the noise.

If you see any problems or have any questions, please let me know.

- Menno