RFC: mongo "_id" fields in the multi-environment juju server world
John Meinel
john at arbash-meinel.com
Fri Jul 4 11:58:08 UTC 2014
I would think that if we have to put environ-uuid into the _id field, then
we wouldn't need yet-another field to shard on (at least if we put it at
the beginning of the field).
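
To make the prefix idea concrete, here is a minimal Python sketch using
plain strings and no mongo driver. The ":" separator and the helper names
(make_doc_id, split_doc_id, ids_for_env) are made up for illustration;
none of this is existing Juju code, and the real separator would need to
be something that can never appear in an environment UUID.

```python
# Hypothetical separator: an assumption, not something Juju defines.
SEP = ":"


def make_doc_id(env_uuid, local_id):
    """Build a prefixed _id: environment UUID first, then the old id/name."""
    if SEP in env_uuid:
        raise ValueError("environment UUID must not contain the separator")
    return env_uuid + SEP + local_id


def split_doc_id(doc_id):
    """Recover (env_uuid, local_id) from a prefixed _id.

    Splits on the first separator only, so a local id that itself
    contains the separator survives the round trip.
    """
    env_uuid, sep, local_id = doc_id.partition(SEP)
    if not sep:
        raise ValueError("not a prefixed _id: %r" % (doc_id,))
    return env_uuid, local_id


def ids_for_env(doc_ids, env_uuid):
    """Select the _id values belonging to one environment by prefix."""
    prefix = env_uuid + SEP
    return sorted(i for i in doc_ids if i.startswith(prefix))
```

With the UUID at the front, every document in one environment shares a
common _id prefix, and the name can be recovered from the _id on demand
rather than stored a second time.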
John
=:->
On Fri, Jul 4, 2014 at 2:24 PM, William Reade <william.reade at canonical.com>
wrote:
> My expectation is that:
>
> 1) We certainly need the environment UUID as a separate field for the
> shard key.
> 2) We *also* need the environment UUID as an _id prefix to keep our
> watchers sane.
> 2a) If we had separate collections per environment, we wouldn't; but AIUI,
> scaling mongo by adding collections tends to end badly (I don't have direct
> experience here myself; but it does indeed seem that we'd start consuming
> namespaces at a pretty terrifying rate, and I'm inclined to trust those who
> have done this and failed.)
> 2b) I'd ordinarily dislike the duplication across the _id and uuid fields,
> but there's a clear reason for doing so here, so I'm not going to complain.
> I *will* continue to complain about documents that duplicate info across
> fields in order to save a few runtime microseconds here and there ;).
>
> If someone with direct experience can chip in reassuringly I *might* be
> prepared to back off on the N-collections-per-environment thing, but I'm
> certainly not willing to take it so far as to separate the txn logs and
> thus discard consistency across environments: I think there will certainly
> be references between individual hosted environments and the initial
> environment.
>
> So, in short, I think Tim's (1) is the way to go. But *please* don't
> duplicate data that doesn't have to be -- the UUID is fine, the name is
> not. If we really end up spending a lot of time extracting names from _id
> fields we can cache them in the state documents -- but we don't need
> redundant copies in the DB, and we *really* don't need to make our lives
> harder by giving our data unnecessary opportunities for inconsistency.
>
> Cheers
> William
>
>
>
> On Fri, Jul 4, 2014 at 6:42 AM, John Meinel <john at arbash-meinel.com>
> wrote:
>
>> According to the mongo docs:
>> http://docs.mongodb.org/manual/core/document/#record-documents
>> The field name _id is reserved for use as a primary key; its value must
>> be unique in the collection, is immutable, and may be of any type other
>> than an array.
>>
>> That makes it sound like we *could* use an object for the _id field and
>> do _id = {env_uuid:, name:}
>>
>> Though I thought the purpose of doing something like that is to allow
>> efficient sharding in a multi-environment world.
>>
>> Looking here: http://docs.mongodb.org/manual/core/sharding-shard-key/
>> The shard key must be indexed (which is just fine for us w/ the primary
>> _id field or with any other field on the documents), and "The index on the
>> shard key *cannot* be a multikey index"
>> <http://docs.mongodb.org/manual/core/index-multikey/#index-type-multikey>.
>> I don't really know what that means in the case of wanting to shard based
>> on an object instead of a simple string, but it does sound like it might be
>> a problem.
>> Anyway, for purposes of being *unique* we may need to put environ uuid in
>> there, but for the purposes of sharding we could just put it on another
>> field and index that field.
>>
>> John
>> =:->
>>
>>
>>
>> On Fri, Jul 4, 2014 at 5:01 AM, Tim Penhey <tim.penhey at canonical.com>
>> wrote:
>>
>>> Hi folks,
>>>
>>> Very shortly we are going to start on the work to be able to store
>>> multiple environments within a single mongo database.
>>>
>>> Most of our current entities are stored in the database with their name
>>> or id fields serialized to bson as the _id field.
>>>
>>> As far as I know (and I may be wrong), if you are adding a document to
>>> the mongo collection, and you do not specify an _id field, mongo will
>>> create a unique value for you.
>>>
>>> In our new world, things that used to be unique, like machines,
>>> services, units etc, are now only unique when paired with the
>>> environment id.
>>>
>>> It seems we have a number of options here.
>>>
>>> 1. change the _id field to be a "composed" field where it is the
>>> concatenation of the environment id and the existing id or name field.
>>> If we do take this approach, I strongly recommend having the fields that
>>> make up the key be available by themselves elsewhere in the document
>>> structure.
>>>
>>> 2. let mongo create the _id field, and we ensure uniqueness over the
>>> pair of values with a unique index. One thing I am unsure about with
>>> this approach is how we currently do our insertion checks, where we do a
>>> "document does not exist" check. We wouldn't be able to do this as a
>>> transaction assertion as it can only check for _id values. How fast are
>>> the indices updated? Can having a unique index for a document work for
>>> us? I'm hoping it can if this is the way to go.
>>>
>>> 3. use a composite _id field such that the document may start like this:
>>> { _id: { env_uuid: "blah", name: "foo"}, ...
>>> This gives the benefit of existence checks, and real names for the _id
>>> parts.
>>>
>>> Thoughts? Opinions? Recommendations?
>>>
>>> BTW, I think that if we can make 3 work, then it is the best approach.
>>>
>>> Tim
>>>
>>> --
>>> Juju-dev mailing list
>>> Juju-dev at lists.ubuntu.com
>>> Modify settings or unsubscribe at:
>>> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>>>
>