<div dir="ltr">I would think that if we have to put environ-uuid into the _id field, then we wouldn't need yet another field to shard on (at least if we put it at the beginning of the _id value).<div><br></div><div>John</div>
<div>=:-></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Jul 4, 2014 at 2:24 PM, William Reade <span dir="ltr"><<a href="mailto:william.reade@canonical.com" target="_blank">william.reade@canonical.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">My expectation is that:<div><br></div><div>1) We certainly need the environment UUID as a separate field for the shard key.</div>
<div>2) We *also* need the environment UUID as an _id prefix to keep our watchers sane.</div>
<div>2a) If we had separate collections per environment, we wouldn't; but AIUI, scaling mongo by adding collections tends to end badly (I don't have direct experience here myself, but it does indeed seem that we'd start consuming namespaces at a pretty terrifying rate, and I'm inclined to trust those who have done this and failed).</div>
<div>2b) I'd ordinarily dislike the duplication across the _id and uuid fields, but there's a clear reason for doing so here, so I'm not going to complain. I *will* continue to complain about documents that duplicate info across fields in order to save a few runtime microseconds here and there ;).</div>
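<div><br></div><div>A minimal sketch of the layout described in (1) and (2), written in Go since that is what juju uses. The struct, its field names ("env-uuid", "series"), the ":" separator and the helper names are all hypothetical; the point is only that both copies of the UUID are set in one place, so the duplication accepted in 2b) cannot drift.</div>

```go
package main

import (
	"fmt"
	"strings"
)

// machineDoc is a hypothetical document layout: the environment UUID is
// stored twice, once as its own field (usable as a shard key) and once as a
// prefix of _id (so _id stays unique across environments).
type machineDoc struct {
	DocID   string `bson:"_id"`
	EnvUUID string `bson:"env-uuid"`
	Series  string `bson:"series"`
}

// newMachineDoc is the single place where both copies of the UUID are set.
// The ":" separator is an assumption made for this sketch.
func newMachineDoc(envUUID, machineID, series string) machineDoc {
	return machineDoc{
		DocID:   envUUID + ":" + machineID,
		EnvUUID: envUUID,
		Series:  series,
	}
}

// consistent reports whether the _id prefix agrees with the env-uuid field.
func (d machineDoc) consistent() bool {
	return strings.HasPrefix(d.DocID, d.EnvUUID+":")
}

func main() {
	d := newMachineDoc("env-1234", "0", "trusty")
	fmt.Println(d.DocID, d.EnvUUID, d.consistent())
}
```

<div>Building every document through one constructor, plus a cheap check like consistent(), is one way to get the sharding and watcher benefits without opening the door to mismatched copies.</div>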
<div><br></div><div>If someone with direct experience can chip in reassuringly, I *might* be prepared to back off on the N-collections-per-environment thing, but I'm not willing to take it so far as to separate the txn logs and thus discard consistency across environments: I expect there will be references between individual hosted environments and the initial environment.</div>
<div><br></div><div>So, in short, I think Tim's (1) is the way to go. But *please* don't duplicate data that doesn't have to be duplicated: the UUID is fine, the name is not. If we really end up spending a lot of time extracting names from _id fields, we can cache them in the state documents; we don't need redundant copies in the DB, and we *really* don't need to make our lives harder by giving our data unnecessary opportunities for inconsistency.</div>
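<div><br></div><div>If the name is not stored as a field of its own, code that needs it has to split it back out of _id. A sketch, assuming a hypothetical "envUUID:name" _id layout and made-up helper names (splitDocID, newMachine): the name is parsed once and held on the in-memory object rather than written to the DB a second time.</div>

```go
package main

import (
	"fmt"
	"strings"
)

// splitDocID splits a prefixed _id of the assumed form "envUUID:name".
// Only the first ':' is treated as the separator, so names containing ':'
// still round-trip, provided the UUID itself never contains one.
func splitDocID(docID string) (string, string, error) {
	envUUID, name, ok := strings.Cut(docID, ":")
	if !ok {
		return "", "", fmt.Errorf("malformed _id %q: no environment prefix", docID)
	}
	return envUUID, name, nil
}

// machine is a hypothetical in-memory state object. The name is derived
// from _id when the object is built and cached here, not stored in the DB.
type machine struct {
	docID string
	name  string
}

func newMachine(docID string) (machine, error) {
	_, name, err := splitDocID(docID)
	if err != nil {
		return machine{}, err
	}
	return machine{docID: docID, name: name}, nil
}

func main() {
	m, err := newMachine("env-1234:0/lxc/1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(m.name)
}
```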
<div><br></div><div>Cheers</div><span class="HOEnZb"><font color="#888888"><div>William</div><div><br></div></font></span></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><br><div class="gmail_quote">
On Fri, Jul 4, 2014 at 6:42 AM, John Meinel <span dir="ltr"><<a href="mailto:john@arbash-meinel.com" target="_blank">john@arbash-meinel.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>According to the mongo docs: <a href="http://docs.mongodb.org/manual/core/document/#record-documents" target="_blank">http://docs.mongodb.org/manual/core/document/#record-documents</a><br>
</div><div><li>The field name <tt><span>_id</span></tt> is reserved for use as a primary key; its
value must be unique in the collection, is immutable, and may be of
any type other than an array.</li></div><div class="gmail_extra"><br></div><div class="gmail_extra">That makes it sound like we *could* use an object for the _id field and do _id = {env_uuid:, name:}</div><div class="gmail_extra">
<br></div><div class="gmail_extra">Though I thought the purpose of doing something like that is to allow efficient sharding in a multi-environment world.</div><div class="gmail_extra"><br></div><div class="gmail_extra">Looking here: <a href="http://docs.mongodb.org/manual/core/sharding-shard-key/" target="_blank">http://docs.mongodb.org/manual/core/sharding-shard-key/</a></div>
<div class="gmail_extra">The shard key must be indexed (which is just fine for us w/ the primary _id field or with any other field on the documents), and "The index on the shard key <strong>cannot</strong> be a <em><a href="http://docs.mongodb.org/manual/core/index-multikey/#index-type-multikey" target="_blank">multikey index</a>".</em></div>
<div class="gmail_extra">I don't really know what that means in the case of wanting to shard based on an object instead of a simple string, but it does sound like it might be a problem.</div><div class="gmail_extra">
Anyway, for purposes of being *unique* we may need to put environ uuid in there, but for the purposes of sharding we could just put it on another field and index that field.</div>
<div class="gmail_extra"><br></div><div class="gmail_extra">John</div><div class="gmail_extra">=:-></div><div><div><div class="gmail_extra"><br></div><div class="gmail_extra"><br></div><div class="gmail_extra">
<br><div class="gmail_quote">
On Fri, Jul 4, 2014 at 5:01 AM, Tim Penhey <span dir="ltr"><<a href="mailto:tim.penhey@canonical.com" target="_blank">tim.penhey@canonical.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
Hi folks,<br>
<br>
Very shortly we are going to start on the work to be able to store<br>
multiple environments within a single mongo database.<br>
<br>
Most of our current entities are stored in the database with their name<br>
or id fields serialized to bson as the _id field.<br>
<br>
As far as I know (and I may be wrong), if you are adding a document to<br>
the mongo collection, and you do not specify an _id field, mongo will<br>
create a unique value for you.<br>
<br>
In our new world, things that used to be unique, like machines,<br>
services, units etc, are now only unique when paired with the<br>
environment id.<br>
<br>
It seems we have a number of options here.<br>
<br>
1. change the _id field to be a "composed" field where it is the<br>
concatenation of the environment id and the existing id or name field.<br>
If we do take this approach, I strongly recommend having the fields that<br>
make up the key be available by themselves elsewhere in the document<br>
structure.<br>
<br>
2. let mongo create the _id field, and we ensure uniqueness over the<br>
pair of values with a unique index. One thing I am unsure about with<br>
this approach is how we currently do our insertion checks, where we do a<br>
"document does not exist" check. We wouldn't be able to do this as a<br>
transaction assertion, since assertions can only check _id values. How fast are<br>
the indices updated? Can a unique index give us the guarantee we need?<br>
I'm hoping it can if this is the way to go.<br>
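<div>To make the trade-off in option 2 concrete, here is an in-memory simulation with hypothetical types, where a counter stands in for a generated ObjectId. Uniqueness is enforced by a separate (env-uuid, name) index rather than by _id, which is why an assertion on _id alone could not express "no unit with this name exists in this environment".</div>

```go
package main

import (
	"fmt"
	"strconv"
)

// envName is the compound key a unique index over (env-uuid, name) covers.
type envName struct {
	EnvUUID string
	Name    string
}

// unitDoc follows option 2: _id is generated by the database
// (simulated here with a counter instead of a real ObjectId).
type unitDoc struct {
	DocID   string
	EnvUUID string
	Name    string
}

// store simulates a collection plus a unique compound index.
type store struct {
	nextID int
	docs   map[string]unitDoc
	unique map[envName]string // (env, name) -> _id
}

func newStore() *store {
	s := new(store)
	s.docs = map[string]unitDoc{}
	s.unique = map[envName]string{}
	return s
}

// insert rejects a second document with the same (env, name) pair, as a
// unique index would; the check consults the index, not _id.
func (s *store) insert(envUUID, name string) (string, error) {
	key := envName{EnvUUID: envUUID, Name: name}
	if existing, taken := s.unique[key]; taken {
		return "", fmt.Errorf("duplicate key %v (existing _id %s)", key, existing)
	}
	s.nextID++
	id := strconv.Itoa(s.nextID)
	s.docs[id] = unitDoc{DocID: id, EnvUUID: envUUID, Name: name}
	s.unique[key] = id
	return id, nil
}

func main() {
	s := newStore()
	id, err := s.insert("env-a", "wordpress/0")
	fmt.Println(id, err)
	_, err = s.insert("env-a", "wordpress/0")
	fmt.Println(err)
}
```

<div>In MongoDB itself the equivalent would be a unique compound index on the two fields; whether that index can stand in for the current insertion checks is exactly the open question above.</div>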
<br>
3. use a composite _id field such that the document may start like this:<br>
{ _id: { env_uuid: "blah", name: "foo"}, ...<br>
This keeps the _id-based existence checks and gives real names to the _id<br>
parts.<br>
<br>
Thoughts? Opinions? Recommendations?<br>
<br>
BTW, I think that if we can make 3 work, then it is the best approach.<br>
<span><font color="#888888"><br>
Tim<br>
<br>
--<br>
Juju-dev mailing list<br>
<a href="mailto:Juju-dev@lists.ubuntu.com" target="_blank">Juju-dev@lists.ubuntu.com</a><br>
Modify settings or unsubscribe at: <a href="https://lists.ubuntu.com/mailman/listinfo/juju-dev" target="_blank">https://lists.ubuntu.com/mailman/listinfo/juju-dev</a><br>
</font></span></blockquote></div><br></div></div></div></div>
<br></blockquote></div><br></div>
</div></div></blockquote></div><br></div>