<div dir="ltr">I would think that if we have to put environ-uuid into the _id field, then we wouldn't need yet-another field to shard on (at least if we put it at the beginning of the field).<div><br></div><div>John</div> <div>=:-></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Jul 4, 2014 at 2:24 PM, William Reade <span dir="ltr"><<a href="mailto:william.reade@canonical.com" target="_blank">william.reade@canonical.com</a>></span> wrote:<br> <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">My expectation is that:<div><br></div><div>1) We certainly need the environment UUID as a separate field for the shard key.</div> <div>2) We *also* need the environment UUID as an _id prefix to keep our watchers sane.</div> <div>2a) If we had separate collections per environment, we wouldn't; but AIUI, scaling mongo by adding collections tends to end badly (I don't have direct experience here myself; but it does indeed seem that we'd start consuming namespaces at a pretty terrifying rate, and I'm inclined to trust those who have done this and failed.)</div> <div>2b) I'd ordinarily dislike the duplication across the _id and uuid fields, but there's a clear reason for doing so here, so I'm not going to complain. I *will* continue to complain about documents that duplicate info across fields in order to save a few runtime microseconds here and there ;).</div> <div><br></div><div>If someone with direct experience can chip in reassuringly I *might* be prepared to back off on the N-collections-per-environment thing, but I'm certainly not willing to take it so far as to separate the txn logs and thus discard consistency across environments: I think there will certainly be references between individual hosted environments and the initial environment.</div> <div><br></div><div>So, in short, I think Tim's (1) is the way to go. 
But *please* don't duplicate data that doesn't have to be -- the UUID is fine, the name is not. If we really end up spending a lot of time extracting names from _id fields we can cache them in the state documents -- but we don't need redundant copies in the DB, and we *really* don't need to make our lives harder by giving our data unnecessary opportunities for inconsistency.</div> <div><br></div><div>Cheers</div><span class="HOEnZb"><font color="#888888"><div>William</div><div><br></div></font></span></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><br><div class="gmail_quote"> On Fri, Jul 4, 2014 at 6:42 AM, John Meinel <span dir="ltr"><<a href="mailto:john@arbash-meinel.com" target="_blank">john@arbash-meinel.com</a>></span> wrote:<br> <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>According to the mongo docs: <a href="http://docs.mongodb.org/manual/core/document/#record-documents" target="_blank">http://docs.mongodb.org/manual/core/document/#record-documents</a><br> </div><div><li>The field name <tt><span>_id</span></tt> is reserved for use as a primary key; its value must be unique in the collection, is immutable, and may be of any type other than an array.</li></div><div class="gmail_extra"><br></div><div class="gmail_extra">That makes it sound like we *could* use an object for the _id field and do _id = {env_uuid:, name:}</div><div class="gmail_extra"> <br></div><div class="gmail_extra">Though I thought the purpose of doing something like that is to allow efficient sharding in a multi-environment world.</div><div class="gmail_extra"><br></div><div class="gmail_extra">Looking here: <a href="http://docs.mongodb.org/manual/core/sharding-shard-key/" target="_blank">http://docs.mongodb.org/manual/core/sharding-shard-key/</a></div> <div class="gmail_extra">The shard key must be indexed (which is just fine for us w/ the primary _id field or with any other field on the 
documents), and "The index on the shard key <strong>cannot</strong> be a <em><a href="http://docs.mongodb.org/manual/core/index-multikey/#index-type-multikey" target="_blank">multikey index</a>".</em></div> <div class="gmail_extra">I don't really know what that means in the case of wanting to shard based on an object instead of a simple string, but it does sound like it might be a problem.</div><div class="gmail_extra"> Anyway, for purposes of being *unique* we may need to put environ uuid in there, but for the purposes of sharding we could just put it on another field and index that field.</div> <div class="gmail_extra"><br></div><div class="gmail_extra">John</div><div class="gmail_extra">=:-></div><div><div><div class="gmail_extra"><br></div><div class="gmail_extra"><br></div><div class="gmail_extra"> <br><div class="gmail_quote"> On Fri, Jul 4, 2014 at 5:01 AM, Tim Penhey <span dir="ltr"><<a href="mailto:tim.penhey@canonical.com" target="_blank">tim.penhey@canonical.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"> Hi folks,<br> <br> Very shortly we are going to start on the work to be able to store<br> multiple environments within a single mongo database.<br> <br> Most of our current entities are stored in the database with their name<br> or id fields serialized to bson as the _id field.<br> <br> As far as I know (and I may be wrong), if you are adding a document to<br> the mongo collection, and you do not specify an _id field, mongo will<br> create a unique value for you.<br> <br> In our new world, things that used to be unique, like machines,<br> services, units etc, are now only unique when paired with the<br> environment id.<br> <br> It seems we have a number of options here.<br> <br> 1. 
change the _id field to be a "composed" field where it is the<br> concatenation of the environment id and the existing id or name field.<br> If we do take this approach, I strongly recommend having the fields that<br> make up the key be available by themselves elsewhere in the document<br> structure.<br> <br> 2. let mongo create the _id field, and we ensure uniqueness over the<br> pair of values with a unique index. One thing I am unsure about with<br> this approach is how we currently do our insertion checks, where we do a<br> "document does not exist" check. We wouldn't be able to do this as a<br> transaction assertion as it can only check for _id values. How fast are<br> the indices updated? Can having a unique index for a document work for<br> us? I'm hoping it can if this is the way to go.<br> <br> 3. use a composite _id field such that the document may start like this:<br> { _id: { env_uuid: "blah", name: "foo"}, ...<br> This gives the benefit of existence checks, and real names for the _id<br> parts.<br> <br> Thoughts? Opinions? Recommendations?<br> <br> BTW, I think that if we can make 3 work, then it is the best approach.<br> <span><font color="#888888"><br> Tim<br> <br> --<br> Juju-dev mailing list<br> <a href="mailto:Juju-dev@lists.ubuntu.com" target="_blank">Juju-dev@lists.ubuntu.com</a><br> Modify settings or unsubscribe at: <a href="https://lists.ubuntu.com/mailman/listinfo/juju-dev" target="_blank">https://lists.ubuntu.com/mailman/listinfo/juju-dev</a><br> </font></span></blockquote></div><br></div></div></div></div> </blockquote></div><br></div> </div></div></blockquote></div><br></div>