RFC: state entities, replace globalKey() with .Tag().String()

Dimiter Naydenov dimiter.naydenov at canonical.com
Wed Sep 24 07:39:26 UTC 2014


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 24.09.2014 10:09, Tim Penhey wrote:
>> Why? Global keys are shorter than tags, and in several places
>> we use fast regular expression searches using a prefix based on
>> the global key. So instead of having "m#0#n#juju-public" as
>> global key for a port ranges document, we'll have to use
>> "machine-0<*>network-juju-public", where "<*>" is some
>> unambiguous separator. The same is valid for service settings -
>> "s#wordpress" will become "service-wordpress".
> 
> Sorry, but this is terrible.  The regex searches we have are
> needlessly complicated, and the documents should have real fields
> rather than disassembling the _id field.  I think that having a
> slightly longer value stored in mongo is worth the code when it
> means we go from having two ways to identify an entity down to
> one.

Using compound keys like that allows us to overcome some limitations
of MongoDB/mgo with regard to ensuring integrity, which are otherwise
either impossible or quite hard to work around with transactions.
Taking the port ranges document as an example again, a compound key
that includes the machine id and network name gives us:
 - A way to get all docs for a given machine, across all networks,
using a simple anchored prefix regex like "^m#42#n#".
 - No need to add unique indexes to guarantee only a single document
per machine / network. Unique indexes have other drawbacks as well:
mgo returns nil without inserting anything when there's an index
violation, which means additional checks and more complicated asserts.
 - The _id field gives us uniqueness and fast lookups by id for free;
prefix regex lookups on it are slightly slower, but still faster than
the other approaches.

A more complicated example is the proposed network interfaces document
structure:
https://docs.google.com/a/canonical.com/document/d/16SYAlZFc19YPXrB7BRwufZVoeLFpqGzBTAdo4EoQIHg/edit#heading=h.pwdo7b7njiz9

There, using an _id field like
"m#<id>#<sha1-hash(<network>#<mac-addr>[#<suffix>])>" gives us both an
easy way to get all NICs of a machine and a guarantee that there can
never be two NICs with the same MAC on the same network and machine.
Achieving the same in a transaction, with asserts on multiple fields
and unique indexes, is much harder, if not impossible.

I'm not opposed to replacing global keys with tags in state, but using
only simple _id fields in all collections is impractical in certain cases.
- -- 
Dimiter Naydenov <dimiter.naydenov at canonical.com>
juju-core team
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBAgAGBQJUInUuAAoJENzxV2TbLzHww5MH/A0foVm/+dYfHWLNsEyi//DN
7QtkkJxmu79JYBzG15fCIrrBDa6Edx0VCIYeEvsQmRRnDJUH+H4IWtlvmssxaxw2
WWoOVuDgCn5oKbEE0NKSbYq3dbk2q4VUryPml+0n79KZxZQrI9Xry6W/o2pm0BQc
LIEU5RjxgD1YXV/B+0cvp9zpKmwm9/Pi6VsXF5O8sewINh0INr0HEMOYPt+LLsec
yIMcdd7ujIxL/hU1IOjtLkwBaPSXSxcbK5UUzO0aG2KNswfxCXO7X99kpFlg7z29
xqdoW7UCEkzoWrSCHWmkiTyCYa1zPApHEBd/tA/K34BV+XEDFMolFi9b8GmhliA=
=sX4m
-----END PGP SIGNATURE-----



More information about the Juju-dev mailing list