RFC: state entities, replace globalKey() with .Tag().String()
Dimiter Naydenov
dimiter.naydenov at canonical.com
Wed Sep 24 07:39:26 UTC 2014
On 24.09.2014 10:09, Tim Penhey wrote:
>> Why? Global keys are shorter than tags, and in several places
>> we use fast regular expression searches using a prefix based on
>> the global key. So instead of having "m#0#n#juju-public" as
>> global key for a port ranges document, we'll have to use
>> "machine-0<*>network-juju-public", where "<*>" is some
>> unambiguous separator. The same is valid for service settings -
>> "s#wordpress" will become "service-wordpress".
>
> Sorry, but this is terrible. The regex searches we have are
> needlessly complicated, and the documents should have real fields
> rather than disassembling the _id field. I think that having a
> slightly longer value stored in mongo is worth the code when it
> means we go from having two ways to identify an entity down to
> one.
Using compound keys like that lets us work around some limitations
of MongoDB/mgo with regard to ensuring integrity, which is otherwise
either impossible or quite hard to do with transactions. Taking the
port ranges document as an example again, a compound key made of
the machine id and network name gives us:
- A way to get all docs for a given machine and any network, using a
simple regex like "m#42#n#.*".
- No need to add unique indexes to guarantee only a single document
per machine / network. Unique indexes also have other drawbacks: mgo
returns nil and does not insert anything when there's an index
violation, which means additional checks and more complicated asserts.
- Uniqueness and fast lookup by id via the _id field. Regexp lookup is
slightly slower, but still faster than the alternatives.
A more complicated example is the proposed network interfaces document
structure:
https://docs.google.com/a/canonical.com/document/d/16SYAlZFc19YPXrB7BRwufZVoeLFpqGzBTAdo4EoQIHg/edit#heading=h.pwdo7b7njiz9
There, using an _id field like
"m#<id>#<sha1-hash(<network>#<mac-addr>[#<suffix>])>" gives us both an
easy way to get all of a machine's NICs and a guarantee that no two
NICs can have the same MAC on the same network and machine. The same
is much harder or impossible to do with asserts on multiple fields and
unique indexes, in a transaction.
I'm not opposed to replacing global keys with tags in state, but using
only simple _id fields in all collections is impractical in certain cases.
--
Dimiter Naydenov <dimiter.naydenov at canonical.com>
juju-core team