A few issues

Wed Apr 10 20:59:37 UTC 2013

On Wed, 2013-04-10 at 15:04 -0300, Gustavo Niemeyer wrote:
> 1) Agent version shows "pending", when in fact the agent is running
> install hooks as confirmed by logging into the machine and running ps.

I believe the agent-state always stayed "pending" until the install hook
either succeeded or failed; a quick look at the Python seems to confirm
this supposition. More sophistication here would be kinda cool, but out
of scope for the moment.

> 2) There are two different agent versions in use, although this
> environment was bootstrapped with --upload-tools from tip.

Is it possible that you did --upload-tools from quantal and didn't
--fake-series precise? There are some surprising aspects to cross-bucket
tools fallback behaviour wrt dev tools that, I think, make that the most
likely explanation; but I've been focused on rationalising the tools
selection for a few days now, and I expect this to bear fruit
imminently.

> 3) Machine ids are still strings, incompatible with Python. Note that
> this is the case in multiple places in the output, not just under
> machines:.

This is somewhat harder to address; my initial grasp on "this is
definitely wrong" has weakened. There are several reasons it's
convenient to externally identify machines by string rather than int,
but one seems to me to be overwhelming: JSON.

While the project as a whole has a policy of using YAML for its own
data, and I have no intention of messing with this, interoperability
demands that the data we produce be sanely expressible in a less
sophisticated format. JSON support is a requirement and a reality in
several contexts: in the API, in hook tool output, and (as here) in CLI
output.

In every other case we're outputting data that *can* be sanely
represented as JSON, but in this case we're not (because, for those
following along at home, you can't use ints as keys in JSON dicts). The
original solution was honestly pretty ugly -- it was a last-minute
special-case tweak that replaced the top-level machines map such that
all the int keys were strings.
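
To make the constraint concrete, here is a minimal Python sketch. The
map contents are made up for illustration and are not real status
output. JSON object keys must be strings, so Python's json module
coerces int keys on the way out, and they come back as strings:

```python
import json

# Hypothetical machines map keyed by int ids (illustrative values only).
machines = {0: "running", 1: "pending"}

# json.dumps coerces the int keys to strings, because JSON object
# keys can only be strings.
encoded = json.dumps(machines)
decoded = json.loads(encoded)

print(encoded)
print(list(decoded))   # keys are now str, not int
print(0 in decoded)    # the original int key no longer matches
```

Any consumer reading the JSON form therefore sees string keys whatever
type we use internally.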

Furthermore, it was inconsistent, because the values identifying
machines elsewhere in the output were no longer valid keys into the
machines map; I don't think that's an acceptable solution. That could be
worked around -- we could encode more (duplicated) knowledge of the
structure of the output into the last-minute map-tweaker, and change
every machine id in place -- but this is obviously very fragile. And
besides, it's crazy: it's actively promoting two different
representations of machine identity in a single context.
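
A rough Python sketch of that inconsistency follows. Both the status
layout and the stringify_machine_keys helper are hypothetical stand-ins
for the real output and the real tweak; the point is only that
rewriting the top-level keys leaves the other references behind:

```python
import json

def stringify_machine_keys(status):
    """Hypothetical stand-in for the last-minute tweak: only the keys
    of the top-level machines map are turned into strings."""
    tweaked = dict(status)
    tweaked["machines"] = {str(k): v for k, v in status["machines"].items()}
    return tweaked

# Assumed, simplified status layout (not the real schema).
status = {
    "machines": {0: {"agent-state": "pending"}},
    "services": {
        "wordpress": {"units": {"wordpress/0": {"machine": 0}}},
    },
}

output = json.loads(json.dumps(stringify_machine_keys(status)))

# The unit still refers to its machine by int, but the map is keyed
# by string, so a direct lookup fails.
unit_machine = output["services"]["wordpress"]["units"]["wordpress/0"]["machine"]
lookup_works = unit_machine in output["machines"]
print(lookup_works)
```

A consumer has to know to call str() on every reference before using
it as a key, which is exactly the duplicated knowledge of structure I
want to avoid.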

The alternative is to change our "official" external representation of
machine ids. By changing to strings we inconvenience users of status
output right now; but we and they all get to share the immediate and
ongoing fruits of a consistent data representation that usefully
serializes to a lowest-common-denominator format.

I am aware that this is not a solution that will please you, and this
fact notably fails to please me: you have frequently demonstrated
uncommonly clear-sighted long-term vision and I'm loath to disregard
your opinion. At present, though, my best judgment is that the global
cost of indefinitely supporting inconsistent or incorrect output
outweighs the global cost of fixing it now. I rue the immediate
inconvenience to end users, but I think it'll make everyone's lives
easier in the long run.

The biggest immediate issue here is naming: I contend that the cheapest
way out of this tangle with a sane path forwards is to mechanically
change references to machine "ids" into references to "names", and
thereby (1) get the benefits of sane LCD representation of juju data as
above while (2) clearing the internal namespace such that we have a free
hand to implement an internal int id field across the board (as I think
there is broad agreement that we must, for a variety of compelling
reasons).

As always, there may be important considerations or consequences that I
have failed to take into account; if so I would be most keen to hear
about them, because I'd hate to take a decision like this from a
position of ignorance. I'm particularly keen to hear from those
programmatically consuming status output, because they're the ones who
bear the immediate cost.

Cheers
William