thoughts on priorities
roger peppe
roger.peppe at canonical.com
Thu May 2 16:32:13 UTC 2013
On 2 May 2013 14:59, William Reade <william.reade at canonical.com> wrote:
> On Thu, 2013-05-02 at 11:40 +0100, roger peppe wrote:
>> Not at all. But I think a cheap (and sufficient and not inelegant) solution
>> is just to move the version into another document and have a watcher
>> for that, meaning that almost all our logic remains unchanged and
>> we have more time for implementing other stuff.
>
> Well, I'm open to discussion... but my instinct argues against it at the
> moment. ISTM that the bulk of the logic under threat *is* the dance from
> config-watcher to environment to tools-list to version-picking. I'm not
> sure how changing the environment config data in order to maintain a
> single aspect of that is going to be cheaper than recording machine arch
> (which can be done over the API before we start watching the machine
> config over the API, so we can guarantee that field'll be filled in
> before watching, without demanding a major version change), and wrapping
> the existing watcher so that we only deliver the tools we know we need
> (series is already known to the machine).
It seems like it'll be cheaper to me. Currently we watch the environment config,
list the tools, select the tools, and then download the ones we need.
I think all that logic can remain the same, except that instead of watching the
environment config we'd watch another config (we could reuse the existing
config watcher) and change environs.FindExactTools so it can also work on
the tools listed in the state (or, more probably, provide a
FindTools API call).
Adding another watcher just for watching tools seems unnecessary to me.
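To make the FindTools idea concrete, here's a minimal sketch of what such a server-side call might look like. The Tools struct and field names are illustrative assumptions, not the actual juju types; the point is only that the server filters the known tools list by the parameters the agent supplies, and the agent still decides what to run.

```go
package main

import "fmt"

// Tools describes one uploaded tools tarball. The fields are
// illustrative; the real juju structures differ.
type Tools struct {
	Version string // e.g. "1.10.0-precise-amd64"
	Series  string
	Arch    string
	URL     string
}

// FindTools sketches the proposed API call: the agent passes its
// own series and arch, and the server narrows the full tools list
// down to the matching entries. The agent, not the server, remains
// responsible for choosing which of those it runs.
func FindTools(all []Tools, series, arch string) []Tools {
	var match []Tools
	for _, t := range all {
		if t.Series == series && t.Arch == arch {
			match = append(match, t)
		}
	}
	return match
}

func main() {
	all := []Tools{
		{"1.10.0-precise-amd64", "precise", "amd64", "https://example.com/t1"},
		{"1.10.0-quantal-arm", "quantal", "arm", "https://example.com/t2"},
	}
	got := FindTools(all, "precise", "amd64")
	fmt.Println(len(got), got[0].Version) // 1 1.10.0-precise-amd64
}
```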
>> This is something that every agent will be watching. If every agent
>> is watching for exactly the same data (the global version changing)
>> then we're doing less work on the API hub and we can potentially
>> use less resources by caching the information there.
>
> If having a machine available implies having series/arch available, I
> think it's probably smarter to just send the necessary information down.
> I don't think that N series/arch-specific watchers is much worse than N
> identical environment watchers...
Actually there could be a difference. With N identical watchers,
we can potentially use a multiwatcher.Watcher which is designed to
have very low per-client overhead.
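The "low per-client overhead" property can be illustrated with a toy broadcaster, assuming nothing about the real multiwatcher.Watcher internals: one observer fans the single global version out to every subscriber, so each extra client costs only a buffered channel rather than its own watcher.

```go
package main

import "fmt"

// versionBroadcaster is a toy version of the multiwatcher idea:
// one goroutine observes the (single, global) version and fans the
// same value out to every subscriber, so per-client overhead is
// just a buffered channel, not a separate mongo watcher.
type versionBroadcaster struct {
	subs []chan string
}

// Subscribe registers a new client and returns its event channel.
func (b *versionBroadcaster) Subscribe() <-chan string {
	ch := make(chan string, 1)
	b.subs = append(b.subs, ch)
	return ch
}

// Publish delivers the latest version to every subscriber,
// replacing any stale undelivered value (clients only ever care
// about the most recent version).
func (b *versionBroadcaster) Publish(version string) {
	for _, ch := range b.subs {
		select {
		case ch <- version:
		default: // subscriber hasn't drained; drop the stale value
			<-ch
			ch <- version
		}
	}
}

func main() {
	var b versionBroadcaster
	a1, a2 := b.Subscribe(), b.Subscribe()
	b.Publish("1.10.1")
	fmt.Println(<-a1, <-a2) // 1.10.1 1.10.1
}
```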
One other thing occurs to me too: currently when we watch
something in the state, we're watching something in the
underlying mongo. This watcher would not strictly be doing that,
because the tools are not currently stored there. So we'd need to
spend extra code polling the tools bucket (or watching the tools collection,
once tools *are* stored in mongo) to make sure the watcher
results are accurate. Instead of that, just having the agents
make an API call to fetch the tools would be sufficient; it amounts
to the same thing in the end, and it's simpler.
>> If the version changes, an agent *will* respond to it, apart from
in the rare case that the tools are not available for that agent. That's
not a case we need to be too concerned about IMO.
>
> I'm more concerned about every agent responding by making an API
> roundtrip to gather loads of data it doesn't need. The thundering herds
> of Next()s will be dwarfed by those, won't they?
A list of tools is hardly likely to be loads of data (10K gives us a
lot of tools!)
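Back-of-envelope, assuming (my number, not measured) ~100 bytes per entry for a version string plus a URL:

```go
package main

import "fmt"

func main() {
	// Rough arithmetic behind "10K gives us a lot of tools":
	// one entry is a version string plus a URL, call it ~100 bytes,
	// so a 10KB payload holds on the order of a hundred entries.
	const entryBytes = 100 // assumed average size of one tools entry
	const payload = 10 * 1024
	fmt.Println(payload / entryBytes) // 102
}
```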
>> I also feel that from an architectural point of view, it makes sense
>> to have each agent responsible for choosing what tools it will run.
>> I'm not sure the centralised logic is a big help here, and
>> may make some things harder in the future - for example
>> in a cross-provider juju we may want different agents to fetch
>> from different tools repos.
>
> I'm not sure the distributed answer solves it any better or cheaper
> either.
It means that agents can potentially go somewhere else entirely
to fetch their tools and that might turn out to be a useful freedom.
Anyway, in the end, I think there's not that much to choose between
the two approaches. My inclination would be to go with the approach
that results in fewer new artifacts, but that's just my usual minimalist
bias :-)
>> > Yes; but to do that we'd need to know the arches of all the machines in
>> > play; and if we had all that information accessible to the API in the
>> > first place I'm not sure why we'd want to spend roundtrips on getting
>> > redundant data, rather than just sending the right stuff in the first
>> > place.
>>
>> I'm not sure I understand what you're saying here.
>
> Arch information is valuable for other use cases, even if they're not
> top priority. Arch information per machine also helps us to implement a
> tools watcher that's (1) accurate and (2) causes a smaller traffic spike
> on the server. I'm not sure the benefit of preserving a fragment of the
> original logic in-place is overwhelming.
>
> (I just wish we could update all the tools on a machine at the same
> time, to avoid watching (and, probably, concurrently downloading) N
> identical copies of the tools. Same problem as major-version upgrades,
> in a way, just writ small.)
This seems like premature optimisation to me. Upgrade events will
surely be rare in the usual course of things, and the tools will
usually be available on the same network, with bandwidth usually fast and free.
Having the agents upgrade independently means we don't
have to interlock between them, which simplifies
things quite a bit.
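The no-interlock shape is simply each agent running its own fetch-install sequence with no coordination step in between. A sketch, with fetch and install as stand-ins for the real work and all names hypothetical:

```go
package main

import (
	"fmt"
	"sync"
)

// upgradeAgent sketches the "no interlock" argument: an agent
// reacts to a new version entirely on its own -- fetch the tools,
// install them -- without waiting on any other agent.
func upgradeAgent(name, version string,
	fetch func(string) []byte, install func([]byte)) string {
	install(fetch(version))
	return fmt.Sprintf("%s upgraded to %s", name, version)
}

func main() {
	fetch := func(v string) []byte { return []byte("tools-" + v) }
	install := func([]byte) {}
	var wg sync.WaitGroup
	for _, name := range []string{"machine-0", "machine-1", "machine-2"} {
		wg.Add(1)
		// Each agent upgrades concurrently and independently;
		// there is no synchronisation between them beyond waiting
		// for the demo to finish.
		go func(name string) {
			defer wg.Done()
			fmt.Println(upgradeAgent(name, "1.10.1", fetch, install))
		}(name)
	}
	wg.Wait()
}
```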
>> > Hmm, I think that's still potentially racy. If *agents* cut off their
>> > own API access, though... yeah, we can probably make it work. Nice.
>>
>> It's only racy if there are still some agents around that are directly
>> accessing mongo AFAICS. If by "cutting off their own API access"
>> you mean "stop talking to mongo directly", then I'm with you.
>
> Well, I mean "starting up talking to the API only, and explicitly
> revoking their own state access by making some API call". But then I'm
> not sure how we'd downgrade that safely... compatibility is a tighter
> constraint than it seems at first glance, and a "make things insecure"
> API is I think somewhat questionable.
Downgrading is indeed something we could not do if we didn't
want to implement that call. If we're prepared to live with that, I still think
we could do this without needing to implement full major-version upgrades.
More information about the Juju-dev
mailing list