ssh authorized_keys and known_hosts

William Reade william.reade at canonical.com
Wed Oct 19 09:00:01 UTC 2011


On Tue, 2011-10-18 at 11:56 -0700, Clint Byrum wrote:
> It's likely that *many* users will make use of a single juju
> environment. So generating a key at bootstrap time is a nice trick,
> but may cause problems for multiple users. Since only the bootstrapping
> user will have the generated key, other users won't be able to connect.

[aside: this is no worse than the existing situation, surely?]

> I'd rather see this addressed in this context:
> 
> https://bugs.launchpad.net/juju/+bug/834930
> 
> Admin users need to be able to add/remove keys to the environment. If
> we want to help the user out by telling them they don't have an SSH key,
> that's fine, but as Scott Moser said, having passwordless/agentless keys
> that are only useful in juju doesn't really improve usability enough to
> warrant the risk.

I think the arguments here are distinct. I absolutely agree that adding
and removing keys is a necessary feature, but I'm not sure that's an
argument against generating a key at bootstrap time (if only when an
existing key cannot be found).
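To make the "generate only if missing" idea concrete, here's a rough
sketch of the sort of check I have in mind (Python; the paths, and the
choice of a passphrase-less key, are purely illustrative and not settled
behaviour):

    import os
    import subprocess

    def ensure_client_key(juju_dir="~/.juju"):
        """Return a private key path, generating a key only if none exists.

        Prefer the user's existing ~/.ssh/id_rsa; otherwise generate a
        juju-specific key (illustratively without a passphrase, so
        bootstrap can proceed without further setup).
        """
        existing = os.path.expanduser("~/.ssh/id_rsa")
        if os.path.exists(existing):
            return existing
        generated = os.path.join(os.path.expanduser(juju_dir),
                                 "generated_id_rsa")
        if not os.path.exists(generated):
            key_dir = os.path.dirname(generated)
            if not os.path.isdir(key_dir):
                os.makedirs(key_dir)
            subprocess.check_call([
                "ssh-keygen", "-q", "-t", "rsa",
                "-N", "",        # empty passphrase, just for the sketch
                "-f", generated,
            ])
        return generated

The authorisation side (getting the corresponding .pub onto machines)
would still go through whatever key-management mechanism comes out of
bug 834930.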

We have had a few users confused by the SSH key requirement, and I tend
to assume that those who seek help are outnumbered by those who end up
thinking "meh, doesn't work" and moving on; so long as it's easy to
create a more security-conscious environment, I think it's in our
interest to allow people to explore the possibilities without
encountering unexpected hurdles.

Consider HA, for example: it's absolutely a necessary story, but I don't
think it's one we should enable by default. Say we got it down to a
single config item, dead-zookeeper-tolerance: 0 should still be the
default, because it's a valid value for certain applications, it's
cheaper than always bringing up 5 zookeeper nodes, and it's easy to
change later if you find you do need HA.

> 
> > 2) (most relevant to this bug as stated) machine agents should publish
> > their machine's public key to ZK, and any juju-mediated SSHing should
> > use: the generated IdentityFile; a temporary UserKnownHostsFile,
> > generated on demand, containing just the required host; and
> > StrictHostKeyChecking=yes.
> > 
> 
> This makes zookeeper an attack vector for man in the middle attacks, as
> anybody who can write keys in will be able to MITM any ssh connection.
> Since we don't have fine grained access control yet anyway, this is sort
> of moot, as anybody who has access to ZK can also just inject a charm
> that roots any box.

This was indeed written on the assumption that fine-grained access
control would be available in the nearish future.

> If we do get fine grained access control, I'd suggest doing Dustin's
> method so that only the provisioning agent can write these keys to ZK,
> and then at least one can encapsulate the compromise in the provisioning
> server and take steps to harden it. This is needed anyway since it
> also is privy to the AWS credentials.

Sounds sensible; if we use Dustin's method we don't need the machine
agents to be involved at all, and it should be much easier to think
about.
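Whatever mechanism ends up delivering the host keys, the client-side
mechanics I'm imagining are roughly the following (a minimal Python
sketch; the identity-file path is illustrative, and the host key is
assumed to have been fetched already):

    import os
    import subprocess
    import tempfile

    def juju_ssh(host, host_key_line, user="ubuntu"):
        """SSH to `host`, verifying it against a host key we already trust.

        `host_key_line` is the machine's public host key (e.g.
        "ssh-rsa AAAA...") obtained from wherever we decide to publish it;
        the identity file is the key generated and authorised at bootstrap
        time (illustrative path).
        """
        # Throwaway known_hosts file containing just this one host.
        known_hosts = tempfile.NamedTemporaryFile(
            mode="w", prefix="juju-known-hosts-")
        known_hosts.write("%s %s\n" % (host, host_key_line))
        known_hosts.flush()

        return subprocess.call([
            "ssh",
            "-i", os.path.expanduser("~/.juju/generated_id_rsa"),
            "-o", "UserKnownHostsFile=%s" % known_hosts.name,
            "-o", "StrictHostKeyChecking=yes",   # fail rather than prompt
            "%s@%s" % (user, host),
        ])

Nothing here prompts; if the key on the wire doesn't match what we
published, the connection just fails.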

> > 3) Now, the solution described so far won't work for the bootstrap node
> > (you can't get anything out of ZK without an SSH key in the first
> > place). So, the *provisioning* agent should publish each machine's
> > public key to the provider file storage as soon as it becomes available
> > (and delete it when the machine goes away), and anyone wanting to
> > connect to a machine should get the public key from file storage rather
> > than zookeeper.
> > 
> 
> This now also makes the file storage a vector for MITM compromise. While
> in theory, only the provisioning agent should have access to write to
> S3/webdav/etc., anybody who was accidentally granted this write access
> would be able to insert their own keys and MITM all SSH connections.
> 
> A lot of this is mitigated if the client only ever talks directly to
> the bootstrap node(s).

I think this is inescapable. The file storage is used to tell the client
where to find the bootstrap node in the first place; I'm pretty sure you
could still inject malicious charms with FS access alone. So, I don't
think that moving the public keys one step further actually weakens
security; it means that, in this narrow context, we need to care about
the security of an extra component, but it's a component that needs to
be secure *anyway*, so we're actually no worse off.
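To be explicit about the shape of the thing (the storage interface, path
scheme, and function names below are all made up for the example, not
proposals for actual APIs):

    # Illustrative flow only: the provisioning agent publishes each
    # machine's public host key to provider file storage, and clients
    # fetch it before connecting. `storage` stands in for whatever
    # S3/WebDAV abstraction the provider gives us, assumed to expose
    # put/get/remove taking a path.

    def publish_host_key(storage, machine_id, host_key_line):
        # Called by the provisioning agent as soon as the key is known,
        # e.g. host_key_line == "ssh-rsa AAAAB3Nza... root@machine-0".
        storage.put("host-keys/%s.pub" % machine_id, host_key_line)

    def remove_host_key(storage, machine_id):
        # Called when the machine goes away, so stale keys can't linger.
        storage.remove("host-keys/%s.pub" % machine_id)

    def fetch_host_key(storage, machine_id):
        # Called by the client before ssh-ing; the result feeds the
        # temporary known_hosts file in the earlier sketch.
        return storage.get("host-keys/%s.pub" % machine_id)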

> 
> The reason typing yes is so worthless isn't that the method is flawed,
> it's that we're asked 100 times a day because we spin up 100 instances a
> day. Grouping security around a small group of machines is going to be
> more productive because it will be less likely to be ignored. Of course,
> it also means that those machines will need to be as hardened as possible
> and put under much greater scrutiny.

Agreed. I think we need all of the following:

* machines/agents whose identities we can actually verify
* fine-grained access control
* secure file storage

...and we need all of those for just about any "security" story I can
imagine. Given all the above, it becomes almost trivial to implement a
don't-type-yes story.

> 
> So, in summary, I like the general idea, and would summarize it down to this:
> 
> * Help users manage keys to make the process more fluid, don't generate
>   keys.
> * Help users avoid a man-in-the-middle attack by having clients optionally
>   subscribe to known_hosts from provisioning agent.
> 
> This syncs up quite nicely actually with Juan's recent submission of
> a Capistrano status renderer. If we also have something like 'juju
> sync-host-keys .ssh/known_hosts' that automatically edits known_hosts
> to contain the keys in the environment, and maybe even remove stale keys
> from the environment, that would be, I think, a nice user experience.
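That does sound pleasant. For what it's worth, I imagine something like
the following behaviour (the subcommand name is yours; the marker-comment
trick is just one illustrative way of tracking which entries we own):

    def sync_known_hosts(path, environment_hosts):
        """Rewrite `path` to contain the environment's current host keys.

        `environment_hosts` maps hostname -> public host key line.
        Entries we added previously (tagged with a marker comment) are
        replaced, so stale machines drop out; anything else in the file
        is left untouched.
        """
        MARKER = "# juju-managed"    # hypothetical marker for our entries
        try:
            with open(path) as f:
                lines = [line.rstrip("\n") for line in f]
        except IOError:
            lines = []
        kept = [line for line in lines if not line.endswith(MARKER)]
        managed = ["%s %s %s" % (host, key, MARKER)
                   for host, key in sorted(environment_hosts.items())]
        with open(path, "w") as f:
            f.write("\n".join(kept + managed) + "\n")
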
> 
> > Does anyone have any objections to this, before I get too deeply into
> > this? Tentatively, it feels like it could be broken down into the
> > following bugs:
> > 
> > * machines' public keys should be published to filestorage by the
> > provisioning agent; and juju should specify the host's public key,
> > acquired therefrom, before attempting to connect via ssh.
> > * juju should generate (and authorise) its own ssh key at bootstrap
> > time, and always use that to connect to machines.
> > * add "juju access" subcommand to manage authorised keys, and necessary
> > infrastructure to keep them up to date on machines.
> > 
> 
> I hate to be a Debbie Downer, but there are a whole bunch of other bugs
> that are marked High that need urgent attention. I filed bug 802117,
> and do still feel that it needs addressing. However, HA for ZK and
> provisioning agent, and being able to reboot servers managed by juju
> seem a lot more important.

Fully appreciated; and granted, up to a point: I do agree that we can
live with typing "yes" 100 times a day, but I'd also say that this has
become more of a discussion about security in general, and where we're
lacking it, and that *that* is every bit as important as the other
things you mention.

> We basically have until around January to complete features that need
> to be in Ubuntu 12.04. There will be no last minute upload this time.
> I would totally love to see a novel solution for this included, but I
> don't think it will be relevant at all if people can't run real workloads
> on juju.

Eep, January. Thanks for making me aware of that :-).



