ssh authorized_keys and known_hosts

Tue Oct 18 18:56:37 UTC 2011

Excerpts from William Reade's message of Tue Oct 18 02:26:57 -0700 2011:
> Hi all
> 
> I've been looking at https://bugs.launchpad.net/bugs/802117 (ensemble
> ssh command should use a different known_hosts file), and I have a
> (multipart) suggestion (which should maybe be broken into separate bugs,
> assuming anyone actually likes the ideas herein):
> 
> 1) Juju should generate and store an environment-specific SSH key when
> it bootstraps, and should always authorise that key, and use it whenever
> it connects (either when tunnelling to ZK, or when just 'juju ssh'ing).
> 

It's likely that *many* users will make use of a single juju
environment. So generating a key at bootstrap time is a nice trick,
but it may cause problems for multiple users: since only the
bootstrapping user will have the generated key, other users won't be
able to connect.

I'd rather see this addressed in this context:

https://bugs.launchpad.net/juju/+bug/834930

Admin users need to be able to add/remove keys to the environment. If
we want to help the user out by telling them they don't have an SSH key,
that's fine, but as Scott Moser said, having passwordless/agentless keys
that are only useful in juju doesn't really improve usability enough to
warrant the risk.

> 2) (most relevant to this bug as stated) machine agents should publish
> their machine's public key to ZK, and any juju-mediated SSHing should
> use: the generated IdentityFile; a temporary UserKnownHostsFile,
> generated on demand, containing just the required host; and
> StrictHostKeyChecking=yes.
> 

This makes zookeeper an attack vector for man-in-the-middle attacks, as
anybody who can write keys into it will be able to MITM any ssh
connection. Since we don't have fine-grained access control yet anyway,
this is sort of moot, as anybody who has access to ZK can also just
inject a charm that roots any box.

If we do get fine-grained access control, I'd suggest doing Dustin's
method so that only the provisioning agent can write these keys to ZK,
and then at least one can encapsulate the compromise in the provisioning
server and take steps to harden it. That hardening is needed anyway,
since the provisioning server is also privy to the AWS credentials.
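
For reference, the ssh invocation described in (2) might look roughly
like the sketch below. It assumes the host key has already been fetched
from somewhere trusted; the function names, the "ubuntu" default user and
the temp file prefix are made up for illustration and are not juju code.
Only the ssh options (IdentityFile, UserKnownHostsFile,
StrictHostKeyChecking) come from the proposal itself.

```python
import os
import tempfile


def write_known_hosts(hostname, public_key):
    """Write a one-entry known_hosts file and return its path.

    public_key is the host key in "keytype base64" form. How it is
    obtained (ZK, file storage, ...) is outside this sketch.
    """
    # mkstemp creates the file readable and writable only by the owner.
    fd, path = tempfile.mkstemp(prefix="juju-known-hosts-")
    with os.fdopen(fd, "w") as f:
        f.write("%s %s\n" % (hostname, public_key))
    return path


def build_ssh_command(hostname, known_hosts_path, identity_file=None,
                      user="ubuntu"):
    """Return an ssh argv that only trusts keys in known_hosts_path."""
    cmd = [
        "ssh",
        # Refuse to connect if the host key is unknown or has changed.
        "-o", "StrictHostKeyChecking=yes",
        # Ignore ~/.ssh/known_hosts and use the generated file instead.
        "-o", "UserKnownHostsFile=%s" % known_hosts_path,
    ]
    if identity_file is not None:
        cmd += ["-o", "IdentityFile=%s" % identity_file]
    cmd.append("%s@%s" % (user, hostname))
    return cmd
```

The caller would run the returned argv and remove the temporary file
afterwards. Note that this is only as trustworthy as wherever the key
came from, which is the concern above.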

> 3) Now, the solution described so far won't work for the bootstrap node
> (you can't get anything out of ZK without an SSH key in the first
> place). So, the *provisioning* agent should publish each machine's
> public key to the provider file storage as soon as it becomes available
> (and delete it when the machine goes away), and anyone wanting to
> connect to a machine should get the public key from file storage rather
> than zookeeper.
> 

This now also makes the file storage a vector for MITM compromise. While
in theory only the provisioning agent should have access to write to
S3/webdav/etc., anybody who was accidentally granted this write access
would be able to insert their own keys and MITM all SSH connections.

A lot of this is mitigated if the client only ever talks directly to
the bootstrap node(s).

The reason typing "yes" is so worthless isn't that the method is flawed,
it's that we're asked 100 times a day because we spin up 100 instances a
day. Concentrating security on a small group of machines is going to be
more productive because the prompts will be less likely to be ignored.
Of course, it also means that those machines will need to be as hardened
as possible and put under much greater scrutiny.

So, in summary, I like the general idea, and would boil it down to this:

* Help users manage keys to make the process more fluid, but don't
  generate keys for them.
* Help users avoid a man-in-the-middle attack by having clients optionally
  subscribe to known_hosts from the provisioning agent.

This syncs up quite nicely actually with Juan's recent submission of
a capistrano status renderer. If we also have something like 'juju
sync-host-keys .ssh/known_hosts' that automatically edits known_hosts
to contain the keys in the environment, and maybe even remove stale keys
from the environment, that would be, I think, a nice user experience.
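
As a rough sketch of the editing step such a command might perform
(assuming the environment's host keys have already been fetched as a
hostname -> "keytype base64" mapping; the function name and that mapping
are hypothetical, and hashed known_hosts entries are not handled):

```python
def merge_known_hosts(existing_text, env_keys):
    """Return known_hosts text updated with the environment's host keys.

    existing_text: current contents of the known_hosts file.
    env_keys: dict mapping hostname -> "keytype base64" (assumed input).

    Entries for hosts in env_keys are replaced; all other lines,
    including comments and @-marker lines, are kept untouched.
    """
    kept = []
    for line in existing_text.splitlines():
        stripped = line.strip()
        if (not stripped or stripped.startswith("#")
                or stripped.startswith("@")):
            kept.append(line)
            continue
        # The first field may be a comma-separated list of host names.
        hosts = stripped.split(None, 1)[0].split(",")
        if any(h in env_keys for h in hosts):
            continue  # superseded by the environment's current key
        kept.append(line)
    for host in sorted(env_keys):
        kept.append("%s %s" % (host, env_keys[host]))
    return "".join(l + "\n" for l in kept)
```

This only replaces entries for machines currently in the environment.
Dropping keys for machines that have since gone away would need some
record of which entries juju added in the first place.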

> Does anyone have any objections to this, before I get too deeply into
> this? Tentatively, it feels like it could be broken down into the
> following bugs:
> 
> * machines' public keys should be published to filestorage by the
> provisioning agent; and juju should specify the host's public key,
> acquired therefrom, before attempting to connect via ssh.
> * juju should generate (and authorise) its own ssh key at bootstrap
> time, and always use that to connect to machines.
> * add "juju access" subcommand to manage authorised keys, and necessary
> infrastructure to keep them up to date on machines.
> 

I hate to be a Debbie Downer, but there are a whole bunch of other bugs
that are marked High that need urgent attention. I filed bug 802117,
and do still feel that it needs addressing. However, HA for ZK and the
provisioning agent, and being able to reboot servers managed by juju,
seem a lot more important.

We basically have until around January to complete features that need
to be in Ubuntu 12.04. There will be no last-minute upload this time.
I would totally love to see a novel solution for this included, but I
don't think it will be relevant at all if people can't run real workloads
on juju.