ssh authorized_keys and known_hosts

Tue Oct 18 09:26:57 UTC 2011

Hi all

I've been looking at https://bugs.launchpad.net/bugs/802117 (ensemble
ssh command should use a different known_hosts file), and I have a
(multipart) suggestion (which should maybe be broken into separate bugs,
assuming anyone actually likes the ideas herein):

1) Juju should generate and store an environment-specific SSH key when
it bootstraps, and should always authorise that key, and use it whenever
it connects (either when tunnelling to ZK, or when just 'juju ssh'ing).

2) (most relevant to this bug as stated) machine agents should publish
their machine's public key to ZK, and any juju-mediated SSHing should
use: the generated IdentityFile; a temporary UserKnownHostsFile,
generated on demand, containing just the required host; and
StrictHostKeyChecking=yes.

3) Now, the solution described so far won't work for the bootstrap node
(you can't get anything out of ZK without an SSH key in the first
place). So, the *provisioning* agent should publish each machine's
public key to the provider file storage as soon as it becomes available
(and delete it when the machine goes away), and anyone wanting to
connect to a machine should get the public key from file storage rather
than from ZK.
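
To make (2) a bit more concrete, here is a rough Python sketch of how a
juju-mediated SSH invocation might be put together. The function names,
the default "ubuntu" user, and the temp file prefix are all made up for
illustration; only the ssh options themselves (IdentityFile,
UserKnownHostsFile, StrictHostKeyChecking) come from the proposal. It
assumes the published host key is in the usual "<type> <base64>
[comment]" form.

```python
import os
import tempfile


def write_known_hosts(host, host_public_key):
    """Write a throwaway known_hosts file holding only `host`.

    `host_public_key` is assumed to look like "<type> <base64> [comment]";
    a known_hosts entry is the host name followed by the type and key data.
    Returns the path of the temporary file (the caller should delete it).
    """
    keytype, data = host_public_key.split()[:2]
    fd, path = tempfile.mkstemp(prefix="juju-known-hosts-")
    with os.fdopen(fd, "w") as f:
        f.write("%s %s %s\n" % (host, keytype, data))
    return path


def build_ssh_command(host, identity_file, known_hosts_file, user="ubuntu"):
    """Build an ssh argv that trusts only the given host key.

    Hypothetical helper: the user name and argument order are assumptions.
    """
    return [
        "ssh",
        "-o", "IdentityFile=%s" % identity_file,
        "-o", "UserKnownHostsFile=%s" % known_hosts_file,
        "-o", "StrictHostKeyChecking=yes",
        "%s@%s" % (user, host),
    ]
```

With StrictHostKeyChecking=yes and a known_hosts file containing just
the one entry, ssh should refuse to connect to anything other than the
expected host key, and should never stop to ask the user anything.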

Benefits:

* Nobody has to type "yes".

* Nobody hits the "no SSH keys" bootstrap error we occasionally hear of,
and nobody gets put off by having to generate an SSH key for themselves
before they can try juju. (OK, it's not a lot of work, but I suspect
every extra step before you see something working will lose us a certain
proportion of potential users.)

* The initial "juju status" commands (when waiting for bootstrap) can be
much friendlier: we check for the node's public key in file storage, and
if it's not there we can *quickly* say "Environment not ready: waiting
for public key from machine/0"; however, once it *is* there, we should
get quick connections (no waiting for initialisation, because ZK must
already be up and running for the key to have got from the machine agent
to the provisioning agent before being written to file storage).

* more..?
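
The friendlier status check could be sketched roughly as below. This is
only an illustration: a plain dict stands in for provider file storage,
and the "machines/<id>/ssh_host_key.pub" naming scheme is my invention,
not an existing layout.

```python
def bootstrap_status(file_storage, machine_id=0):
    """Return (ready, detail) based on whether the host key was published.

    `file_storage` is any mapping with a .get() method; here it stands in
    for provider file storage. The key name below is hypothetical.
    """
    name = "machines/%d/ssh_host_key.pub" % machine_id
    public_key = file_storage.get(name)
    if public_key is None:
        # Fail fast instead of waiting on an SSH connection attempt.
        return (False,
                "Environment not ready: waiting for public key from "
                "machine/%d" % machine_id)
    return (True, public_key)
```

The point is just that the "not ready" answer costs one storage lookup
and no SSH connection attempt.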

Drawbacks:

* No mention is made of what we should do if we want to connect with
additional keys (but really, why would we want to if we can already
'juju ssh' to any machine?). This is a pre-existing problem, anyway, and
we'll need to do some work on it regardless (if it is an important use
case).

* Sharing the juju admin key between different clients will be a hassle,
but no more so than sharing environment config already is. (I guess this
is maybe a reason to authorise multiple keys; but again, this relates to
the existing problem, and we probably want something like a "juju access
grant|revoke /some/random/id_rsa.pub" command[0].)

* more..?
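
For what it's worth, the core of a "juju access grant|revoke" could be
as small as the sketch below, which just edits the text of an
authorized_keys file. Everything here is hypothetical (function names,
and especially the policy of refusing to revoke the environment's own
key, which is one possible answer to [0] rather than a decided one);
getting the result onto the machines is the real work and isn't shown.

```python
def grant(authorized_keys, public_key):
    """Return authorized_keys text with public_key added, without duplicates."""
    lines = [l.strip() for l in authorized_keys.splitlines() if l.strip()]
    key = public_key.strip()
    if key not in lines:
        lines.append(key)
    return "\n".join(lines) + "\n"


def revoke(authorized_keys, public_key, environment_key):
    """Return authorized_keys text with public_key removed.

    Refuses to remove the environment's own key (an assumed policy),
    since juju itself would then be locked out.
    """
    key = public_key.strip()
    if key == environment_key.strip():
        raise ValueError("refusing to revoke the environment's own key")
    lines = [l.strip() for l in authorized_keys.splitlines() if l.strip()]
    kept = [l for l in lines if l != key]
    return "\n".join(kept) + ("\n" if kept else "")
```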

Does anyone have any objections to this before I get too deep into it?
Tentatively, it feels like it could be broken down into the following
bugs:

* machines' public keys should be published to filestorage by the
provisioning agent; and juju should specify the host's public key,
acquired therefrom, before attempting to connect via ssh.
* juju should generate (and authorise) its own ssh key at bootstrap
time, and always use that to connect to machines.
* add "juju access" subcommand to manage authorised keys, and necessary
infrastructure to keep them up to date on machines.

...each of which is reasonably independent, and gives some sort of value
on its own. Thoughts?

Cheers
William

[0] I suppose "juju access revoke ~/.juju/myenv/id_rsa.pub" could be a
problem... details, details.