[Bug 1938299] Re: Unable to SSH Into Instance when deploying Impish 21.10
Brian Murray
1938299 at bugs.launchpad.net
Mon Jul 18 22:58:26 UTC 2022
Ubuntu 21.10 (Impish Indri) has reached end of life, so this bug will
not be fixed for that specific release.
** Changed in: google-guest-agent (Ubuntu Impish)
Status: Confirmed => Won't Fix
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to google-guest-agent in Ubuntu.
Matching subscriptions: foundations-bugs
https://bugs.launchpad.net/bugs/1938299
Title:
Unable to SSH Into Instance when deploying Impish 21.10
Status in cloud-init package in Ubuntu:
Fix Released
Status in google-guest-agent package in Ubuntu:
Fix Released
Status in netplan.io package in Ubuntu:
Fix Released
Status in cloud-init source package in Bionic:
Fix Released
Status in google-guest-agent source package in Bionic:
Confirmed
Status in netplan.io source package in Bionic:
Won't Fix
Status in cloud-init source package in Focal:
Fix Released
Status in google-guest-agent source package in Focal:
Confirmed
Status in netplan.io source package in Focal:
Won't Fix
Status in cloud-init source package in Hirsute:
Fix Released
Status in google-guest-agent source package in Hirsute:
Won't Fix
Status in netplan.io source package in Hirsute:
Won't Fix
Status in cloud-init source package in Impish:
Fix Released
Status in google-guest-agent source package in Impish:
Won't Fix
Status in netplan.io source package in Impish:
Won't Fix
Bug description:
=== Begin SRU Template ===
[Impact]
In PR #919 (81299de), we refactored some of the code used to bring up networks across distros. Previously, the call to bring up network interfaces during 'init' stage unintentionally resulted in a no-op such that network interfaces were NEVER brought up by cloud-init, even if new network interfaces were found after crawling the metadata.
In #919, the code was altered to bring up these discovered network
interfaces. On Ubuntu, this results in a 'netplan apply' call during
'init' stage for any Ubuntu-based distro on a datasource that has a
NETWORK dependency. On GCE, this additional 'netplan apply' conflicts
with the google-guest-agent service: it shuts that service down because
of that project's PartOf= systemd relationship, resulting in an
instance that cannot be connected to.
To fix this, we added a new 'disable_network_activation' option that
can be set to true in /etc/cloud/cloud.cfg.d/*.cfg by image creators
to disable the activation of network interfaces in 'init' stage. This
will avoid the 'netplan apply' call on GCE instances.
[Test Case]
An integration test has been added at `tests/integration_tests/datasources/test_network_dependency.py` to test this functionality. To test manually:
1. Launch an instance on GCE
2. Install the cloud-init version with the fix
3. Add a file, '/etc/cloud/cloud.cfg.d/99-disable-network-activation.cfg' with the contents:
disable_network_activation: true
4. Run cloud-init clean --logs
5. Create a new image based on this instance
6. Launch a new instance based on the new image
7. The instance should launch successfully and be reachable over SSH
8. "['netplan', 'apply']" should not be present anywhere in /var/log/cloud-init.log.
9. "Bringing up newly configured network interfaces" should not exist anywhere in /var/log/cloud-init.log
In the failure case, we will fail at step 7.
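Steps 8 and 9 can be checked mechanically. The following is a small sketch, assuming the log is readable by the user running it; the helper names `find_forbidden_markers` and `check_log_file` are made up for this example and are not part of cloud-init or its test suite.

```python
from pathlib import Path
from typing import List

# The two strings that steps 8 and 9 say must NOT appear in the log.
FORBIDDEN_MARKERS = (
    "['netplan', 'apply']",
    "Bringing up newly configured network interfaces",
)


def find_forbidden_markers(log_text: str) -> List[str]:
    """Return the forbidden markers that occur in the given log text."""
    return [marker for marker in FORBIDDEN_MARKERS if marker in log_text]


def check_log_file(path: str = "/var/log/cloud-init.log") -> List[str]:
    """Read the log at `path` and return any forbidden markers found."""
    text = Path(path).read_text(errors="replace")
    return find_forbidden_markers(text)
```

On the instance launched in step 6, `check_log_file()` returning an empty list corresponds to steps 8 and 9 passing.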
[Regression Potential]
The code in question determines whether to bring up interfaces after applying network config. Accidentally not doing this should not be a problem, as we previously (unintentionally) did not bring these interfaces up. Accidentally bringing up interfaces when we shouldn't also generally shouldn't cause a large problem outside of GCE, because outside of GCE there aren't (that we're aware of) other processes independently setting up the network. If this determination code somehow fails, it happens early enough in boot that it could leave an instance unusable. However, the code is small enough and defensive enough that we don't believe that is a possibility.
Additionally, any cloud datasource that is discovered in `init-local`
stage (Azure, Ec2, Hetzner, IBMCloud, OpenStack and Oracle) isn't
exposed to this code path because full network config is emitted
before system network is brought up, so there is no need to call
`netplan apply` at that time.
[Other Info]
Github PR: https://github.com/canonical/cloud-init/pull/1048
Upstream commit: https://github.com/canonical/cloud-init/commit/9c147e8341e287366790e60658f646cdcc59bef2
=== End SRU Template ===
Original bug report:
Google Instances deployed with the Ubuntu 21.10 Daily images are
inaccessible via SSH.
gcloud compute instances create sf-impish-v20200720 --zone us-west1-a \
  --network "default" --no-restart-on-failure \
  --image-project ubuntu-os-cloud-devel \
  --image daily-ubuntu-2110-impish-v20210720 --machine-type n1-standard-2
This results in a successful deploy, yet the instance is inaccessible
via SSH from the end user's configured laptop.
This appears to affect all daily images after 20210719.
daily-ubuntu-2110-impish-v20210719 ubuntu-os-cloud-devel ubuntu-2110 READY
daily-ubuntu-2110-impish-v20210720 ubuntu-os-cloud-devel ubuntu-2110 READY
daily-ubuntu-2110-impish-v20210721 ubuntu-os-cloud-devel ubuntu-2110 READY
daily-ubuntu-2110-impish-v20210723 ubuntu-os-cloud-devel ubuntu-2110 READY
daily-ubuntu-2110-impish-v20210724 ubuntu-os-cloud-devel ubuntu-2110 READY
daily-ubuntu-2110-impish-v20210725 ubuntu-os-cloud-devel ubuntu-2110 READY
daily-ubuntu-2110-impish-v20210728 ubuntu-os-cloud-devel ubuntu-2110
This problem also appears to be reproducible via the gcloud UI: create
a new virtual machine using the daily-ubuntu-2110-impish-v20210720
image or later and instruct the virtual machine to import an
ssh_pub_key in the security tab. The instance will start, yet still be
inaccessible via the user's private SSH key.
The google-guest-agent.service appears to be responsible for adding
the google project ssh keys to the instance once it is deployed.
Please see the output below, queried on the 20210719 image:
google-guest-agent.service - Google Compute Engine Guest Agent
Loaded: loaded (/lib/systemd/system/google-guest-agent.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2021-07-27 19:47:48 UTC; 18h ago
Main PID: 711 (google_guest_ag)
Tasks: 9 (limit: 8924)
Memory: 19.7M
CGroup: /system.slice/google-guest-agent.service
└─711 /usr/bin/google_guest_agent
Jul 27 19:47:55 sean-imp gpasswd[1469]: user google added by root to group floppy
Jul 27 19:47:55 sean-imp gpasswd[1475]: user google added by root to group audio
Jul 27 19:47:55 sean-imp gpasswd[1481]: user google added by root to group dip
Jul 27 19:47:55 sean-imp gpasswd[1487]: user google added by root to group video
Jul 27 19:47:55 sean-imp gpasswd[1493]: user google added by root to group plugdev
Jul 27 19:47:55 sean-imp gpasswd[1499]: user google added by root to group netdev
Jul 27 19:47:55 sean-imp gpasswd[1505]: user google added by root to group lxd
Jul 27 19:47:55 sean-imp gpasswd[1511]: user google added by root to group google-sudoers
Jul 27 19:47:55 sean-imp GCEGuestAgent[711]: 2021-07-27T19:47:55.1699Z GCEGuestAgent Info: Updating keys for user google.
Jul 27 19:47:55 sean-imp google_guest_agent[711]: 2021/07/27 19:47:55 logging client: rpc error: code = PermissionDenied desc = Clo>
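When comparing images, it can help to pull the unit state out of this kind of status output automatically. The sketch below parses the "Active:" line from text in the format shown above; `parse_active_state` is a made-up helper name, and the regular expression is only an assumption about that one line's layout, not an official systemd interface.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "Active: active (running) since ..." and captures
# the state ("active") and the sub-state in parentheses ("running").
ACTIVE_RE = re.compile(r"^\s*Active:\s+(\S+)\s+\(([^)]*)\)", re.MULTILINE)


def parse_active_state(status_text: str) -> Optional[Tuple[str, str]]:
    """Return (state, sub_state) from status output, or None if absent."""
    match = ACTIVE_RE.search(status_text)
    if match is None:
        return None
    return match.group(1), match.group(2)


sample = (
    "google-guest-agent.service - Google Compute Engine Guest Agent\n"
    "Active: active (running) since Tue 2021-07-27 19:47:48 UTC; 18h ago\n"
)
print(parse_active_state(sample))
```

Per the impact description, on an affected image the agent is shut down by the extra 'netplan apply', so the state reported there would be expected to differ from the "active (running)" seen on the 20210719 image.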
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1938299/+subscriptions