[Bug 1987663] Re: cinder-volume: "Failed to re-export volume, setting to ERROR" with "tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is not connected" on service startup
Mauricio Faria de Oliveira
1987663 at bugs.launchpad.net
Wed Oct 18 17:54:29 UTC 2023
** Description changed:
- Debian bug #1018084 [1]
- Debian MR on Salsa: [2]
+ [Impact]
- [1] https://bugs.debian.org/1018084
- [2] https://salsa.debian.org/openstack-team/services/cinder/-/merge_requests/2
+ * The cinder-volume service might fail to re-export volumes
+ in-use on startup if tgt.service isn't fully started yet.
+
+ * This affects the 'lvm' driver with 'tgtadm' target helper
+ (which runs 'tgtadm' commands that need the service ready).
+
+ * Snippets from /var/log/cinder/cinder-volume.log:
+
+ Failed to re-export volume, setting to ERROR.
+ ...
+ Command: tgtadm --lld iscsi --op show --mode target
+ ...
+ Stderr: 'tgtadm: failed to send request hdr to tgt daemon,
+ Transport endpoint is not connected\n'
- Problem:
+ * This issue is more common in openstack compute nodes
+ with networking (ovs/ovn) that takes a long time to start up,
+ which might delay the startup of tgt.service _after_
+ cinder-volume.service.
- The cinder-volume.service unit is _not_ ordered after the tgt.service unit,
- and thus might fail to run 'tgtadm' commands because tgtd is not yet ready,
- on service start up / boot:
+ [Test Steps]
- INFO cinder.service [-] Starting cinder-volume node (version 12.0.9)
- ...
- ERROR cinder.volume.manager [req-UUID - - - - -] Failed to re-export volume, setting to ERROR.: ProcessExecutionError: Unexpected error while running command.
- Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf tgtadm --lld iscsi --op show --mode target
- Exit code: 107
- Stdout: u''
- Stderr: u'tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is not connected\n'
+ * Steps to reproduce are detailed in comment #2.
+ Summary:
+
+ * Install mysql, rabbitmq-server, keystone, and cinder
+ (controller and storage nodes; backup node unneeded).
- Approach:
+ * Configure cinder-volume (storage node) for LVM backend
+ and tgtadm iSCSI helper (tgt.service).
+
+ * Create a cinder volume, and configure it as 'in-use'.
- It should be OK to order `cinder-volume.service` (generated from `cinder-volume.init.in` template) with `After=tgt.service`,
- as the `cinder-volume` package has a `Depends: tgt` dependency, and actually restarts `tgt` on its install/remove scripts,
- thus it is already strongly bound to `tgt` at the packaging level, even if its runtime usage is optional (eg, other drivers).
+ * Simulate a start delay on tgt.service with a drop-in.
- # dpkg -s cinder-volume | grep Depends:
- Depends: ..., tgt, ...
+ * Restart services: cinder-volume.service tgt.service
+
+ * Check sequence of service startup.
+
+ * Check status of the cinder volume:
+ 'in-use' (expected) or 'error' (bug).
+
+ * Check log file /var/log/cinder/cinder-volume.log for
+ 'tgtadm: failed to send request hdr to tgt daemon'.
+
+ [Regression Potential]
+
+ * The fix introduces systemd unit 'After=' and 'Wants='
+ properties for tgt.service in cinder-volume.service,
+ thus might delay the boot process (multi-user.target).
+
+ $ systemctl show cinder-volume.service | grep WantedBy=
+ WantedBy=multi-user.target
+
+ * However, the boot process already waits on tgt.service
+ anyway, so the added delay (if any) should be small,
+ and the resulting startup order is more correct.
+
+ $ systemctl show tgt.service | grep WantedBy=
+ WantedBy=multi-user.target
+
+ * If tgt.service is not present (tgt package not installed)
+ _no errors_ occur, as both 'After=' and 'Wants=' are weak
+ ordering/dependency properties (man 5 systemd.unit).
+
+ [Other Info]
+
+ * The fix uses a systemd service drop-in snippet because
+ the service unit is generated by openstack-pkg-tools
+ (pkgos-gen-systemd-unit) based on the 'init' service,
+ and it only emits 'Wants=' for network-online.target.
+
+ * Changing that in openstack-pkg-tools would change behavior
+ in stable releases (only manifesting at build time)
+ for many openstack packages that have no issues now.
+
+ * We'll continue to pursue the general improvement in
+ Debian, so that it reaches the Ubuntu development release;
+ for the Ubuntu stable releases, this fix should suffice.
+
+ [Original Bug Description]
Real-world example:
1) Unit `ovs-vswitchd.service` took 2 minutes to start up.
2) That delayed `network.target` (`ovs-vswitchd.service` has `Before=network.target`).
3) That delayed `tgt.service` too (it has `After=network.target`).
4) BUT that did _not_ delay `cinder-volume.service` (it does _not_ have `After=tgt.service`)
5) So it failed to talk to tgtd with tgtadm, and volume re-export failed.
$ cat sos_commands/logs/journalctl_--no-pager_--catalog_--boot \
| grep -E -B1 'Subject: Unit (cinder-volume.service|tgt.service|ovs-vswitchd.service|network.target)'
Aug 10 06:23:15 <HOST> systemd[1]: Started OpenStack Cinder Volume.
-- Subject: Unit cinder-volume.service has finished start-up
--
<<< ERROR in cinder-volume.log >>>
Aug 10 06:23:16 <HOST> systemd[1]: Starting Open vSwitch Forwarding Unit...
-- Subject: Unit ovs-vswitchd.service has begun start-up
--
<<< DELAY of 2 minutes >>>
Aug 10 06:25:17 <HOST> systemd[1]: Started Open vSwitch Forwarding Unit.
-- Subject: Unit ovs-vswitchd.service has finished start-up
--
Aug 10 06:25:17 <HOST> systemd[1]: Reached target Network.
-- Subject: Unit network.target has finished start-up
--
Aug 10 06:25:17 <HOST> systemd[1]: Starting (i)SCSI target daemon...
-- Subject: Unit tgt.service has begun start-up
--
Aug 10 06:25:19 <HOST> systemd[1]: Started (i)SCSI target daemon.
-- Subject: Unit tgt.service has finished start-up
<<< NOW tgtd is running >>>
@ var/log/cinder/cinder-volume.log
9901 2022-08-10 06:23:21.515 3939 INFO cinder.service [-] Starting cinder-volume node (version 12.0.9)
...
9932 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager [req-e67a852a-6cce-4073-a2de-d3487c85d586 - - - - -] Failed to re-export volume, setting to ERROR.: ProcessExecutionError: Unexpected error while running command.
9933 Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf tgtadm --lld iscsi --op show --mode target
9934 Exit code: 107
9935 Stdout: u''
9936 Stderr: u'tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is not connected\n'
9937 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager Traceback (most recent call last):
9938 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/volume/manager.py", line 486, in init_host
9939 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager self.driver.ensure_export(ctxt, volume)
9940 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/lvm.py", line 826, in ensure_export
9941 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager self.target_driver.ensure_export(context, volume, volume_path)
9942 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/volume/targets/iscsi.py", line 261, in ensure_export
9943 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager old_name=None, **portals_config)
9944 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/utils.py", line 818, in _wrapper
9945 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager return r.call(f, *args, **kwargs)
9946 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/retrying.py", line 206, in call
9947 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager return attempt.get(self._wrap_exception)
9948 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/retrying.py", line 247, in get
9949 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager six.reraise(self.value[0], self.value[1], self.value[2])
9950 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/retrying.py", line 200, in call
9951 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
9952 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/volume/targets/tgt.py", line 141, in create_iscsi_target
9953 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager run_as_root=True)
9954 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/utils.py", line 126, in execute
9955 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager return processutils.execute(*cmd, **kwargs)
9956 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager File "/usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py", line 424, in execute
9957 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager cmd=sanitized_cmd)
9958 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager ProcessExecutionError: Unexpected error while running command.
9959 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf tgtadm --lld iscsi --op show --mode target
9960 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager Exit code: 107
9961 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager Stdout: u''
9962 2022-08-10 06:23:23.398 3932 ERROR cinder.volume.manager Stderr: u'tgtadm: failed to send request hdr to tgt daemon, Transport endpoint is not connected\n'
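The ordering problem visible in the journal excerpt above can also be checked mechanically. This is a minimal sketch under a few assumptions. The helper name cinder_started_before_tgt is hypothetical, and it matches the unit descriptions seen in this report ("OpenStack Cinder Volume", "(i)SCSI target daemon"), which may differ on other releases. It reads journal text on stdin and reports which unit finished start-up first, going by journal order.

```shell
#!/bin/sh
# Hypothetical helper: read `journalctl --boot` text on stdin and report
# whether cinder-volume.service finished start-up before tgt.service.
# Matches the unit descriptions from this report's journal excerpt; the
# ".*" also tolerates journal formats that put the unit name before the
# description. Exit status: 0 = cinder-volume first (bug window),
# 1 = tgt first, 2 = one or both "Started" lines not found.
cinder_started_before_tgt() {
    awk '
        # Remember the line number of the first "Started" message per unit.
        /systemd\[1\]: Started .*OpenStack Cinder Volume\./ && !c { c = NR }
        /systemd\[1\]: Started .*\(i\)SCSI target daemon\./ && !t { t = NR }
        END {
            if (!c || !t) { print "unknown"; exit 2 }
            if (c < t) { print "cinder-volume first (bug window)"; exit 0 }
            print "tgt first"
            exit 1
        }
    '
}
```

Usage on an affected node:

  $ journalctl --boot --no-pager | cinder_started_before_tgt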
--
https://bugs.launchpad.net/bugs/1987663
Title:
cinder-volume: "Failed to re-export volume, setting to ERROR" with
"tgtadm: failed to send request hdr to tgt daemon, Transport endpoint
is not connected" on service startup
Status in cinder package in Ubuntu:
In Progress
Status in cinder package in Debian:
Fix Released
Bug description:
[Impact]
* The cinder-volume service might fail to re-export volumes
in-use on startup if tgt.service isn't fully started yet.
* This affects the 'lvm' driver with 'tgtadm' target helper
(which runs 'tgtadm' commands that need the service ready).
* Snippets from /var/log/cinder/cinder-volume.log:
Failed to re-export volume, setting to ERROR.
...
Command: tgtadm --lld iscsi --op show --mode target
...
Stderr: 'tgtadm: failed to send request hdr to tgt daemon,
Transport endpoint is not connected\n'
* This issue is more common in openstack compute nodes
with networking (ovs/ovn) that takes a long time to start up,
which might delay the startup of tgt.service _after_
cinder-volume.service.
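To check whether a given storage node has hit this, you can search the log for the two signatures quoted above. This is a minimal sketch. The helper name find_reexport_failures is hypothetical, and the default path is the one from this report.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of a cinder-volume log that match
# the failure signatures quoted in this bug. Exit status follows grep:
# 0 if any line matched, 1 if none, >1 on error (e.g. unreadable file).
find_reexport_failures() {
    log="${1:-/var/log/cinder/cinder-volume.log}"
    # -F treats each -e pattern as a fixed string, not a regex.
    grep -F \
        -e 'Failed to re-export volume, setting to ERROR' \
        -e 'tgtadm: failed to send request hdr to tgt daemon' \
        -- "$log"
}
```

Usage:

  $ find_reexport_failures || echo 'no re-export failures logged'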
[Test Steps]
* Steps to reproduce are detailed in comment #3.
Summary:
* Install mysql, rabbitmq-server, keystone, and cinder
(controller and storage nodes; backup node unneeded).
* Configure cinder-volume (storage node) for LVM backend
and tgtadm iSCSI helper (tgt.service).
* Create a cinder volume, and configure it as 'in-use'.
* Simulate a start delay on tgt.service with a drop-in.
* Restart services: cinder-volume.service tgt.service
* Check sequence of service startup.
* Check status of the cinder volume:
'in-use' (expected) or 'error' (bug).
* Check log file /var/log/cinder/cinder-volume.log for
'tgtadm: failed to send request hdr to tgt daemon'.
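The "simulate a start delay" step could look like the sketch below. The exact drop-in from the comment is not reproduced here. The file name (99-start-delay.conf) and the use of an ExecStartPre= sleep are assumptions. Any mechanism that holds back tgtd's start should open the same window.

```shell
#!/bin/sh
# Hypothetical helper for the "simulate a start delay" step: write a
# systemd drop-in that makes tgt.service sleep before starting tgtd.
# The file name (99-start-delay.conf) is an assumption.
write_tgt_delay_dropin() {
    dir="${1:-/etc/systemd/system/tgt.service.d}"
    # Seconds; keep well below the unit's start timeout, which also
    # covers ExecStartPre= (systemd's default TimeoutStartSec is 90s).
    delay="${2:-30}"
    mkdir -p "$dir" || return 1
    # ExecStartPre= runs before ExecStart=, so tgtd is not started
    # (and tgtadm cannot connect to it) until the sleep finishes.
    cat > "$dir/99-start-delay.conf" <<EOF
[Service]
ExecStartPre=/bin/sleep $delay
EOF
}
```

Usage, as root, to reproduce and then clean up:

  $ write_tgt_delay_dropin
  $ systemctl daemon-reload
  $ systemctl restart cinder-volume.service tgt.service
  $ rm /etc/systemd/system/tgt.service.d/99-start-delay.conf
  $ systemctl daemon-reload

Without the fix, cinder-volume.service starts while tgt.service is still sleeping. With the fix, the start of cinder-volume.service is ordered after tgt.service.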
[Regression Potential]
* The fix introduces systemd unit 'After=' and 'Wants='
properties for tgt.service in cinder-volume.service,
thus might delay the boot process (multi-user.target).
$ systemctl show cinder-volume.service | grep WantedBy=
WantedBy=multi-user.target
* However, the boot process already waits on tgt.service
anyway, so the added delay (if any) should be small,
and the resulting startup order is more correct.
$ systemctl show tgt.service | grep WantedBy=
WantedBy=multi-user.target
* If tgt.service is not present (tgt package not installed)
_no errors_ occur, as both 'After=' and 'Wants=' are weak
ordering/dependency properties (man 5 systemd.unit).
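Once the fix is installed, one way to confirm that cinder-volume.service picked up both properties is to inspect them with `systemctl show -p After -p Wants`. This is a minimal sketch. The helper name has_tgt_ordering is hypothetical, and it only parses that command's Key=value output from stdin.

```shell
#!/bin/sh
# Hypothetical check: read `systemctl show -p After -p Wants <unit>`
# output on stdin; succeed only if tgt.service is listed in both
# the After= and the Wants= property.
has_tgt_ordering() {
    awk '
        {
            eq = index($0, "=")
            if (eq == 0) next
            key = substr($0, 1, eq - 1)
            # Pad with spaces so the match is a whole unit name only.
            val = " " substr($0, eq + 1) " "
            if ((key == "After" || key == "Wants") && val ~ / tgt\.service /)
                found[key] = 1
        }
        END {
            ok = (found["After"] && found["Wants"])
            exit (ok ? 0 : 1)
        }
    '
}
```

Usage:

  $ systemctl show -p After -p Wants cinder-volume.service | has_tgt_ordering && echo ok

`systemctl cat cinder-volume.service` also lists the unit together with its drop-ins, which shows where the properties come from.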
[Other Info]
* The fix uses a systemd service drop-in snippet because
the service unit is generated by openstack-pkg-tools
(pkgos-gen-systemd-unit) based on the 'init' service,
and it only emits 'Wants=' for network-online.target.
* Changing that in openstack-pkg-tools would change behavior
in stable releases (only manifesting at build time)
for many openstack packages that have no issues now.
* We'll continue to pursue the general improvement in
Debian, so that it reaches the Ubuntu development release;
for the Ubuntu stable releases, this fix should suffice.
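For reference, a drop-in with the two properties described under [Regression Potential] could look like the sketch below. It could serve as a local workaround on a node that does not yet have the fixed package. The directory default and the file name (tgt.conf) are assumptions and are not necessarily what the Ubuntu package ships.

```shell
#!/bin/sh
# Hypothetical local equivalent of the fix: a drop-in that adds
# Wants=tgt.service and After=tgt.service to cinder-volume.service.
# The directory default and file name (tgt.conf) are assumptions.
write_cinder_tgt_dropin() {
    dir="${1:-/etc/systemd/system/cinder-volume.service.d}"
    mkdir -p "$dir" || return 1
    cat > "$dir/tgt.conf" <<'EOF'
[Unit]
# Weak dependency: pull in tgt.service, but do not fail if it is absent.
Wants=tgt.service
# Ordering: start cinder-volume only after tgt.service has started up.
After=tgt.service
EOF
}
```

Run it as root, then reload systemd. The new ordering applies from the next start of the units (e.g. the next boot):

  $ write_cinder_tgt_dropin
  $ systemctl daemon-reload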
[Original Bug Description]
Real-world example in comment #2.
More information about the Ubuntu-openstack-bugs
mailing list