[Bug 2091947] Re: [SRU] Watcher crashes on creation of multiple audits and gets stuck in PENDING

Bryan Fraschetti 2091947 at bugs.launchpad.net
Tue Jan 7 18:07:19 UTC 2025


I've deleted the debdiffs because there has since been upstream movement
on these two patches, which may affect which releases need SRUs and
which patches each of them needs:
https://review.opendev.org/c/openstack/watcher/+/938434 and
https://review.opendev.org/c/openstack/watcher/+/938437

I'll re-upload the debdiffs once that becomes clear.

** Description changed:

  A customer is facing an issue where the watcher-decision-engine service
  crashes when creating an audit plan with the Audit type set to
  CONTINUOUS. Below are the steps to reproduce the issue:
  
  Environment Details:
  1. Deploy OpenStack Yoga on Jammy with Watcher, using Gnocchi as Watcher's storage backend
  
  2. Create an audit
  openstack optimize audit create --name workload_stabilization_test_1 -s workload_stabilization -g workload_balancing --audit_type CONTINUOUS --interval 60 --auto-trigger
  
  3. Check the audit state
  openstack optimize audit list
  Observe it says "CONTINUOUS ONGOING"
  
  4. Create a second audit
  openstack optimize audit create --name workload_stabilization_test_2 -s workload_stabilization -g workload_balancing --audit_type CONTINUOUS --interval 60 --auto-trigger
  
  5. Check the audit state
  openstack optimize audit list
  Observe the second audit is stuck in "CONTINUOUS PENDING"
  
  6. Check watcher's status and observe that it crashed with the following traceback
  systemctl status watcher-decision-engine.service
  
  Nov 27 19:53:54 juju-2752e1-86-lxd-27 watcher-decision-engine[965896]:     self.run()
  Nov 27 19:53:54 juju-2752e1-86-lxd-27 watcher-decision-engine[965896]:   File "/usr/lib/python3.10/threading.py", line 953, in run
  Nov 27 19:53:54 juju-2752e1-86-lxd-27 watcher-decision-engine[965896]:     self._target(*self._args, **self._kwargs)
  Nov 27 19:53:54 juju-2752e1-86-lxd-27 watcher-decision-engine[965896]:   File "/usr/lib/python3/dist-packages/apscheduler/schedulers/blocking.py", line 32, in _main_loop
  Nov 27 19:53:54 juju-2752e1-86-lxd-27 watcher-decision-engine[965896]:     wait_seconds = self._process_jobs()
  Nov 27 19:53:54 juju-2752e1-86-lxd-27 watcher-decision-engine[965896]:   File "/usr/lib/python3/dist-packages/apscheduler/schedulers/base.py", line 1006, in _process_jobs
  Nov 27 19:53:54 juju-2752e1-86-lxd-27 watcher-decision-engine[965896]:     jobstore_next_run_time = jobstore.get_next_run_time()
  Nov 27 19:53:54 juju-2752e1-86-lxd-27 watcher-decision-engine[965896]:   File "/usr/lib/python3/dist-packages/apscheduler/jobstores/sqlalchemy.py", line 84, in get_next_run_time
  Nov 27 19:53:54 juju-2752e1-86-lxd-27 watcher-decision-engine[965896]:     return utc_timestamp_to_datetime(float(next_run_time))
  Nov 27 19:53:54 juju-2752e1-86-lxd-27 watcher-decision-engine[965896]: TypeError: float() argument must be a string or a real number, not 'NoneType'
  
  This was fixed upstream in 2024.2 at
  https://opendev.org/openstack/watcher/commit/d6f169197efc5b4f6c8a2e6bc38177b0641ca05c
  which properly addresses the type conversion and
  https://opendev.org/openstack/watcher/commit/fbb290b2238e9e72054892e9ae6108a8907f47d7
- which adjusts the unit tests to accommodate this fix.
+ which adjusts the unit tests to support croniter 5.0.0+, the default
+ version installed by tox on Noble and Oracular because they ship with
+ Python 3.12.

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/2091947

Title:
  [SRU] Watcher crashes on creation of multiple audits and gets stuck in
  PENDING

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive antelope series:
  New
Status in Ubuntu Cloud Archive bobcat series:
  New
Status in Ubuntu Cloud Archive caracal series:
  New
Status in Ubuntu Cloud Archive dalmation series:
  Fix Released
Status in Ubuntu Cloud Archive epoxy series:
  Fix Released
Status in Ubuntu Cloud Archive yoga series:
  New
Status in Ubuntu Cloud Archive zed series:
  New
Status in watcher package in Ubuntu:
  Fix Released
Status in watcher source package in Focal:
  Confirmed
Status in watcher source package in Jammy:
  Confirmed
Status in watcher source package in Noble:
  Confirmed
Status in watcher source package in Oracular:
  Fix Released
Status in watcher source package in Plucky:
  Fix Released

Bug description:
  A customer is facing an issue where the watcher-decision-engine
  service crashes when creating an audit plan with the Audit type set to
  CONTINUOUS. Below are the steps to reproduce the issue:

  Environment Details:
  1. Deploy OpenStack Yoga on Jammy with Watcher, using Gnocchi as Watcher's storage backend

  2. Create an audit
  openstack optimize audit create --name workload_stabilization_test_1 -s workload_stabilization -g workload_balancing --audit_type CONTINUOUS --interval 60 --auto-trigger

  3. Check the audit state
  openstack optimize audit list
  Observe it says "CONTINUOUS ONGOING"

  4. Create a second audit
  openstack optimize audit create --name workload_stabilization_test_2 -s workload_stabilization -g workload_balancing --audit_type CONTINUOUS --interval 60 --auto-trigger

  5. Check the audit state
  openstack optimize audit list
  Observe the second audit is stuck in "CONTINUOUS PENDING"

  6. Check watcher's status and observe that it crashed with the following traceback
  systemctl status watcher-decision-engine.service

  Nov 27 19:53:54 juju-2752e1-86-lxd-27 watcher-decision-engine[965896]:     self.run()
  Nov 27 19:53:54 juju-2752e1-86-lxd-27 watcher-decision-engine[965896]:   File "/usr/lib/python3.10/threading.py", line 953, in run
  Nov 27 19:53:54 juju-2752e1-86-lxd-27 watcher-decision-engine[965896]:     self._target(*self._args, **self._kwargs)
  Nov 27 19:53:54 juju-2752e1-86-lxd-27 watcher-decision-engine[965896]:   File "/usr/lib/python3/dist-packages/apscheduler/schedulers/blocking.py", line 32, in _main_loop
  Nov 27 19:53:54 juju-2752e1-86-lxd-27 watcher-decision-engine[965896]:     wait_seconds = self._process_jobs()
  Nov 27 19:53:54 juju-2752e1-86-lxd-27 watcher-decision-engine[965896]:   File "/usr/lib/python3/dist-packages/apscheduler/schedulers/base.py", line 1006, in _process_jobs
  Nov 27 19:53:54 juju-2752e1-86-lxd-27 watcher-decision-engine[965896]:     jobstore_next_run_time = jobstore.get_next_run_time()
  Nov 27 19:53:54 juju-2752e1-86-lxd-27 watcher-decision-engine[965896]:   File "/usr/lib/python3/dist-packages/apscheduler/jobstores/sqlalchemy.py", line 84, in get_next_run_time
  Nov 27 19:53:54 juju-2752e1-86-lxd-27 watcher-decision-engine[965896]:     return utc_timestamp_to_datetime(float(next_run_time))
  Nov 27 19:53:54 juju-2752e1-86-lxd-27 watcher-decision-engine[965896]: TypeError: float() argument must be a string or a real number, not 'NoneType'
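
The reproduction steps above could be scripted roughly as follows. This is only a sketch: the helper names (audit_create_cmd, reproduce) are hypothetical, the commands are copied from steps 2 to 6, and it assumes the openstack client and the watcher-decision-engine unit are reachable from the same shell, which may not hold on a real deployment. It does not wait between steps, so the states observed may differ from those listed.

```python
import subprocess


def audit_create_cmd(name, interval=60):
    """Build the audit-create command used in steps 2 and 4."""
    return [
        "openstack", "optimize", "audit", "create",
        "--name", name,
        "-s", "workload_stabilization",
        "-g", "workload_balancing",
        "--audit_type", "CONTINUOUS",
        "--interval", str(interval),
        "--auto-trigger",
    ]


AUDIT_LIST_CMD = ["openstack", "optimize", "audit", "list"]
STATUS_CMD = ["systemctl", "status", "watcher-decision-engine.service"]


def reproduce(run=subprocess.run):
    """Run steps 2-6 in order and return the commands that were issued.

    `run` is injectable so the sequence can be dry-run without a cloud.
    """
    steps = [
        audit_create_cmd("workload_stabilization_test_1"),
        AUDIT_LIST_CMD,
        audit_create_cmd("workload_stabilization_test_2"),
        AUDIT_LIST_CMD,
        STATUS_CMD,
    ]
    for cmd in steps:
        # check=False: systemctl status exits non-zero for a failed unit,
        # which is the expected outcome here rather than an error.
        run(cmd, check=False)
    return steps
```

Passing a stub for `run` (for example, a function that only records its arguments) lets the command sequence be inspected without touching a deployment.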

  This was fixed upstream in 2024.2 at
  https://opendev.org/openstack/watcher/commit/d6f169197efc5b4f6c8a2e6bc38177b0641ca05c
  which properly addresses the type conversion and
  https://opendev.org/openstack/watcher/commit/fbb290b2238e9e72054892e9ae6108a8907f47d7
  which adjusts the unit tests to support croniter 5.0.0+, the default
  version installed by tox on Noble and Oracular because they ship with
  Python 3.12.
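
The type conversion in question is the float(next_run_time) call shown in the traceback, which fails when the value is None. The following minimal Python sketch illustrates that failure mode and one possible guard. It is not the upstream patch and does not use APScheduler: the function names are hypothetical, and apscheduler's utc_timestamp_to_datetime is approximated here with datetime.fromtimestamp.

```python
from datetime import datetime, timezone
from typing import Optional


def next_run_unguarded(next_run_time) -> datetime:
    # Same pattern as the failing line in the traceback:
    # float(None) raises TypeError.
    return datetime.fromtimestamp(float(next_run_time), tz=timezone.utc)


def next_run_guarded(next_run_time) -> Optional[datetime]:
    # Hypothetical guard: treat a missing value as "nothing scheduled"
    # instead of attempting the conversion.
    if next_run_time is None:
        return None
    return datetime.fromtimestamp(float(next_run_time), tz=timezone.utc)


if __name__ == "__main__":
    try:
        next_run_unguarded(None)
    except TypeError as exc:
        print(f"unguarded: {exc}")
    print("guarded:", next_run_guarded(None))
    print("guarded:", next_run_guarded("1732737234.0"))
```

Whether a None value should be skipped, or prevented from reaching the job store in the first place, is a design decision for the actual fix; the linked upstream commit is the authoritative reference.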

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/2091947/+subscriptions



