[Bug 1930361] Fix merged to masakari-monitors (stable/wallaby)

OpenStack Infra 1930361 at bugs.launchpad.net
Tue Jul 27 07:17:40 UTC 2021


Reviewed:  https://review.opendev.org/c/openstack/masakari-monitors/+/802351
Committed: https://opendev.org/openstack/masakari-monitors/commit/9ae886e7428e61dfc6a29ec65b0f6836d2648326
Submitter: "Zuul (22348)"
Branch:    stable/wallaby

commit 9ae886e7428e61dfc6a29ec65b0f6836d2648326
Author: sue <sugar-2008 at 163.com>
Date:   Wed Jun 2 16:38:05 2021 +0800

    Fix hostmonitor hanging forever after certain exceptions
    
    The hostmonitor, like other Masakari monitors, starts as an
    Oslo service (based on eventlet). The main thread is supposed
    to run a loop that has an internal wait mechanism (instead of
    reusing periodic_tasks from oslo_service). However, the loop
    could be broken, if an unexpected exception appeared, and it
    never ran again but the process was still alive (due to
    oslo_service not stopping). The example mentioned in the bug
    report is about unavailability of the Masakari API (and/or
    Keystone API) before notification sending. This exception is
    not caught early because SendNotification._make_client is
    called outside of the try block (unlike the actual notification
    sending). The exception bubbles up and stops the main loop,
    leaving a useless hostmonitor process. The user is unaware
    unless they notice the logs are no longer growing.
    
    While the general design begs for a revamp (we might get away
    with that by using Consul in the first place), the easy fix is
    to prevent exceptions breaking the loop completely so that the
    hostmonitor can continue to work and try to regain health.
    At the very least it will keep posting ERROR messages in the log
    which is more likely to be spotted in comparison to lack of logs
    (which is, unfortunately, less commonly considered an alerting
    situation).
    
    This change also fixes, adapts and robustifies the two relevant
    unit tests.
    
    Closes-Bug: #1930361
    Co-Authored-By: Radosław Piliszek <radoslaw.piliszek at gmail.com>
    Change-Id: I7e3447dcddc7998e3e3c30f4f0019d91a99c79ce
    (cherry picked from commit e7154f3d77ee4c06eec603a850ec941668eb602f)
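
    The failure mode and the fix described above can be illustrated with
    a minimal sketch. This is not the masakari-monitors code: make_client,
    ApiUnavailable, run_fragile and run_guarded are hypothetical names,
    and a plain for loop stands in for the hostmonitor main loop. It only
    shows why an exception raised outside any try block ends the loop for
    good, and how a catch-all around the loop body keeps it running and
    logging errors.

```python
import logging

LOG = logging.getLogger("hostmonitor_sketch")


class ApiUnavailable(Exception):
    """Hypothetical stand-in for a Masakari/Keystone API connection failure."""


def make_client(api_up):
    # Hypothetical stand-in for client construction; fails when the API is down.
    if not api_up:
        raise ApiUnavailable("API unreachable")
    return object()


def run_fragile(api_states):
    """Loop without a catch-all: the first exception ends monitoring."""
    completed = 0
    for api_up in api_states:
        make_client(api_up)  # raised outside any try block, so it propagates
        completed += 1
    return completed


def run_guarded(api_states):
    """Loop with a catch-all: failures are logged and the loop continues."""
    completed = 0
    failed = 0
    for api_up in api_states:
        try:
            make_client(api_up)
            completed += 1
        except Exception:
            # Log at ERROR level with a traceback, then try again next round.
            LOG.exception("Monitoring iteration failed; will retry.")
            failed += 1
    return completed, failed
```

    With the API states [True, False, True], run_fragile raises on the
    second iteration and never reaches the third, while run_guarded logs
    one error and still completes the other two iterations.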

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to masakari-monitors in Ubuntu.
https://bugs.launchpad.net/bugs/1930361

Title:
  hostmonitor hangs after notifications send failed

Status in masakari-monitors:
  Fix Released
Status in masakari-monitors ussuri series:
  Fix Committed
Status in masakari-monitors victoria series:
  Fix Committed
Status in masakari-monitors wallaby series:
  Fix Committed
Status in masakari-monitors xena series:
  Fix Released
Status in masakari-monitors package in Ubuntu:
  Confirmed

Bug description:
  In one environment, we found that a hostmonitor stopped logging after
  it failed to send a host failure notification.

  I noticed that monitor_hosts exits the first time it hits an
  exception. So there is a risk that, if a host goes down later, no
  recovery will be triggered.

  See comment #5 for a detailed analysis.

To manage notifications about this bug go to:
https://bugs.launchpad.net/masakari-monitors/+bug/1930361/+subscriptions



