Ticket #3029 (closed defect: Fixed)

Opened 7 months ago

Last modified 7 weeks ago

zenhub down detection knocks down collectors under the watchdog

Reported by: ecn
Owned by: ecn
Priority: 2 - High
Milestone: zenoss-2.2
Component: All
Version: 2.1.92
Keywords:
Cc:
Community Patch Attached:
Deployed @ Customer:
Installer:
Maintenance Target:
Specific ZenPack:
Maintenance Status:
Documentation Note?: Not required
Regression:

Description

When ZenHub is busy and unresponsive, the collectors start missing heartbeats. The watchdog then restarts the collectors, but the restarted collectors hit the zenhub timeout and exit. Because they exit cleanly, the watchdog does not restart them again, so they stay down.

Suggest:

Print the helpful message "is zenhub running?", but don't stop trying to connect.
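
The suggestion above could look roughly like the following. This is only a sketch of the idea, not the change that went into the tree: `connect_with_retry`, its parameters, and the `connect` callable are hypothetical names made up for illustration, and the real collectors connect to zenhub through their own code rather than a plain function call.

```python
import time


def connect_with_retry(connect, retry_delay=30.0, log=print, sleep=time.sleep):
    """Call connect() until it succeeds, logging a hint on every failure.

    connect     -- hypothetical callable that returns a connection or raises
                   ConnectionError / TimeoutError when zenhub does not answer
    retry_delay -- seconds to wait between attempts (assumed value)
    log, sleep  -- injectable so the loop can be exercised without waiting
    """
    while True:
        try:
            return connect()
        except (ConnectionError, TimeoutError) as exc:
            # Report the problem but do not exit: a clean exit is exactly
            # what keeps the watchdog from restarting the collector.
            log("Unable to connect to zenhub (%s): is zenhub running? "
                "Retrying in %s seconds" % (exc, retry_delay))
            sleep(retry_delay)
```

The point of the loop is that a timeout is reported but never turns into a clean exit, so a collector stays alive until zenhub becomes responsive again.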

Change History

Changed 7 months ago by ecn

  • status changed from new to closed
  • resolution set to fixed

(In [9096]) * fixes #3029: watchdog and busy zenhub can result in servers killing themselves and not recovering

Changed 7 weeks ago by bbibeault

  • reviewed set to 1