Ticket #3588 (closed defect: Fixed)

Opened 3 months ago

Last modified 3 months ago

Daemons don't restart cycles after zenhub connection failure

Reported by: cluther
Owned by: ecn
Priority: 2 - High
Milestone: zenoss-2.3
Component: ZenHub
Version: 2.2.3
Keywords: zenoss-2.2.4-verified
Cc:
Reviewed: yes
Community Patch Attached:
Deployed @ Customer:
Installer:
Microrelease Target:
Specific ZenPack:
Microrelease Status:
Documentation Note?: Not required
Regression:

Description

When a collector daemon times out connecting to zenhub and later reconnects, it does not automatically restart its collection cycle. The daemon stays running but does nothing. Fortunately, it also stops sending heartbeats, so the failure is at least detectable.

This happens most often when a daemon fails to connect at startup, before it has begun its first collection cycle.
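One way to avoid this stuck state is to start the collection cycle from the handler that runs on every successful connection, not only from the startup path. The sketch below models that idea in plain Python. `CollectorSketch` and its methods are hypothetical names, not Zenoss or Twisted APIs, and this is not necessarily how the fix committed for this ticket works.

```python
# Hypothetical model: CollectorSketch and its methods are illustrative names,
# not Zenoss or Twisted APIs.


class CollectorSketch:
    """Daemon whose collection cycle must (re)start once zenhub is reachable."""

    def __init__(self):
        self.connected = False
        self.cycle_running = False
        self.cycles_started = 0

    def on_connect_failed(self):
        # A timeout leaves the process alive but without a hub connection.
        self.connected = False

    def on_connected(self):
        # Runs on every successful connection, including reconnects, so a
        # daemon that timed out at startup still begins collecting later.
        self.connected = True
        if not self.cycle_running:
            self.start_cycle()

    def start_cycle(self):
        # Guarded by cycle_running so repeated reconnects never start a
        # second, overlapping cycle.
        self.cycle_running = True
        self.cycles_started += 1


if __name__ == "__main__":
    daemon = CollectorSketch()
    daemon.on_connect_failed()  # timeout before the first cycle
    daemon.on_connected()       # later reconnect
    print(daemon.cycle_running, daemon.cycles_started)  # True 1
```

Keeping the start logic idempotent (guarded by `cycle_running`) is what makes it safe to call from every connection event.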

Change History

Changed 3 months ago by cluther

I created a phony hub service called Busy that calls time.sleep(600), and used it in zenping. After the 10 minutes expire and the daemon tries to reconnect to zenhub, the following exception is thrown. The daemon then stays running but does nothing.

ERROR:zen.zenperfsnmp:Timeout connecting to zenhub: is it running?

WARNING:zen.zenperfsnmp:Reconnected to ZenHub
Unhandled error in Deferred:
Traceback (most recent call last):
  File "/opt/zenoss/lib/python/twisted/internet/defer.py", line 317, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/opt/zenoss/lib/python/twisted/internet/defer.py", line 507, in _cbDeferred
    self.callback(self.resultList)
  File "/opt/zenoss/lib/python/twisted/internet/defer.py", line 239, in callback
    self._startRunCallbacks(result)
  File "/opt/zenoss/lib/python/twisted/internet/defer.py", line 304, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/opt/zenoss/lib/python/twisted/internet/defer.py", line 317, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/opt/zenoss/lib/python/twisted/internet/defer.py", line 239, in callback
    self._startRunCallbacks(result)
  File "/opt/zenoss/lib/python/twisted/internet/defer.py", line 290, in _startRunCallbacks
    raise AlreadyCalledError
twisted.internet.defer.AlreadyCalledError:
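The traceback suggests that a Deferred which had already fired (presumably from the timeout path) was fired a second time when the daemon reconnected. A Twisted Deferred delivers its result exactly once, and a second `callback()` raises `AlreadyCalledError`. The sketch below illustrates that fire-once rule, and a re-arm pattern that avoids it, using the standard library's `asyncio.Future` as an analogue (its second `set_result()` raises `InvalidStateError`). `fire_twice` and `ConnectionWatcher` are hypothetical helpers, not Zenoss code.

```python
import asyncio

# Analogy only: asyncio.Future stands in for a Twisted Deferred. Both can be
# fired exactly once; a second attempt raises (InvalidStateError here,
# AlreadyCalledError in Twisted).


def fire_twice(loop):
    """Fire one future from the timeout path, then again from the reconnect path."""
    fut = loop.create_future()
    fut.set_exception(TimeoutError("Timeout connecting to zenhub"))
    fut.exception()  # mark the exception as retrieved so asyncio does not log it
    try:
        fut.set_result("reconnected")  # second fire on the same object
    except asyncio.InvalidStateError:
        return "second fire rejected"
    return "second fire accepted"


class ConnectionWatcher:
    """Hypothetical helper: one fresh future per connection attempt."""

    def __init__(self, loop):
        self.loop = loop
        self.pending = loop.create_future()

    def report(self, ok):
        # Fire the current attempt's future exactly once ...
        fired = self.pending
        if ok:
            fired.set_result(True)
        else:
            fired.set_exception(TimeoutError("Timeout connecting to zenhub"))
            fired.exception()  # mark the exception as retrieved
        # ... then re-arm, so a later reconnect fires a new object.
        self.pending = self.loop.create_future()
        return fired


if __name__ == "__main__":
    loop = asyncio.new_event_loop()
    print(fire_twice(loop))  # second fire rejected
    watcher = ConnectionWatcher(loop)
    first = watcher.report(False)  # initial timeout
    second = watcher.report(True)  # later reconnect: no error
    print(repr(first.exception()), second.result())
    loop.close()
```

The design choice is simply that each connection attempt owns its own one-shot object, so no code path ever fires the same one twice.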

Changed 3 months ago by ecn

  • status changed from new to closed
  • resolution set to fixed

(In [9915]) * fixes #3588: reviewed by chet

Changed 3 months ago by bbibeault

  • keywords zenoss-2.2.4-accepted added; zenoss-2.2.4-proposed removed
  • reviewed set

Changed 3 months ago by ian

(In [9991]) Refs #3588: Backporting r9915 to the zenoss-2.2.x branch

Changed 3 months ago by ian

  • keywords zenoss-2.2.4-patched added; zenoss-2.2.4-accepted removed

Changed 3 months ago by cholden

  • keywords zenoss-2.2.4-verified added; zenoss-2.2.4-patched removed

Verified on 2.2.4 (build 59) native rpm. zenping and zenmodeler reconnected after initial timeouts.
