Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Currently, when a Processor is removed from the graph, the State Manager's onComponentRemoved method is called in a synchronous (blocking) manner. By default, the timeout for communicating with ZooKeeper is 30 seconds. The request can time out in some cases (for instance, due to improper Kerberos configuration), and with a 30-second timeout this often results in nodes getting kicked out of the cluster.
If the request fails, however, we simply log a warning and move on, so nothing depends on the outcome of the call. This synchronous network call should therefore be moved to a background thread to ensure that it does not interfere with the web request that removes the component.
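
A minimal sketch of the proposed change, not the actual NiFi code: the class BackgroundStateCleanup, the StateCleanup interface, and the removeComponent method are hypothetical names used only for illustration. It assumes the cleanup can be handed to a single daemon thread, and that a failure is only logged as a warning, as described above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.logging.Level;
import java.util.logging.Logger;

public class BackgroundStateCleanup {
    private static final Logger logger =
            Logger.getLogger(BackgroundStateCleanup.class.getName());

    /** Hypothetical stand-in for the State Manager's cleanup hook. */
    public interface StateCleanup {
        void onComponentRemoved(String componentId) throws Exception;
    }

    private final StateCleanup cleanup;

    // A single daemon thread, so pending cleanups never block JVM shutdown.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "state-cleanup");
        t.setDaemon(true);
        return t;
    });

    public BackgroundStateCleanup(StateCleanup cleanup) {
        this.cleanup = cleanup;
    }

    /** Called from the web request thread; returns without waiting on the network. */
    public void removeComponent(String componentId) {
        // (Removal of the component from the graph would happen here, synchronously.)

        // The potentially slow state cleanup runs on the background thread.
        executor.submit(() -> {
            try {
                cleanup.onComponentRemoved(componentId);
            } catch (Exception e) {
                // Same behavior as before the change: log a warning and move on.
                logger.log(Level.WARNING,
                        "Failed to clear state for component " + componentId, e);
            }
        });
    }

    public void shutdown() {
        executor.shutdown();
    }
}
```

With this shape, a cleanup call that blocks for the full 30 seconds delays only the background thread. The trade-off of a single thread is that cleanups queue behind one another if several of them time out in a row.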