Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
When ZooKeeper session failures occur in a stream processor, the processor leaves the group (closing its zkClient) and then joins the group again.
The last step in that shutdown sequence is zkClient.close(). In some scenarios, it throws the following exception:
org.I0Itec.zkclient.exception.ZkInterruptedException: java.lang.InterruptedException
    at org.I0Itec.zkclient.ZkClient.close(ZkClient.java:1278)
    at org.apache.samza.zk.ZkControllerImpl.stop(ZkControllerImpl.java:92)
    at org.apache.samza.zk.ZkJobCoordinator.stop(ZkJobCoordinator.java:141)
The existing implementation does not handle this exception, so it kills the stream processor. The following code path triggers it:
StreamProcessor.stop -> ZkJobCoordinator.stop() -> zkController.stop() -> zkUtils.close
This exception causes the integration test to fail occasionally. It can also cause the LocalApplicationRunner.waitForFinish call to block indefinitely, because the callback fired on successful shutdown is what updates the latch state that waitForFinish needs in order to return.
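One possible way to handle this is sketched below. It is not necessarily the fix that was committed for this ticket. The idea is to catch the interrupted-close exception inside the stop path and release the shutdown latch in a finally block, so a waiter can never be left hanging. Every name here (ShutdownSketch, Closer, InterruptedCloseException, stop) is a hypothetical stand-in so the example compiles without Samza or zkclient on the classpath. The stand-in exception is unchecked, on the assumption that ZkInterruptedException is a RuntimeException.

```java
import java.util.concurrent.CountDownLatch;

class ShutdownSketch {
    /** Hypothetical stand-in for the ZooKeeper client; only close() matters here. */
    interface Closer {
        void close();
    }

    /** Hypothetical stand-in for ZkInterruptedException (assumed to be unchecked). */
    static class InterruptedCloseException extends RuntimeException {
        InterruptedCloseException(Throwable cause) {
            super(cause);
        }
    }

    /**
     * Closes the client without letting an interrupted close escape,
     * and always releases the latch that a waitForFinish-style caller blocks on.
     */
    static void stop(Closer client, CountDownLatch shutdownLatch) {
        try {
            client.close();
        } catch (InterruptedCloseException e) {
            // Preserve the interrupt for callers instead of swallowing it.
            Thread.currentThread().interrupt();
        } finally {
            // Runs on both the normal and the exceptional path.
            shutdownLatch.countDown();
        }
    }

    public static void main(String[] args) {
        CountDownLatch latch = new CountDownLatch(1);
        // Simulate close() failing the way the stack trace in this ticket shows.
        stop(() -> {
            throw new InterruptedCloseException(new InterruptedException());
        }, latch);
        System.out.println("latch released: " + (latch.getCount() == 0));
    }
}
```

With this shape, a failing close() no longer propagates out of the stop path, and the latch count reaches zero either way. Whether the interrupt flag should be restored or cleared at this point depends on what the calling thread does next, so treat that line as a design choice rather than a requirement.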