Details
- Type: Bug
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
- Affects Version/s: 1.0.0, 2.0.0
Description
When a ZooKeeper exception occurs during worker-backpressure!, the worker is left in a bad state that can block the topology from running normally again.

The root cause is in worker/mk-backpressure-handler: the local backpressure flag ends up updated even though the ZooKeeper write failed. So once worker-backpressure! fails due to a ZooKeeper connection exception, every subsequent call from WorkerBackpressureThread finds the guard (when (not= prev-backpressure-flag curr-backpressure-flag) ...) false, and the remote ZooKeeper node can never be synced with the local state. A minimal sketch of this flow is shown below.

This also explains why the problem never shows up when testing in a stable environment where ZooKeeper does not fail.
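For illustration, here is a minimal Clojure sketch of the problematic ordering. The helper name sync-backpressure-buggy! and the worker map keys are simplified assumptions for this sketch, not the actual worker.clj source; the point is only that the local reset! happens before the ZooKeeper write:

;; Sketch of the buggy flow (hypothetical helper, simplified worker map).
;; The local atom is reset before the zk write, so once worker-backpressure!
;; throws, prev and curr are already equal on every later call and the
;; (not= ...) guard never fires again.
(defn sync-backpressure-buggy! [worker curr-backpressure-flag]
  (let [prev-backpressure-flag @(:backpressure worker)]
    (when (not= prev-backpressure-flag curr-backpressure-flag)
      ;; local state is changed first ...
      (reset! (:backpressure worker) curr-backpressure-flag)
      ;; ... then zookeeper; if this throws, zk stays stale and is never
      ;; retried on later WorkerBackpressureThread ticks
      (.worker-backpressure! (:storm-cluster-state worker)
                             (:storm-id worker)
                             (:assignment-id worker)
                             (:port worker)
                             curr-backpressure-flag))))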
The fix is straightforward: change the ZooKeeper status first, and only if that succeeds change the local status (see the sketch after this paragraph). This fixes the hidden bug and also removes redundant flags in executor-data and worker-data, since the executor status can be read directly from the "_throttleOn" boolean in the DisruptorQueue.
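A matching sketch of the fixed ordering, under the same simplifying assumptions (and assuming a getThrottleOn accessor that exposes the DisruptorQueue's "_throttleOn" boolean): the ZooKeeper write happens first, and the local flag is committed only after it succeeds, so a failed write is retried on the next WorkerBackpressureThread tick:

;; Sketch of the fix (same hypothetical helper): zk first, local second.
;; If worker-backpressure! throws, the local flag is untouched, so the next
;; invocation still sees (not= prev curr) and retries the zk update.
(defn sync-backpressure-fixed! [worker]
  (let [prev-backpressure-flag @(:backpressure worker)
        ;; derive the current flag from the transfer queue's throttle state
        ;; instead of keeping a redundant flag in executor-data/worker-data
        curr-backpressure-flag (.getThrottleOn (:transfer-queue worker))]
    (when (not= prev-backpressure-flag curr-backpressure-flag)
      (.worker-backpressure! (:storm-cluster-state worker)
                             (:storm-id worker)
                             (:assignment-id worker)
                             (:port worker)
                             curr-backpressure-flag)
      ;; only reached if the zk update above did not throw
      (reset! (:backpressure worker) curr-backpressure-flag))))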
Issue Links
- relates to STORM-2039 Backpressure refactoring in worker and executor (Resolved)