Details
- Type: Bug
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
- Affects Version/s: 1.0.0, 2.0.0
Description
When a ZooKeeper exception occurs during worker-backpressure!, the worker is left in a bad state that can block the topology from running normally again.

The root cause is in worker/mk-backpressure-handler: the local backpressure flag ends up updated even though the ZooKeeper write failed. So once worker-backpressure! fails due to a ZooKeeper connection exception, every subsequent call from WorkerBackpressureThread finds the guard (when (not= prev-backpressure-flag curr-backpressure-flag) ...) false, and the remote ZooKeeper node can never be synced with the local state. A minimal sketch of this flow is shown below.

This also explains why the problem never shows up when testing in a stable environment where ZooKeeper does not fail.
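For illustration, here is a minimal Clojure sketch of the problematic ordering. The helper name sync-backpressure-buggy! and the worker map keys are simplified assumptions for this sketch, not the actual worker.clj source; the point is only that the local reset! happens before the ZooKeeper write:

;; Sketch of the buggy flow (hypothetical helper, simplified worker map).
;; The local atom is reset before the zk write, so once worker-backpressure!
;; throws, prev and curr are already equal on every later call and the
;; (not= ...) guard never fires again.
(defn sync-backpressure-buggy! [worker curr-backpressure-flag]
  (let [prev-backpressure-flag @(:backpressure worker)]
    (when (not= prev-backpressure-flag curr-backpressure-flag)
      ;; local state is changed first ...
      (reset! (:backpressure worker) curr-backpressure-flag)
      ;; ... then zookeeper; if this throws, zk stays stale and is never
      ;; retried on later WorkerBackpressureThread ticks
      (.worker-backpressure! (:storm-cluster-state worker)
                             (:storm-id worker)
                             (:assignment-id worker)
                             (:port worker)
                             curr-backpressure-flag))))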
The fix is straightforward: change the ZooKeeper status first, and only if that succeeds change the local status (see the sketch after this paragraph). This fixes the hidden bug and also removes redundant flags in executor-data and worker-data, since the executor status can be read directly from the "_throttleOn" boolean in the DisruptorQueue.
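A matching sketch of the fixed ordering, under the same simplifying assumptions (and assuming a getThrottleOn accessor that exposes the DisruptorQueue's "_throttleOn" boolean): the ZooKeeper write happens first, and the local flag is committed only after it succeeds, so a failed write is retried on the next WorkerBackpressureThread tick:

;; Sketch of the fix (same hypothetical helper): zk first, local second.
;; If worker-backpressure! throws, the local flag is untouched, so the next
;; invocation still sees (not= prev curr) and retries the zk update.
(defn sync-backpressure-fixed! [worker]
  (let [prev-backpressure-flag @(:backpressure worker)
        ;; derive the current flag from the transfer queue's throttle state
        ;; instead of keeping a redundant flag in executor-data/worker-data
        curr-backpressure-flag (.getThrottleOn (:transfer-queue worker))]
    (when (not= prev-backpressure-flag curr-backpressure-flag)
      (.worker-backpressure! (:storm-cluster-state worker)
                             (:storm-id worker)
                             (:assignment-id worker)
                             (:port worker)
                             curr-backpressure-flag)
      ;; only reached if the zk update above did not throw
      (reset! (:backpressure worker) curr-backpressure-flag))))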
Issue Links
- relates to STORM-2039 Backpressure refactoring in worker and executor (Resolved)