[STORM-3602] loadaware shuffle can overload local worker - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.0, 2.1.0
Fix Version/s: 2.2.0, 2.1.1
Component/s: None
Labels:
- pull-request-available

Description

We saw a worker become overloaded, with tuples timing out, when load-aware shuffle was enabled. Investigation showed that the code allows switching from HOST_LOCAL back to WORKER_LOCAL whenever the load average across the host-local tasks is lower than the low water mark. It should be checking the load on the worker instead.

What happens is this: the worker-local task is overloaded while the host holds many idle host-local tasks, so the grouping switches to HOST_LOCAL. The load average calculated across all the host tasks is then below the low water mark, so it immediately switches back to the overloaded worker-local task.
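The flapping described above can be illustrated with a small, self-contained sketch. This is not Storm's actual code and not the change made in the linked pull request: the class `LocalityScopeSketch`, its method names, the two-value `Scope` enum, and the threshold values 0.2 and 0.8 are all hypothetical and chosen only for illustration. It assumes loads are values between 0 and 1 and that the host-scope list includes the worker-local task.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the scope-switching decision; not Storm's real classes.
final class LocalityScopeSketch {
    enum Scope { WORKER_LOCAL, HOST_LOCAL }

    // Mean load of a set of target tasks (0.0 when the list is empty).
    static double average(List<Double> loads) {
        return loads.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    // Behaviour as described in the report: the decision to narrow the scope
    // looks at the average over every task on the host.
    static Scope nextScopeAsReported(Scope current, List<Double> workerLoads,
                                     List<Double> hostLoads, double low, double high) {
        if (current == Scope.WORKER_LOCAL) {
            return average(workerLoads) >= high ? Scope.HOST_LOCAL : Scope.WORKER_LOCAL;
        }
        return average(hostLoads) < low ? Scope.WORKER_LOCAL : Scope.HOST_LOCAL;
    }

    // Behaviour the report asks for: only narrow the scope when the
    // worker-local tasks themselves are below the low water mark.
    static Scope nextScopeCheckingWorker(Scope current, List<Double> workerLoads,
                                         double low, double high) {
        if (current == Scope.WORKER_LOCAL) {
            return average(workerLoads) >= high ? Scope.HOST_LOCAL : Scope.WORKER_LOCAL;
        }
        return average(workerLoads) < low ? Scope.WORKER_LOCAL : Scope.HOST_LOCAL;
    }

    public static void main(String[] args) {
        double low = 0.2;   // assumed low water mark
        double high = 0.8;  // assumed high water mark
        // One overloaded worker-local task and nine idle tasks elsewhere on the host.
        List<Double> worker = Arrays.asList(0.9);
        List<Double> host = Arrays.asList(0.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0);

        Scope widened = nextScopeAsReported(Scope.WORKER_LOCAL, worker, host, low, high);
        System.out.println("after overload: " + widened);
        System.out.println("reported logic, next step: "
                + nextScopeAsReported(widened, worker, host, low, high));
        System.out.println("worker-checking logic, next step: "
                + nextScopeCheckingWorker(widened, worker, low, high));
    }
}
```

In this model the host-wide average is 0.09, so the reported logic returns to WORKER_LOCAL on the very next evaluation even though the worker-local task is still at 0.9. The worker-checking variant stays at HOST_LOCAL until the worker's own load falls below the low water mark.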

Issue Links

links to GitHub Pull Request #3227

People

Assignee: Aaron Gresch

Reporter: Aaron Gresch

Votes: 0

Watchers: 2

Dates

Created: 12/Mar/20 16:02

Updated: 01/Jul/20 14:07

Resolved: 19/Mar/20 14:00

Time Tracking

Estimated: Not Specified

Remaining:

Logged: