[SLIDER-939] flex down does not cancel the outstanding request

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: Slider 0.80
Fix Version/s: Slider 1.0.0
Component/s: core
Labels:
- patch
Environment:

Hadoop 2.7.1
Slider 0.80.0

Flags:

Important

Description

I run a Slider app on a 6-node cluster. To ensure there is only one component (worker) instance on each node, I set yarn.memory to 51% of the total memory.
Then I flexed up to 7 workers. This leaves one outstanding worker request that can never be met, which is expected.
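
To make the arithmetic behind this setup explicit, here is a minimal sketch. It assumes "total memory" means the memory of a single node, uses a hypothetical node size of 8192 MB, and ignores scheduler rounding and the memory taken by the application master. The function names are illustrative only and are not part of Slider or YARN.

```python
def containers_per_node(node_memory_mb: int, request_mb: int) -> int:
    """How many containers of request_mb fit on one node (simplified model)."""
    return node_memory_mb // request_mb


def outstanding_requests(nodes: int, node_memory_mb: int,
                         request_mb: int, desired: int) -> int:
    """Requests that cannot be placed once every node is full."""
    capacity = nodes * containers_per_node(node_memory_mb, request_mb)
    return max(0, desired - capacity)


# Hypothetical numbers: 8192 MB per node, request of about 51% (4178 MB).
NODE_MB = 8192
REQUEST_MB = 4178

per_node = containers_per_node(NODE_MB, REQUEST_MB)        # 1 worker per node
unmet = outstanding_requests(6, NODE_MB, REQUEST_MB, 7)    # 1 request left over
```

Because a request of more than half a node's memory can fit only once per node, 6 nodes hold 6 workers, and the seventh request stays outstanding.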

Then I flexed back down to 6 workers. After that, container requests from any job were blocked, even when plenty of memory and cores were available for the job. The RM log shows this message repeated continuously:
capacity.CapacityScheduler (CapacityScheduler.java:allocateContainersToNode(1240)) - Skipping scheduling since node test.example.com:45454 is reserved by application appattempt_1442384698868_0008_000001

It seems the outstanding request is not actually cancelled in the container request queue and is still being requested.
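
The behaviour I would expect on flex down can be sketched as follows. This is a simplified model, not Slider's actual code: `RoleState` and `flex` are hypothetical names, and the returned counts stand in for the calls an application master would make to the RM (in YARN's client API, withdrawing a pending request corresponds to `AMRMClient.removeContainerRequest`). The point is only that outstanding requests are cancelled before any running container is released.

```python
from dataclasses import dataclass


@dataclass
class RoleState:
    """Hypothetical per-role bookkeeping for one component (e.g. worker)."""
    desired: int
    running: int = 0
    outstanding: int = 0  # requested from the RM but not yet allocated


def flex(role: RoleState, new_desired: int) -> tuple[int, int, int]:
    """Return (requests_to_add, requests_to_cancel, containers_to_release)."""
    role.desired = new_desired
    delta = new_desired - (role.running + role.outstanding)
    if delta >= 0:
        # Flex up (or no change): ask for the missing containers.
        role.outstanding += delta
        return (delta, 0, 0)
    excess = -delta
    # Flex down: withdraw pending requests first ...
    cancel = min(excess, role.outstanding)
    role.outstanding -= cancel
    # ... and only then release running containers if still over target.
    release = excess - cancel
    role.running -= release
    return (0, cancel, release)


workers = RoleState(desired=6, running=6)
up = flex(workers, 7)      # (1, 0, 0): one request that cannot be met
down = flex(workers, 6)    # (0, 1, 0): the pending request is cancelled
```

In this model, flexing from 7 back to 6 cancels the one pending request and releases nothing, so no reservation would be left behind on a node.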

After I flexed down to 5 workers, the other blocked jobs could run.
This is related to SLIDER-490: https://issues.apache.org/jira/browse/SLIDER-490

Issue Links

depends upon

SLIDER-955 fail to track the outstandingRequest when submit an application that yarn.memory is not a multiple of minimum-allocation-mb

Resolved

is depended upon by

SLIDER-937 Release Slider 0.81.1

Resolved

People

Assignee: Steve Loughran

Reporter: Youjie Chen

Votes: 0

Watchers: 5

Dates

Created: 17/Sep/15 09:12

Updated: 15/Jun/16 06:48