[TEZ-3935] DAG aware scheduler should release unassigned new containers rather than hold them - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.9.2, 0.10.0
Component/s: None
Labels: None
Target Version/s: 0.9.2, 0.10.0

Description

I saw a case with a very large job with many containers where the DAG-aware scheduler was falling behind on assigning containers. Newly assigned containers were not finding any matching request, so they were queued for reuse processing. However, it took so long to work through all of the task and container events that the container allocations expired before the containers were finally assigned and a launch was attempted.

Newly assigned containers are assigned to their matching requests, even if that violates the DAG priorities, so it should be safe to simply release them if no tasks can be found to use them. The matching request has either been removed or already been satisfied with a reused container. Besides, if we can't find any task to take the newly assigned container, then it is very likely we already have plenty of reusable containers, and keeping more containers just makes the job a resource hog on the cluster.
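
To make the proposed behavior concrete, here is a minimal sketch in Java. It is not the code from the attached patch and does not use the real Tez scheduler classes: `NewContainerReleaseSketch`, `Request`, `Container`, and every method name are hypothetical, and matching by memory size alone is a simplifying assumption. It only illustrates the decision described above: a newly allocated container that finds no matching request is released immediately instead of being queued for reuse.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;

/** Hypothetical sketch; none of these names come from the Tez code base. */
class NewContainerReleaseSketch {

  /** Stand-in for a pending task request (assumption: matched by memory only). */
  static final class Request {
    final String taskId;
    final int memoryMb;
    Request(String taskId, int memoryMb) {
      this.taskId = taskId;
      this.memoryMb = memoryMb;
    }
  }

  /** Stand-in for a container newly allocated by the resource manager. */
  static final class Container {
    final String id;
    final int memoryMb;
    Container(String id, int memoryMb) {
      this.id = id;
      this.memoryMb = memoryMb;
    }
  }

  final Queue<Request> pendingRequests = new ArrayDeque<>();
  final List<String> launched = new ArrayList<>();
  final List<String> released = new ArrayList<>();

  void addRequest(Request request) {
    pendingRequests.add(request);
  }

  /** Handles a batch of newly allocated containers. */
  void onContainersAllocated(List<Container> containers) {
    for (Container container : containers) {
      Request match = takeMatchingRequest(container);
      if (match != null) {
        // A pending request still wants this container: use it right away.
        launched.add(container.id + "->" + match.taskId);
      } else {
        // No request matches (it was removed or already satisfied by a
        // reused container), so release now rather than hold for reuse.
        released.add(container.id);
      }
    }
  }

  /** Removes and returns the first pending request that fits, or null. */
  private Request takeMatchingRequest(Container container) {
    for (Iterator<Request> it = pendingRequests.iterator(); it.hasNext();) {
      Request request = it.next();
      if (request.memoryMb <= container.memoryMb) {
        it.remove();
        return request;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    NewContainerReleaseSketch scheduler = new NewContainerReleaseSketch();
    scheduler.addRequest(new Request("task-1", 1024));
    scheduler.onContainersAllocated(Arrays.asList(
        new Container("c1", 1024), new Container("c2", 1024)));
    System.out.println("launched: " + scheduler.launched);
    System.out.println("released: " + scheduler.released);
  }
}
```

In this example one request is pending and two containers arrive: `c1` is used for `task-1`, while `c2` finds nothing to run and is released at once, so it never sits in a reuse queue long enough for its allocation to expire.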

Attachments

TEZ-3935.001.patch
14/May/18 16:29
8 kB
Jason Darrell Lowe

People

Assignee: Jason Darrell Lowe

Reporter: Jason Darrell Lowe

Votes: 0

Watchers: 3

Dates

Created: 14/May/18 15:26

Updated: 19/Apr/19 19:21

Resolved: 22/May/18 18:31