[FLINK-19324] Map requested/allocated containers with priority on YARN - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.12.0
Component/s: Deployment / YARN
Labels:
- pull-request-available

Description

In the design doc of ~~FLINK-14106~~, there was a discussion on how to map allocated containers to the requested ones on YARN. We rejected the design option that uses container priorities to map containers with different resources, because we did not want to prioritize different container requests (which is the original purpose of this field). As a result, we have to predict how Yarn would normalize a container request and map the allocated/requested containers accordingly, which is complicated and fragile. See also ~~FLINK-19151~~.

Recently, in our POC for fine-grained resource management, we were surprised to discover that Yarn does not actually work with container requests that have the same priority but different resources. I could not find this described as an official protocol in any Yarn documentation. The issue was raised in early Yarn versions (YARN-314) and was not fixed until Hadoop 2.9, when allocationRequestId was introduced. In Hadoop 2.8, the Yarn scheduler still internally uses priority as the key of a container request (see AppSchedulingInfo#updateResourceRequests), so requests with the same priority but different resources overwrite each other.
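The overwrite behavior can be illustrated with a minimal, self-contained sketch. It is a simplified model and not the actual Yarn scheduler code: the class `PriorityKeyedRequests` and its methods are hypothetical names chosen for illustration, and the only assumption carried over from the description is that pending requests are keyed by priority.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a scheduler that keys pending
// container requests by priority only. Not the real Yarn implementation.
final class PriorityKeyedRequests {

    // priority -> {memoryMb, vcores, numContainers}
    private final Map<Integer, int[]> pendingByPriority = new HashMap<>();

    // A later request with the same priority replaces the earlier one,
    // even if it asks for a different resource.
    void updateResourceRequest(int priority, int memoryMb, int vcores, int numContainers) {
        pendingByPriority.put(priority, new int[] {memoryMb, vcores, numContainers});
    }

    // Number of distinct pending requests the scheduler still knows about.
    int pendingCount() {
        return pendingByPriority.size();
    }

    // The pending request stored for the given priority, or null if none.
    int[] pendingFor(int priority) {
        return pendingByPriority.get(priority);
    }
}
```

In this model, requesting two 1024 MB containers and then one 4096 MB container under the same priority leaves only the 4096 MB request pending; the first request is silently lost.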

This discovery suggests that, if we want to support containers with different resources on Hadoop 2.8 and earlier versions, we have to give them different priorities anyway. Thus, I suggest getting rid of the container normalization simulation and going back to the previously rejected priority-based design option.
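A priority-based mapping could look roughly like the sketch below. This is only an illustration of the idea, not the implementation from the linked pull request: `ResourceSpec`, `PriorityResourceMapper`, and the choice to start priorities at 1 are all assumptions made for the example. Each distinct resource specification gets its own priority, and an allocated container is mapped back to its request by the priority it carries, so no normalization needs to be simulated.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical value type describing the resource of a requested container.
final class ResourceSpec {
    final int memoryMb;
    final int vcores;

    ResourceSpec(int memoryMb, int vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ResourceSpec)) {
            return false;
        }
        ResourceSpec other = (ResourceSpec) o;
        return memoryMb == other.memoryMb && vcores == other.vcores;
    }

    @Override
    public int hashCode() {
        return Objects.hash(memoryMb, vcores);
    }
}

// Hypothetical mapper: one unique priority per distinct resource spec.
final class PriorityResourceMapper {
    private final Map<ResourceSpec, Integer> specToPriority = new HashMap<>();
    private final Map<Integer, ResourceSpec> priorityToSpec = new HashMap<>();
    private int nextPriority = 1; // arbitrary starting value (assumption)

    // Returns the priority to use when requesting a container of this spec,
    // assigning a new one the first time the spec is seen.
    int priorityFor(ResourceSpec spec) {
        Integer existing = specToPriority.get(spec);
        if (existing != null) {
            return existing;
        }
        int priority = nextPriority++;
        specToPriority.put(spec, priority);
        priorityToSpec.put(priority, spec);
        return priority;
    }

    // Maps the priority of an allocated container back to the requested spec.
    Optional<ResourceSpec> requestedSpecFor(int allocatedPriority) {
        return Optional.ofNullable(priorityToSpec.get(allocatedPriority));
    }
}
```

Because the lookup relies only on the priority of the allocated container, it does not matter how the scheduler rounds the requested memory or vcores.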

Issue Links

relates to

FLINK-19151 Flink does not normalize container resource with correct configurations when Yarn FairScheduler is used

Closed

FLINK-14106 Make SlotManager pluggable

Closed

supersedes

FLINK-19556 YarnResourceManagerDriver.getPendingRequestsAndCheckConsistency only returns container requests for first matching equivalent resource

Closed

links to

GitHub Pull Request #13592

People

Assignee: Xintong Song

Reporter: Xintong Song

Votes: 0

Watchers: 3

Dates

Created: 21/Sep/20 09:59

Updated: 16/Oct/20 10:59

Resolved: 16/Oct/20 10:59