Details
- Type: Bug
- Status: Patch Available
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 3.1.2
- Fix Version/s: None
- Component/s: None
- Labels: None
Description
There are two ways to request containers from the RM through AMRMClientImpl:
- addContainerRequest
- addSchedulingRequests
These two request types map to the following parameters of the Scheduler's allocate() method:
- addContainerRequest <-> ask
- addSchedulingRequests <-> schedulingRequests

public Allocation allocate(ApplicationAttemptId applicationAttemptId,
    List<ResourceRequest> ask, List<SchedulingRequest> schedulingRequests,
    List<ContainerId> release, List<String> blacklistAdditions,
    List<String> blacklistRemovals, ContainerUpdates updateRequests) {
  FiCaSchedulerApp application = getApplicationAttempt(applicationAttemptId);
  ...
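For context, here is a minimal sketch of the two request paths through AMRMClientImpl. The class name AmRequestPathsSketch, the 512 MB / 1 vCore sizing, and the "testapp" allocation tag are illustrative values taken from the log below, not from any real application code.

import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ExecutionType;
import org.apache.hadoop.yarn.api.records.ExecutionTypeRequest;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceSizing;
import org.apache.hadoop.yarn.api.records.SchedulingRequest;
import org.apache.hadoop.yarn.api.resource.PlacementConstraints;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AmRequestPathsSketch {
  public static void main(String[] args) throws Exception {
    AMRMClient<AMRMClient.ContainerRequest> amRmClient = AMRMClient.createAMRMClient();
    amRmClient.init(new YarnConfiguration());
    amRmClient.start();
    // (registerApplicationMaster(...) omitted for brevity)

    Resource resource = Resource.newInstance(512, 1);

    // Path 1: addContainerRequest -> reaches the scheduler as the "ask" list.
    amRmClient.addContainerRequest(
        new AMRMClient.ContainerRequest(resource, null, null, Priority.newInstance(0)));

    // Path 2: addSchedulingRequests -> reaches the scheduler as the "schedulingRequests" list.
    // The "testapp" tag and the anti-affinity constraint mirror the values in the log below.
    SchedulingRequest schedulingRequest = SchedulingRequest.newBuilder()
        .priority(Priority.newInstance(0))
        .allocationRequestId(0)
        .allocationTags(Collections.singleton("testapp"))
        .executionType(ExecutionTypeRequest.newInstance(ExecutionType.GUARANTEED, true))
        .resourceSizing(ResourceSizing.newInstance(1, resource))
        .placementConstraintExpression(PlacementConstraints.build(
            PlacementConstraints.targetNotIn(PlacementConstraints.NODE,
                PlacementConstraints.PlacementTargets.allocationTag("testapp"))))
        .build();
    amRmClient.addSchedulingRequests(Collections.singletonList(schedulingRequest));
  }
}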
We are using yarn-service with a placement_policy, in which case addSchedulingRequests is used.
addSchedulingRequests has a problem.
When two containers are terminated at the same time while a placement_policy is in effect, the AM submits the scheduling request twice, as follows.
2021-03-31 17:56:07,485 [Component dispatcher] INFO component.Component - [COMPONENT sleep] Requesting for 1 container(s)
2021-03-31 17:56:07,485 [Component dispatcher] INFO component.Component - [COMPONENT sleep] Submitting scheduling request: SchedulingRequestPBImpl{priority=0, allocationReqId=0, executionType={Execution Type: GUARANTEED, Enforce Execution Type: true}, allocationTags=[testapp], resourceSizing=ResourceSizingPBImpl{numAllocations=1, resources=<memory:512, vCores:1>}, placementConstraint=notin,node,yarn_node_partition/=[test2]:notin,node,testapp}
2021-03-31 17:56:07,486 [Component dispatcher] INFO component.Component - [COMPONENT sleep] Requesting for 1 container(s)
2021-03-31 17:56:07,487 [Component dispatcher] INFO component.Component - [COMPONENT sleep] Submitting scheduling request: SchedulingRequestPBImpl{priority=0, allocationReqId=0, executionType={Execution Type: GUARANTEED, Enforce Execution Type: true}, allocationTags=[testapp], resourceSizing=ResourceSizingPBImpl{numAllocations=1, resources=<memory:512, vCores:1>}, placementConstraint=notin,node,yarn_node_partition/=[test2]:notin,node,testapp}
Each of these requests is sent to the RM.
When they are received, the SingleConstraintAppPlacementAllocator keeps only the last value.
In other words, if multiple containers die at the same time, multiple requests are created, but the RM accepts only the final request and allocates only that one container.
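A simplified sketch of the overwrite behavior described above (a hypothetical model for illustration only, not the actual SingleConstraintAppPlacementAllocator code):

import java.util.HashMap;
import java.util.Map;

public class PendingAskSketch {
  // Pending allocation count per allocationRequestId (hypothetical key choice).
  private final Map<Long, Integer> pendingAllocations = new HashMap<>();

  // Behavior the report describes: a new request with the same id overwrites the stored one.
  void updatePendingAskOverwrite(long allocationRequestId, int numAllocations) {
    pendingAllocations.put(allocationRequestId, numAllocations);
  }

  // Behavior the AM effectively needs: repeated asks for the same id should accumulate.
  void updatePendingAskAccumulate(long allocationRequestId, int numAllocations) {
    pendingAllocations.merge(allocationRequestId, numAllocations, Integer::sum);
  }

  public static void main(String[] args) {
    PendingAskSketch allocator = new PendingAskSketch();
    // Two containers die at the same time, so the AM submits two requests,
    // each asking for 1 container with allocationReqId=0 (as in the log above).
    allocator.updatePendingAskOverwrite(0L, 1);
    allocator.updatePendingAskOverwrite(0L, 1);
    // Prints "pending = 1": only one container remains pending, so only one is re-allocated.
    System.out.println("pending = " + allocator.pendingAllocations.get(0L));
  }
}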