[YARN-6483] Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.1.0, 3.0.1
Component/s: resourcemanager
Labels: None
Target Version/s: 3.1.0, 3.0.1

Description

The DECOMMISSIONING node state is currently used as part of the graceful decommissioning mechanism to give tasks time to complete on a node that is scheduled for decommission, and to give reducer tasks time to read the shuffle blocks on that node. YARN also effectively blacklists nodes in the DECOMMISSIONING state by assigning them a capacity of 0, which prevents additional containers from being launched on those nodes, so no more shuffle blocks are written to them. This blacklisting is not effective for applications like Spark, because a Spark executor running in a YARN container will keep receiving tasks after the corresponding node has been blacklisted at the YARN level.

We would like to propose a modification of the YARN heartbeat mechanism so that nodes transitioning to DECOMMISSIONING are added to the list of updated nodes returned by the Resource Manager in response to the Application Master heartbeat. This way, a Spark application master would be able to blacklist a DECOMMISSIONING node at the Spark level.
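As a rough illustration of the application-side handling described above, the sketch below shows how an application master might keep its own exclusion list from the updated-nodes list it receives on each heartbeat. It deliberately does not use the Hadoop classes: `SketchNodeState`, `SketchNodeReport`, and `DecommissionTracker` are hypothetical stand-ins invented for this example, and treating a later RUNNING report as a reason to re-admit a node is an assumption of the sketch, not something this issue specifies.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for YARN's node state; only the states used here.
enum SketchNodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

// Hypothetical stand-in for one entry of the updated-nodes list
// returned to the application master in a heartbeat response.
final class SketchNodeReport {
    final String nodeId;
    final SketchNodeState state;

    SketchNodeReport(String nodeId, SketchNodeState state) {
        this.nodeId = nodeId;
        this.state = state;
    }
}

// Tracks the nodes on which the application should stop scheduling tasks.
final class DecommissionTracker {
    private final Set<String> excluded = new HashSet<>();

    // Call once per heartbeat with the updated-nodes list from the response.
    void onUpdatedNodes(List<SketchNodeReport> updatedNodes) {
        for (SketchNodeReport report : updatedNodes) {
            switch (report.state) {
                case DECOMMISSIONING:
                case DECOMMISSIONED:
                    // Stop sending new tasks to executors on this node.
                    excluded.add(report.nodeId);
                    break;
                case RUNNING:
                    // Assumption of this sketch: a node reported as RUNNING
                    // again may be used for new tasks.
                    excluded.remove(report.nodeId);
                    break;
            }
        }
    }

    boolean isExcluded(String nodeId) {
        return excluded.contains(nodeId);
    }

    Set<String> excludedNodes() {
        return Collections.unmodifiableSet(excluded);
    }
}
```

An application-level scheduler would consult `isExcluded` before assigning a task to an executor, which is the application-level blacklisting that the capacity-0 approach alone cannot provide.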

Attachments

- YARN-6483.002.patch (17/Nov/17 19:26, 48 kB, Juan Rodríguez Hortalá)
- YARN-6483.003.patch (22/Nov/17 03:32, 68 kB, Juan Rodríguez Hortalá)
- YARN-6483.branch-3.0.addendum.patch (06/Dec/17 01:43, 1 kB, Arun Suresh)
- YARN-6483-v1.patch (04/May/17 17:08, 4 kB, Juan Rodríguez Hortalá)

Issue Links

duplicates

- YARN-3224 Notify AM with containers (on decommissioning node) could be preempted after timeout. (Resolved)

is related to

- YARN-10538 Add recommissioning nodes to the list of updated nodes returned to the AM (Resolved)
- YARN-11125 Backport YARN-6483 to branch-2.10 (Resolved)

links to

GitHub Pull Request #289

People

Assignee: Juan Rodríguez Hortalá
Reporter: Juan Rodríguez Hortalá
Votes: 1
Watchers: 16

Dates

Created: 14/Apr/17 17:35
Updated: 13/May/22 16:22
Resolved: 06/Dec/17 20:11