[YARN-10791] Graceful decommission causes NPE during rolling upgrade from 2.6 to 3.2 - ASF JIRA

Details

Type: Bug
Status: Patch Available
Priority: Minor
Resolution: Unresolved
Affects Version/s: 3.2.1
Fix Version/s: None
Component/s: RM
Labels: None

Description

We are performing a rolling upgrade of YARN from 2.6.0 to 3.2.1, and we hit this exception while upgrading the NMs.

When we exclude a node and run a graceful refreshNodes, all the MR AMs fail.

2021-05-28 11:36:35,790 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN CONTACTING RM.
java.lang.NullPointerException
at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleUpdatedNodes(RMContainerAllocator.java:883)
at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:821)
at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:316)
at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:282)
at java.lang.Thread.run(Thread.java:745)

This happens because we gracefully decommission nodes while jobs are still running with 2.6 MR.

handleUpdatedNodes in 2.6 MR cannot recognize the node state "DECOMMISSIONING".
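
The failure mode can be sketched as follows. This is not the Hadoop source: the class, the enum, and the `decode` helper are hypothetical stand-ins, and the list of states and the `isUnusable` rule are assumptions about what a 2.6-era client knows. The sketch only shows how a state name that the old client cannot map ends up as null, and how an unguarded call on it produces the NullPointerException seen in the stack trace.

```java
class LegacyNodeStateDemo {
    // Hypothetical stand-in for the node states a 2.6-era client knows about
    // (assumption: DECOMMISSIONING and SHUTDOWN are absent).
    enum LegacyNodeState {
        NEW, RUNNING, UNHEALTHY, DECOMMISSIONED, LOST, REBOOTED;

        boolean isUnusable() {
            return this == UNHEALTHY || this == DECOMMISSIONED || this == LOST;
        }
    }

    // Hypothetical decoder: returns null when the state name is unknown
    // to this (older) client.
    static LegacyNodeState decode(String wireName) {
        for (LegacyNodeState s : LegacyNodeState.values()) {
            if (s.name().equals(wireName)) {
                return s;
            }
        }
        return null;
    }

    // Unguarded check: dereferences the decoded state directly, so an
    // unknown state throws NullPointerException.
    static boolean isUnusableUnguarded(String wireName) {
        return decode(wireName).isUnusable();
    }

    // Guarded check: treats an unknown state as "not unusable" instead of failing.
    static boolean isUnusableGuarded(String wireName) {
        LegacyNodeState s = decode(wireName);
        return s != null && s.isUnusable();
    }

    public static void main(String[] args) {
        System.out.println("guarded: " + isUnusableGuarded("DECOMMISSIONING"));
        try {
            isUnusableUnguarded("DECOMMISSIONING");
        } catch (NullPointerException e) {
            System.out.println("unguarded: NullPointerException");
        }
    }
}
```

A null check on the client side would avoid the crash, but the 2.6 AMs already running during a rolling upgrade cannot be patched, which is why the proposal here acts on the RM side instead.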

So I added a config option that decides whether DECOMMISSIONING should be sent to the AMs.
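
A minimal sketch of that idea, assuming the RM filters the updated-node list before it is returned to an AM. It is not taken from the attached patch: the class, the `NodeReport` type, the host names, and the `sendDecommissioningToAm` flag are all hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UpdatedNodeFilter {
    // Illustrative copy of the node states, including the two newer ones.
    enum NodeState {
        NEW, RUNNING, UNHEALTHY, DECOMMISSIONED, LOST, REBOOTED, DECOMMISSIONING, SHUTDOWN
    }

    // Hypothetical, simplified node report: just a host name and a state.
    static final class NodeReport {
        final String host;
        final NodeState state;

        NodeReport(String host, NodeState state) {
            this.host = host;
            this.state = state;
        }
    }

    // Returns the reports that should be sent to an AM. When the
    // (hypothetical) flag is false, DECOMMISSIONING reports are dropped so
    // that older AMs never see a state they cannot decode.
    static List<NodeReport> reportsForAm(List<NodeReport> updated,
                                         boolean sendDecommissioningToAm) {
        List<NodeReport> out = new ArrayList<>();
        for (NodeReport r : updated) {
            if (r.state == NodeState.DECOMMISSIONING && !sendDecommissioningToAm) {
                continue;
            }
            out.add(r);
        }
        return out;
    }

    public static void main(String[] args) {
        List<NodeReport> updated = Arrays.asList(
                new NodeReport("node-44", NodeState.RUNNING),
                new NodeReport("node-46", NodeState.DECOMMISSIONING));
        System.out.println("flag off: " + reportsForAm(updated, false).size() + " report(s)");
        System.out.println("flag on: " + reportsForAm(updated, true).size() + " report(s)");
    }
}
```

With the flag off, an old AM would still learn about the node once it reaches DECOMMISSIONED, a state it already understands. The sketch filters only DECOMMISSIONING, as the proposal describes; SHUTDOWN is left untouched here.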

I don't know whether this needs to be fixed; I am just proposing a solution for this situation.

There are 2 nodes in the cluster, and the AM is running on node 44. I excluded node 46, the other node in the cluster, and then ran refreshNodes; the error above occurred.

As I said, I think the root cause is the compatibility of NodeStateProto:

2.6 MR cannot recognize DECOMMISSIONING and SHUTDOWN.

Attachments

- YARN-10791.v1.patch (28/May/21 03:56, 10 kB, Song Jiacheng)
- image-2021-05-31-10-37-31-795.png (31/May/21 02:37, 13 kB, Song Jiacheng)
- image-2021-05-31-10-32-17-541.png (31/May/21 02:32, 19 kB, Song Jiacheng)

People

Assignee: Unassigned
Reporter: Song Jiacheng
Votes: 0
Watchers: 4

Dates

Created: 28/May/21 03:47
Updated: 02/Jun/21 03:27