  Hadoop YARN / YARN-6031

Application recovery has failed when node label feature is turned off during RM recovery

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.9.0, 3.0.0-alpha4, 2.8.2
    • Component/s: scheduler
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Here are the repro steps:
      Enable node label, restart RM, configure CS properly, and run some jobs;
      Disable node label, restart RM, and the following exception is thrown:

      Caused by: org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: Invalid resource request, node label not enabled but request contains label expression
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
              at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
              at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
              at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
              at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
              at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165)
              at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
              at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
              ... 10 more
      

      During RM restart, application recovery failed because the application had a node label expression specified while node labels had been disabled.
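
      For illustration, here is a minimal, self-contained sketch of the recovery-time handling discussed in this issue (illustrative only, not the actual RMAppManager code; RecoveryLabelGuard, LabelDisabledException, validate() and recoverApp() are placeholder names): during recovery the label-validation failure is suppressed and the application is rejected with a clear diagnostic, instead of the whole RM recovery failing.

          // Illustrative sketch only; the class, exception and method names are
          // placeholders, not Hadoop APIs.
          final class RecoveryLabelGuard {
            static final class LabelDisabledException extends RuntimeException {
            }

            // Mirrors the validation step: reject a label expression when node
            // labels are turned off in the configuration.
            static void validate(String labelExpression, boolean nodeLabelsEnabled) {
              if (!nodeLabelsEnabled && labelExpression != null
                  && !labelExpression.isEmpty()) {
                throw new LabelDisabledException();
              }
            }

            // During recovery the exception is swallowed so RM startup continues;
            // the application itself is later rejected with a clear message.
            static void recoverApp(String appId, String labelExpression,
                boolean nodeLabelsEnabled) {
              try {
                validate(labelExpression, nodeLabelsEnabled);
              } catch (LabelDisabledException e) {
                System.out.println("Failed to recover application " + appId
                    + ". NodeLabel is not enabled in cluster, but AM resource"
                    + " request contains a label expression.");
              }
            }

            public static void main(String[] args) {
              // Node labels were enabled at submission time but are disabled now.
              recoverApp("application_1484000000000_0001", "gpu", false);
            }
          }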

      1. YARN-6031.001.patch
        3 kB
        Ying Zhang
      2. YARN-6031.002.patch
        4 kB
        Ying Zhang
      3. YARN-6031.003.patch
        9 kB
        Ying Zhang
      4. YARN-6031.004.patch
        9 kB
        Ying Zhang
      5. YARN-6031.005.patch
        9 kB
        Ying Zhang
      6. YARN-6031.006.patch
        9 kB
        Ying Zhang
      7. YARN-6031.007.patch
        9 kB
        Ying Zhang
      8. YARN-6031-branch-2.8.001.patch
        10 kB
        Ying Zhang

        Issue Links

          Activity

          sunilg Sunil G added a comment -

          I was thinking whether we can let the app continue to run

          Thanks Jian He. We can let the app run and account resources under NO_LABEL. I will open a new JIRA to track the same.

          jianhe Jian He added a comment -

          In RMAppManager#createAndPopulateNewRMApp, the app is just created whether it's in submission/recovery mode. The attempt is not yet created. Hence I think this won't be a problem.

          The scenario is this: the RMApp is now transitioned to failed and the state is persisted in the store, but the attempt state is still null. If the admin later re-enables node labels, the RMApp will be recovered as FAILED, but the attempt state will be NULL.

          Hence recovery for other apps will also continue and we will have context of this app as well.

          Killing an app for a mistake of the admin may be harsh from the perspective of a service app, as all service containers will be killed. I was thinking whether we can let the app continue to run - existing containers will keep running fine, and new requests with a label will be rejected. I guess we can surface this as a diagnostic to the user?

          sunilg Sunil G added a comment -

          Hi Jian He,
          A few doubts here:

          1. Below code catches InvalidLabelResourceRequestException and assumes that the error is because node-label becomes disabled

          This code snippet catches InvalidLabelResourceRequestException and suppresses it only in the case of recovery. If the AMResourceRequest was stored in the state store, it means that validateAndCreateResourceRequest was successful when the app was submitted. Now during recovery, the same will throw an error only when node labels are disabled by conf. If it's in the store, we can assume that the AM request is sane enough. Could you please give more context on what other scenario could also throw the same exception during recovery?
          On another note, if it is not recovery (throw e;), we throw the same exception back.

          2. Below code directly transitions app to failed by using a Rejected event. The attempt state is not moved to failed

          In RMAppManager#createAndPopulateNewRMApp, the app is just created whether it's in submission/recovery mode. The attempt is not yet created. Hence I think this won't be a problem.

          3. Is it ok to let the app continue in this scenario, it's less disruptive to the apps.

          Currently an exception was thrown and the RM was losing the context of such an app. To record and track such an app, we create the app and move it to the failed state. Hence recovery for other apps will also continue, and we will have the context of this app as well.

          jianhe Jian He added a comment - edited

          Ran into this patch when debugging the same issue; got a few questions:
          cc Sunil G, Ying Zhang
          1. Below code catches InvalidLabelResourceRequestException and assumes that the error is because node-label became disabled, but the same InvalidLabelResourceRequestException can be thrown for other reasons too, right? In that case, the following logic becomes invalid.

                amReqs = validateAndCreateResourceRequest(submissionContext, isRecovery);
              } catch (InvalidLabelResourceRequestException e) {
                // This can happen if the application had been submitted and run
                // with Node Label enabled but recover with Node Label disabled.
                // Thus there might be node label expression in the application's
                // resource requests. If this is the case, create RmAppImpl with
                // null amReq and reject the application later with clear error
                // message. So that the application can still be tracked by RM
                // after recovery and user can see what's going on and react accordingly.
                if (isRecovery &&
                    !YarnConfiguration.areNodeLabelsEnabled(this.conf)) {
                  if (LOG.isDebugEnabled()) {
                    LOG.debug("AMResourceRequest is not created for " + applicationId
                        + ". NodeLabel is not enabled in cluster, but AM resource "
                        + "request contains a label expression.");
                  }
                } else {
                  throw e;
                }
          

          2. Below code directly transitions the app to failed by using a Rejected event. The attempt state is not moved to failed; it'll be stuck there? I think we need to send a KILL event instead of a REJECT event.

                if (labelExp != null &&
                    !labelExp.equals(RMNodeLabelsManager.NO_LABEL)) {
                  String message = "Failed to recover application " + appId
                      + ". NodeLabel is not enabled in cluster, but AM resource request "
                      + "contains a label expression.";
                  LOG.warn(message);
                  application.handle(
                      new RMAppEvent(appId, RMAppEventType.APP_REJECTED, message));
                  return;
                }
          

          3. Is it OK to let the app continue in this scenario? It's less disruptive to the apps. What's the disadvantage if we let the app continue?

          vinodkv Vinod Kumar Vavilapalli added a comment -

          2.8.1 became a security release. Moving fix-version to 2.8.2 after the fact.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 19s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 6m 57s branch-2.8 passed
          +1 compile 0m 31s branch-2.8 passed with JDK v1.8.0_121
          +1 compile 0m 34s branch-2.8 passed with JDK v1.7.0_121
          +1 checkstyle 0m 20s branch-2.8 passed
          +1 mvnsite 0m 39s branch-2.8 passed
          +1 mvneclipse 0m 18s branch-2.8 passed
          +1 findbugs 1m 13s branch-2.8 passed
          +1 javadoc 0m 21s branch-2.8 passed with JDK v1.8.0_121
          +1 javadoc 0m 25s branch-2.8 passed with JDK v1.7.0_121
          +1 mvninstall 0m 31s the patch passed
          +1 compile 0m 27s the patch passed with JDK v1.8.0_121
          +1 javac 0m 27s the patch passed
          +1 compile 0m 32s the patch passed with JDK v1.7.0_121
          +1 javac 0m 32s the patch passed
          +1 checkstyle 0m 17s the patch passed
          +1 mvnsite 0m 37s the patch passed
          +1 mvneclipse 0m 15s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 25s the patch passed
          +1 javadoc 0m 19s the patch passed with JDK v1.8.0_121
          +1 javadoc 0m 22s the patch passed with JDK v1.7.0_121
          -1 unit 76m 51s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_121.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          171m 2s



          Reason Tests
          JDK v1.8.0_121 Failed junit tests hadoop.yarn.server.resourcemanager.TestAMAuthorization
            hadoop.yarn.server.resourcemanager.TestClientRMTokens
          JDK v1.7.0_121 Failed junit tests hadoop.yarn.server.resourcemanager.TestAMAuthorization
            hadoop.yarn.server.resourcemanager.TestClientRMTokens



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:5af2af1
          JIRA Issue YARN-6031
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12851578/YARN-6031-branch-2.8.001.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 1d714a427b8a 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2.8 / 2bbcaa8
          Default Java 1.7.0_121
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_121 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_121
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-YARN-Build/14860/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_121.txt
          JDK v1.7.0_121 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14860/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/14860/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          sunilg Sunil G added a comment -

          Thanks Ying Zhang for the contribution, and thanks Daniel Templeton, Wangda Tan and Bibin A Chundatt for the additional review.

          sunilg Sunil G added a comment -

          Test case failures are known and not related to the patch for branch-2.8 (JDK 7).

          Committing to branch-2.8

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 27s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 6m 53s branch-2.8 passed
          +1 compile 0m 34s branch-2.8 passed with JDK v1.8.0_121
          +1 compile 0m 32s branch-2.8 passed with JDK v1.7.0_121
          +1 checkstyle 0m 19s branch-2.8 passed
          +1 mvnsite 0m 37s branch-2.8 passed
          +1 mvneclipse 0m 16s branch-2.8 passed
          +1 findbugs 1m 10s branch-2.8 passed
          +1 javadoc 0m 20s branch-2.8 passed with JDK v1.8.0_121
          +1 javadoc 0m 24s branch-2.8 passed with JDK v1.7.0_121
          +1 mvninstall 0m 30s the patch passed
          +1 compile 0m 26s the patch passed with JDK v1.8.0_121
          +1 javac 0m 26s the patch passed
          +1 compile 0m 29s the patch passed with JDK v1.7.0_121
          +1 javac 0m 29s the patch passed
          +1 checkstyle 0m 16s the patch passed
          +1 mvnsite 0m 36s the patch passed
          +1 mvneclipse 0m 14s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 23s the patch passed
          +1 javadoc 0m 21s the patch passed with JDK v1.8.0_121
          +1 javadoc 0m 21s the patch passed with JDK v1.7.0_121
          -1 unit 75m 29s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_121.
          +1 asflicense 0m 20s The patch does not generate ASF License warnings.
          168m 19s



          Reason Tests
          JDK v1.8.0_121 Failed junit tests hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption
            hadoop.yarn.server.resourcemanager.TestClientRMTokens
            hadoop.yarn.server.resourcemanager.TestAMAuthorization
          JDK v1.7.0_121 Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
            hadoop.yarn.server.resourcemanager.TestAMAuthorization



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:5af2af1
          JIRA Issue YARN-6031
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12851556/YARN-6031-branch-2.8.001.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 034939e402a1 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2.8 / 2bbcaa8
          Default Java 1.7.0_121
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_121 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_121
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-YARN-Build/14859/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_121.txt
          JDK v1.7.0_121 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14859/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/14859/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          sunilg Sunil G added a comment -

          Thanks Ying Zhang for the clarification. Makes sense to me.

          A new ticket could be raised to track the test case improvement. I will wait for Jenkins on the branch-2.8 patch.

          Ying Zhang Ying Zhang added a comment - edited

          I think it is a separate question. Whether or not we backport YARN-4805, the test case itself can be improved to avoid running with FairScheduler.

          sunilg Sunil G added a comment -

          Or do we need to backport YARN-4805 to branch-2.8?

          Ying Zhang Ying Zhang added a comment -

          Hi Sunil G, sorry for the late reply (was out for the Spring Festival holiday). Here is the patch for branch-2.8, please have a look.
          I've found a problem with the test case when making the patch for branch-2.8. TestRMRestart runs all test cases for CapacityScheduler and FairScheduler respectively, and this test case can only run successfully for CapacityScheduler since it involves running an application with a node label specified. On trunk, we don't see this problem because, due to YARN-4805, TestRMRestart now only runs for CapacityScheduler. I've modified the test case a little bit to run only when it is CapacityScheduler.

            public void testRMRestartAfterNodeLabelDisabled() throws Exception {
              // Skip this test case if it is not CapacityScheduler since NodeLabel is
              // not fully supported yet for FairScheduler and others.
              if (!getSchedulerType().equals(SchedulerType.CAPACITY)) {
                return;
              }
          ...
          

          We should probably make this change on trunk too. Let me know whether you want to make the change through this JIRA, or whether I need to open another JIRA to address it.

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11159 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11159/)
          YARN-6031. Application recovery has failed when node label feature is (sunilg: rev 3fa0d540dfca579f3c2840a959b748a7528b02ed)

          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          sunilg Sunil G added a comment -

          Committed to trunk/branch-2. However, the patch does not apply cleanly to branch-2.8. Ying Zhang, please help to share a branch-2.8 patch.

          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 14m 49s trunk passed
          +1 compile 0m 37s trunk passed
          +1 checkstyle 0m 24s trunk passed
          +1 mvnsite 0m 52s trunk passed
          +1 mvneclipse 0m 20s trunk passed
          +1 findbugs 1m 39s trunk passed
          +1 javadoc 0m 30s trunk passed
          +1 mvninstall 0m 50s the patch passed
          +1 compile 0m 49s the patch passed
          +1 javac 0m 49s the patch passed
          +1 checkstyle 0m 27s the patch passed
          +1 mvnsite 0m 50s the patch passed
          +1 mvneclipse 0m 17s the patch passed
          +1 whitespace 0m 1s The patch has no whitespace issues.
          +1 findbugs 1m 46s the patch passed
          +1 javadoc 0m 27s the patch passed
          +1 unit 41m 59s hadoop-yarn-server-resourcemanager in the patch passed.
          +1 asflicense 0m 21s The patch does not generate ASF License warnings.
          68m 47s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue YARN-6031
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12848469/YARN-6031.007.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 15390e066e0a 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / b01514f
          Default Java 1.8.0_111
          findbugs v3.0.0
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14720/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/14720/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          sunilg Sunil G added a comment -

          +1 for latest patch. Pending jenkins.

          sunilg Sunil G added a comment -

          Yes. It fell off my radar.

          However, the patch looks stale. Could you please rebase it to trunk?

          Ying Zhang Ying Zhang added a comment -

          Hi Sunil G, would you please help to push this forward?

          Ying Zhang Ying Zhang added a comment - edited

          Failed test case (TestRMRestart.testFinishedAppRemovalAfterRMRestart) is known and tracked by YARN-5548.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
          +1 @author 0m 1s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 13m 4s trunk passed
          +1 compile 0m 33s trunk passed
          +1 checkstyle 0m 22s trunk passed
          +1 mvnsite 0m 36s trunk passed
          +1 mvneclipse 0m 16s trunk passed
          +1 findbugs 1m 0s trunk passed
          +1 javadoc 0m 21s trunk passed
          +1 mvninstall 0m 31s the patch passed
          +1 compile 0m 33s the patch passed
          +1 javac 0m 33s the patch passed
          +1 checkstyle 0m 20s the patch passed
          +1 mvnsite 0m 32s the patch passed
          +1 mvneclipse 0m 14s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 5s the patch passed
          +1 javadoc 0m 19s the patch passed
          -1 unit 39m 41s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 16s The patch does not generate ASF License warnings.
          61m 19s



          Reason Tests
          Failed junit tests hadoop.yarn.server.resourcemanager.TestRMRestart



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue YARN-6031
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12846782/YARN-6031.006.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 808f0406b80d 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / be529da
          Default Java 1.8.0_111
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-YARN-Build/14639/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14639/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/14639/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          sunilg Sunil G added a comment -

          Patch generally looks fine to me. Will wait for Jenkins to kick off.
          Also, I will wait for a day in case others have comments as well.

          Ying Zhang Ying Zhang added a comment -

          Thanks Sunil G. Done. I was thinking that LOG.debug can do this check on its own, but we can always do it beforehand and follow the current code style in the RM.

          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 23s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 14m 24s trunk passed
          +1 compile 0m 35s trunk passed
          +1 checkstyle 0m 23s trunk passed
          +1 mvnsite 0m 37s trunk passed
          +1 mvneclipse 0m 17s trunk passed
          +1 findbugs 1m 5s trunk passed
          +1 javadoc 0m 23s trunk passed
          +1 mvninstall 0m 33s the patch passed
          +1 compile 0m 34s the patch passed
          +1 javac 0m 34s the patch passed
          +1 checkstyle 0m 21s the patch passed
          +1 mvnsite 0m 35s the patch passed
          +1 mvneclipse 0m 14s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 10s the patch passed
          +1 javadoc 0m 19s the patch passed
          +1 unit 41m 14s hadoop-yarn-server-resourcemanager in the patch passed.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          64m 50s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue YARN-6031
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12846768/YARN-6031.005.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux ee8bf4a7ffe6 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 467f5f1
          Default Java 1.8.0_111
          findbugs v3.0.0
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14638/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/14638/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          sunilg Sunil G added a comment - edited

          Quick correction: could you also please add a LOG.isDebugEnabled() check before logging?

          Ying Zhang Ying Zhang added a comment - edited

          Thanks Sunil G. Done and uploaded a new patch.

          sunilg Sunil G added a comment -

          Hi Ying Zhang
          When InvalidResourceRequestException is thrown from validateAndCreateResourceRequest, it is certain that amReq is not updated, so amReq will be null. I think you can write a debug log and come out.

          Ying Zhang Ying Zhang added a comment -

          For the findbugs error, it might be good to keep the null check in case a later code change breaks the assumption.
          Test failure is known and tracked by YARN-5548.
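
          For context, a tiny standalone example of the pattern findbugs is flagging (hypothetical code, not the actual RMAppManager source; InvalidLabelException and createAmReq() are placeholders): inside the catch block amReq can only be null, so a further null check there is reported as redundant even though it documents intent.

              // Hypothetical illustration of the findbugs "redundant null check"
              // warning; the names below are placeholders, not Hadoop APIs.
              final class RedundantNullCheckDemo {
                static final class InvalidLabelException extends RuntimeException {
                }

                static String createAmReq(boolean labelsEnabled) {
                  if (!labelsEnabled) {
                    throw new InvalidLabelException();
                  }
                  return "amReq";
                }

                public static void main(String[] args) {
                  String amReq = null;
                  try {
                    amReq = createAmReq(false);
                  } catch (InvalidLabelException e) {
                    // amReq is provably null here, so findbugs reports this check
                    // as redundant even though it guards against future changes.
                    if (amReq == null) {
                      System.out.println("AM resource request was not created");
                    }
                  }
                }
              }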

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 13m 8s trunk passed
          +1 compile 0m 34s trunk passed
          +1 checkstyle 0m 22s trunk passed
          +1 mvnsite 0m 36s trunk passed
          +1 mvneclipse 0m 16s trunk passed
          +1 findbugs 0m 59s trunk passed
          +1 javadoc 0m 20s trunk passed
          +1 mvninstall 0m 31s the patch passed
          +1 compile 0m 30s the patch passed
          +1 javac 0m 30s the patch passed
          +1 checkstyle 0m 20s the patch passed
          +1 mvnsite 0m 32s the patch passed
          +1 mvneclipse 0m 13s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          -1 findbugs 1m 5s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
          +1 javadoc 0m 18s the patch passed
          -1 unit 38m 55s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 16s The patch does not generate ASF License warnings.
          60m 30s



          Reason Tests
          FindBugs module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
            Redundant nullcheck of amReq which is known to be null in org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(ApplicationSubmissionContext, long, String, boolean, long) Redundant null check at RMAppManager.java:is known to be null in org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(ApplicationSubmissionContext, long, String, boolean, long) Redundant null check at RMAppManager.java:[line 404]
          Failed junit tests hadoop.yarn.server.resourcemanager.TestRMRestart



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue YARN-6031
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12846735/YARN-6031.004.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 4060267ed344 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 4db119b
          Default Java 1.8.0_111
          findbugs v3.0.0
          findbugs https://builds.apache.org/job/PreCommit-YARN-Build/14635/artifact/patchprocess/new-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.html
          unit https://builds.apache.org/job/PreCommit-YARN-Build/14635/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14635/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/14635/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Ying Zhang Ying Zhang added a comment -

          Thanks very much Sunil G for the quick review. Comments addressed in the new patch.

          sunilg Sunil G added a comment -

          Patch looks generally fine.

          A few minor nits:

          1.

          String message = "Failed to recover application " + appId
            + ". Node label not enabled but request contains label expression "
            + labelExp + ".";
          

          I think it could be written as "Failed to recover application <appId>. NodeLabel is not enabled in cluster, but AM resource request contains a label expression."
          2. amReqInAppContext -> amReqFromAppContext
          3. In line with comment 1, please update the exception message in createAndPopulateNewRMApp.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 14s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 12m 58s trunk passed
          +1 compile 0m 32s trunk passed
          +1 checkstyle 0m 21s trunk passed
          +1 mvnsite 0m 36s trunk passed
          +1 mvneclipse 0m 16s trunk passed
          +1 findbugs 1m 3s trunk passed
          +1 javadoc 0m 20s trunk passed
          +1 mvninstall 0m 32s the patch passed
          +1 compile 0m 32s the patch passed
          +1 javac 0m 32s the patch passed
          -0 checkstyle 0m 20s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 3 new + 115 unchanged - 0 fixed = 118 total (was 115)
          +1 mvnsite 0m 34s the patch passed
          +1 mvneclipse 0m 14s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 8s the patch passed
          +1 javadoc 0m 19s the patch passed
          -1 unit 39m 13s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 17s The patch does not generate ASF License warnings.
          60m 46s



          Reason Tests
          Failed junit tests hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue YARN-6031
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12846536/YARN-6031.003.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux c0da29707a81 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 9594c35
          Default Java 1.8.0_111
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14621/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/14621/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14621/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/14621/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Ying Zhang Ying Zhang added a comment -

          Thanks Sunil G.
          Uploaded a new patch with a test case added.

          sunilg Sunil G added a comment -

          Since RMAppImpl handles the APP_REJECTED event and we can move the app from NEW to FAILED (via FINAL_SAVING), I think it's fine. Currently you are validating the resource request and doing an assert; I suggest raising a YarnException with a more meaningful message.
          For test cases, I suggest looking at TestWorkPreservingRMRestartForNodeLabel.
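
          As a rough illustration of the exception suggestion above, the assert could be replaced with something like the sketch below. This is only a sketch, not the actual patch: the helper name, parameters and message text are assumptions.

          // Sketch only: surface a descriptive error instead of asserting.
          // Assumed to be called during recovery when node labels are disabled;
          // the method name and parameters are hypothetical.
          private void checkRecoveredAMLabelExpression(ApplicationId appId,
              String labelExp) throws YarnException {
            if (labelExp != null && !labelExp.isEmpty()) {
              throw new YarnException("Failed to recover application " + appId
                  + ". NodeLabel is not enabled in cluster, but AM resource request"
                  + " contains a label expression " + labelExp + ".");
            }
          }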

          Ying Zhang Ying Zhang added a comment - - edited

          I've uploaded a new patch, YARN-6031.002.patch, with the suggested approach. Please kindly review and see if this is what we wanted. I'm working on adding a test case now. Thank you.
          Sunil G, for your comment below:

          I think if we can create the RMAppImpl object and push the transition into FAILED (with clear diagnostics), then the user/admin can easily know what happened (and can then take action to remove it from the state store or not). But not all apps may fail. For example:
          An app which was running and has no more outstanding requests (the AM container will be going to the default label). Old containers may belong to some specific label.
          An app which was running and has more outstanding requests to some labels (the AM container will be going to the default label)

          I'm not sure how to achieve this. With the current patch, all applications with a node label expression specified will be rejected and fail during recovery (including those which had already finished successfully before we disabled node labels and restarted the RM). Please share your thoughts if you think this should be improved.

          Ying Zhang Ying Zhang added a comment -

          Oh I see, thanks. Will update the patch soon.

          sunilg Sunil G added a comment -

          If we do not create the RMAppImpl object and start the app transitions, we will not have the app in the RM to track as failed. RMAppManager.createAndPopulateNewRMApp ensures that the app object is added to rmContext.getRMApps().

          Ying Zhang Ying Zhang added a comment - - edited

          Hi Wangda Tan, I have a quick question here: if we just want to send the APP_REJECTED message, why do we need to create the RMAppImpl anyway? It looks like we just need to catch the InvalidResourceRequest exception in recoverApplication() and then send the reject message. Or is it for printing a proper error message? That can also be done in recoverApplication().

          leftnoteasy Wangda Tan added a comment -

          Thanks Ying Zhang for updating the patch, and thanks for the comments from Daniel Templeton / Sunil G.

          I agree with the approach suggested by Sunil G. We can create the RMAppImpl when InvalidResourceRequest is thrown (for example, create it with a null ResourceRequest), and print a proper error message.

          After createAndPopulateNewRMApp, we can check if (amRequest == null) and (not unmanaged-am); if yes, send the APP_REJECTED message, just like the error handling when a credential parse error is found (see submitApplication).

          Thoughts?
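
          For illustration, a minimal sketch of that flow is below. It is not the actual patch: the accessor for the AM request, the diagnostics text, and the surrounding RMAppManager fields are assumptions.

          // Illustrative sketch of the suggested recovery-time handling.
          RMAppImpl application = createAndPopulateNewRMApp(
              submissionContext, submitTime, user, true /* isRecovery */, startTime);
          ResourceRequest amReq = application.getAMResourceRequest(); // assumed accessor
          if (amReq == null && !submissionContext.getUnmanagedAM()) {
            // Validation of the AM request failed (e.g. node labels disabled but the
            // stored request carries a label expression); reject the app instead of
            // failing the whole recovery, mirroring the credential-error handling.
            String diags = "Failed to recover application "
                + submissionContext.getApplicationId()
                + ": node labels are disabled but the AM resource request"
                + " contains a label expression.";
            this.rmContext.getDispatcher().getEventHandler().handle(new RMAppEvent(
                submissionContext.getApplicationId(), RMAppEventType.APP_REJECTED, diags));
          }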

          Ying Zhang Ying Zhang added a comment -

          It makes sense to me.

          If we skip and continue, then we will be losing the record of that app from the RM end (it will still be there in the store). We may be pushing detailed information to the logs, but the user/admin would have no reason to look at the logs since recovery went smoothly (discarding a few apps).

          Yes, that's true: those apps don't show up in the application list in the RM UI after recovery, with only information in the log saying what's going on.

          sunilg Sunil G added a comment -

          I gave it a second thought.

          If we skip and continue, then we will be losing the record of that app from the RM end (it will still be there in the store). We may be pushing detailed information to the logs, but the user/admin would have no reason to look at the logs since recovery went smoothly (discarding a few apps). If those apps are considered failed, then the user/admin can take some actions as mentioned by Daniel.

          I think if we can create the RMAppImpl object and push the transition into FAILED (with clear diagnostics), then the user/admin can easily know what happened (and can then take action to remove it from the state store or not). But not all apps may fail. For example:

          • An app which was running and has no more outstanding requests (the AM container will be going to the default label). Old containers may belong to some specific label.
          • An app which was running and has more outstanding requests to some labels (the AM container will be going to the default label)

          In the first case, the app will still continue running. In the second case, it's up to the app to decide when it gets an InvalidResourceRequestException in the allocate call.

          I feel we can consider accepting the app and then marking it as failed with a validation once the RMAppImpl transitions are started (this need not be done from the scheduler; we can do it in the RECOVER event if possible). Thoughts?

          bibinchundatt Bibin A Chundatt added a comment -

          Daniel Templeton

          so that when using -remove-application-from-state-store you know what you're purging.

          Another concern about removing the application from the store is that an already running AM will be left in a dormant state. But if we allow the application to recover and make sure it is killed from the scheduler, then the application will be killed with a reason.

          Currently, on the MR side, if the AM is running and requests a non-available label, the MR AM kills the application; the same could be implemented in all applications.

          Ying Zhang Ying Zhang added a comment - - edited

          Do you think we can make the log message a bit more explicit, i.e. say that the failure was because node labels have been disabled and point out the property that the admin should use to disable/enable node labels?

          Hi Daniel Templeton, the following error messages will be printed in the RM log:

          2016-12-28 01:00:22,694 WARN  resourcemanager.RMAppManager (RMAppManager.java:validateAndCreateResourceRequest(400)) - RM app submission failed in validating AM resource request for application application_xxxxxx
          org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: Invalid resource request, node label not enabled but request contains label expression
                  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
                  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
                  at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:396)
                  at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:341)
                  at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:321)
                  at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:439)
                  at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165)
                  at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
          ... ...
          2016-12-28 01:00:22,694 ERROR resourcemanager.RMAppManager (RMAppManager.java:recover(455)) - Failed to recover application application_xxxxxx
          

          The first error message is printed by the check that we fail in the first place; the second is printed by the code in the patch. I'm thinking this would be enough of a hint for the root cause.

          Ying Zhang Ying Zhang added a comment -

          So what's the next move? I'm a little confused. Are we going to address this issue with the general approach proposed by Daniel Templeton and Sunil G?

          Ying Zhang Ying Zhang added a comment -

          We could ignore/reset labels to the default in the ResourceRequest when node labels are disabled.

          Agree with Daniel: resetting might not be a good idea. The admin should be aware of the failure and take proper action.

          IIUC, ignoring validation on recovery should also work.

          I was thinking the same in the first place (see my comment at YARN-4465). Then I agreed with what Sunil G said, for the same reason as above: we should not hide the failure/wrong configuration.

          IMHO this should be acceptable, since any application submitted with labels when the feature is disabled gets rejected.

          Yes, you're right. I agree with the current way; I just wanted to clarify so that everyone is on the same page.

          templedf Daniel Templeton added a comment - - edited

          IIUC, ignoring validation on recovery should also work.

          Then you end up with unschedulable apps in the system, which can't be good.

          We could ignore/reset labels to the default in the ResourceRequest when node labels are disabled.

          The issue there is that the labels may have some important meaning to the job, so defaulting the labels may be bad. As applications can have side-effects, I think it's better to have the failure up front than let the application potentially fail somewhere down the line. The admin is then immediately made aware that he screwed up by disabling a feature that was still in use.

          bibinchundatt Bibin A Chundatt added a comment - - edited

          As Sunil G mentioned earlier, ignoring the application could leave a stale application in the state store.

          Ying Zhang, IIUC, ignoring validation on recovery should also work.

            private static void validateResourceRequest(ResourceRequest resReq,
                Resource maximumResource, QueueInfo queueInfo, RMContext rmContext)
                throws InvalidResourceRequestException {
              Configuration conf = rmContext.getYarnConfiguration();
              // If Node label is not enabled throw exception
              if (null != conf && !YarnConfiguration.areNodeLabelsEnabled(conf)) {
                String labelExp = resReq.getNodeLabelExpression();
                if (!(RMNodeLabelsManager.NO_LABEL.equals(labelExp)
                    || null == labelExp)) {
                  throw new InvalidLabelResourceRequestException(
                      "Invalid resource request, node label not enabled "
                          + "but request contains label expression");
                }
              }
          

          Thoughts?

          The current fact is (with or without this fix): an application submitted with a node label expression explicitly specified will fail during recovery

          IMHO this should be acceptable, since any application submitted with labels when the feature is disabled gets rejected.

          Solution 2:
          We could ignore/reset labels to the default in the ResourceRequest when node labels are disabled. I haven't looked at the impact of this.
          Elaborate testing would be needed to see how metrics are impacted. The disadvantage is that the client will never get to know that the reset happened on the RM side.

          YARN-4562 will try to handle ignoring the loading of label configuration when the feature is disabled.

          Daniel Templeton, I do agree that the admin would require some way to get application info when recovery fails, so that a bulk update of the state store is possible.

          templedf Daniel Templeton added a comment -

          I agree that -force-recovery could cause a significant information loss, but it's something that the admin has to do explicitly, and it's only application information, so it's not the end of the world. With a -dump-application-information option, the admin has the choice to either 1) look at each app that fails the recovery and decide whether to purge it or do something else (like turn node labels back on), or 2) do a bulk purge with -force-recovery.

          It might also be good to have another option, something like -dry-run-recovery, that would tell the admin the IDs of all the applications that will fail during recovery so that she doesn't have to keep doing them one at a time. In fact, I could even see making that the default behavior before failing the resource manager.

          In any case, I don't think the approach proposed in this JIRA, to just ignore the failed app, is going to work out.

          sunilg Sunil G added a comment -

          Yes, makes sense. This is more or less work for the admin then. I am not so sure whether the RM can make that call and remove it internally; it may be costly, as we are deleting a user record. But if the necessary information is pushed to the logs, then we may be good to remove the data internally. Thoughts?

          templedf Daniel Templeton added a comment -

          The max_applications limit may be hit and valid apps may get evicted at some point.

          Exactly my concern. So far, the answer has been that the recovery should just fail, and the admin should clear the app. See your comments in YARN-4401.

          Personally, I think that's a bad experience for the admin, but resolving it will require a bit of infrastructure work to make something useful happen. I think a -force-recovery option that purges the bad apps after dumping full info to the logs would be a good start. It would also help to have an option to dump the info about an app from the state store without having the RM running, so that when using -remove-application-from-state-store you know what you're purging.

          sunilg Sunil G added a comment -

          Yes, Daniel Templeton, you are correct. We will end up having many flaky apps in the state store. I also had an offline chat with Bibin A Chundatt, and there may be a potential problem with that too: the max_applications limit may be hit and valid apps may get evicted at some point.
          We can forcefully remove it from the state store; however, we may lose information about it since it is a failed app. With clear logging, we can evict such apps from the state store. I am not so sure whether we need to delete immediately or can put it on an async monitor to delete later. Thoughts?

          templedf Daniel Templeton added a comment -

          Yep, tests are needed. Love the long explanatory comment. Do you think we can make the log message a bit more explicit, i.e. say that the failure was because node labels have been disabled and point out the property that the admin should use to disable/enable node labels?

          Also, what happens to the app in the state store? If we fail to recover it and just ignore it, it will sit there forever, I suspect. It's probably a bad thing if the RM and state store don't agree on what apps are active.

          sunilg Sunil G added a comment - - edited

          Thanks Ying Zhang,

          The overall approach makes sense to me. You are basically acting on the label-disabled check etc. only when an exception is thrown. It's more or less the same as the earlier suggestion, so it's OK.

          However, a few more points. With this patch, the app recovery will now continue.

          • The app's AM resource request may not have any specific node label, but other containers may. Since we send InvalidResourceRequestException in those cases, it might be fine for now.
          • In the above case, what will happen to ongoing containers (w.r.t. the RM's data structures)? If it's a running app without any outstanding requests, we might need to account for the running containers under NO_LABEL. I am not sure whether this happens as of today. I'll also check.

          These could be out of scope for this ticket. However, let's check the opinion of other folks as well.

          Note: Please add more test cases to cover this patch.

          Ying Zhang Ying Zhang added a comment - - edited

          Uploaded a patch, which is based on Wangda Tan's comment on YARN-4465: swallow the InvalidResourceRequest exception when recovering, fail the recovery only for this application and print an error message, then let the rest of the recovery continue.
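
          Roughly, the shape of that change in RMAppManager#recover could look like the sketch below. This is illustrative only, not the actual YARN-6031 patch; the loop variables and log wording are assumptions.

          // Sketch: catch the validation failure per application so that the rest
          // of the recovery loop continues instead of aborting RM startup.
          for (ApplicationStateData appState : rmState.getApplicationState().values()) {
            ApplicationId appId =
                appState.getApplicationSubmissionContext().getApplicationId();
            try {
              recoverApplication(appState, rmState);
            } catch (InvalidLabelResourceRequestException e) {
              // Node labels were disabled after this app was submitted with a
              // label expression; fail only this app's recovery and move on.
              LOG.error("Failed to recover application " + appId, e);
            }
          }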

          Sunil G, your suggestion also makes sense to me. Actually, the code change using your approach would be made at the same place as in this patch, with a small modification: in recover(), inside the for loop, if the conditions are met, skip calling "recoverApplication" and log a message like "skip recover application ..." instead. The difference is that with this approach we'll always check for these conditions even though it might not be the normal case, while with the approach in the patch we just need to react when the exception happens. I'm OK with either approach since the overhead is not that big.

          Let's see what others think, Wangda Tan, Bibin A Chundatt.

          Just want to clarify. The current fact is (with or without this fix): an application submitted with a node label expression explicitly specified will fail during recovery, while an application submitted without a node label expression specified will succeed, no matter whether or not there is a default node label expression for the target queue. This is due to the following code snippet: the call to "checkQueueLabelInLabelManager", which checks whether the node label exists in the node label manager (the node label manager has no labels at all when node labels are disabled), is skipped during recovery:

          SchedulerUtils.java
            public static void normalizeAndValidateRequest(ResourceRequest resReq,
                Resource maximumResource, String queueName, YarnScheduler scheduler,
                boolean isRecovery, RMContext rmContext, QueueInfo queueInfo)
                throws InvalidResourceRequestException {
              ... ...
          
              SchedulerUtils.normalizeNodeLabelExpressionInRequest(resReq, queueInfo);
              if (!isRecovery) {
                validateResourceRequest(resReq, maximumResource, queueInfo, rmContext);  // calling checkQueueLabelInLabelManager
              }
          

          This is not exactly the same as what happens when submitting a job in the normal case (i.e., not during recovery). In the normal case, when there is a default node label expression defined for the queue and node labels are disabled, the application will also get rejected due to an invalid resource request even if it doesn't specify a node label expression. I believe this will get fixed once YARN-4652 is addressed.

          sunilg Sunil G added a comment -

          Thanks Ying Zhang for raising this issue.

          With the help of the isRecovery flag, we can try to skip the recovery of the application if the conditions below are met:

          • Node Labels are disabled
          • The AM resource request has a node label expression associated with it

          In this scenario, we can skip the app's recovery by simply skipping it in the recovery loop (see the sketch below).

          Our target could be to continue recovery for other apps so that the RM comes up active. Does that make sense?
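
          For illustration, the skip could look roughly like the sketch below inside the recovery loop; the accessor names and log wording are assumptions rather than the eventual patch.

          // Illustrative only: inside the recovery loop (so isRecovery is implicitly
          // true here), skip apps whose AM request carries a label expression while
          // node labels are disabled. Accessor names are assumptions.
          ApplicationSubmissionContext ctx = appState.getApplicationSubmissionContext();
          ResourceRequest amReq = ctx.getAMContainerResourceRequest(); // assumed accessor
          String amLabelExp = (amReq == null) ? null : amReq.getNodeLabelExpression();
          if (!YarnConfiguration.areNodeLabelsEnabled(rmContext.getYarnConfiguration())
              && amLabelExp != null && !amLabelExp.isEmpty()) {
            LOG.warn("Skipping recovery of application " + ctx.getApplicationId()
                + " because node labels are disabled but its AM resource request"
                + " has label expression " + amLabelExp);
            continue; // move on to the next application to recover
          }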


            People

            • Assignee:
              Ying Zhang Ying Zhang
              Reporter:
              Ying Zhang Ying Zhang
            • Votes:
              0
              Watchers:
              13

              Dates

              • Created:
                Updated:
                Resolved:

                Development