[YARN-9111] NM crashes because Fair scheduler promotes a container that has not been pulled by AM - ASF JIRA

Details

Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: YARN-1011
Fix Version/s: None
Component/s: fairscheduler, nodemanager
Labels: None

Description

2018-10-19 22:34:35,052 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:323)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.handle(ContainerManagerImpl.java:1649)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.handle(ContainerManagerImpl.java:185)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
    at java.lang.Thread.run(Thread.java:748)
2018-10-19 22:34:35,054 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
2018-10-19 22:34:35,059 DEBUG org.apache.hadoop.service.AbstractService: Service: NodeManager entered state STOPPED

When a container is allocated by RM to an application, its container token is not generated until the AM pulls that container from RM.

However, if the scheduler decides to promote that container before it is pulled by the AM, there is no container token to work with.

The current code does not generate or update the container token in that case. When the container promotion is sent to the NM for processing, the NM crashes with a NullPointerException.
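The failure mode described above can be illustrated with a minimal, self-contained sketch. This is not Hadoop code: `SketchContainer`, `PromotionSketch`, and their methods are hypothetical names invented for illustration, and the guard shown is only one possible way to avoid dereferencing a missing token, not a fix proposed in this issue.

```java
import java.util.Optional;

// Hypothetical stand-in for an RM-side container record; not a Hadoop class.
final class SketchContainer {
    final String id;
    String token;                // stays null until the AM pulls the container
    boolean guaranteed = false;  // false = opportunistic, true = promoted

    SketchContainer(String id) {
        this.id = id;
    }

    // Models the AM pulling the container, which is when the token is created.
    void pullByAm() {
        token = "token-for-" + id;
    }
}

// Hypothetical promotion logic, written only to show the null-token hazard.
final class PromotionSketch {
    // Unguarded promotion: reads the token without checking that it exists.
    // Throws NullPointerException when the container has not been pulled yet.
    static int promoteUnguarded(SketchContainer c) {
        int tokenLength = c.token.length();
        c.guaranteed = true;
        return tokenLength;
    }

    // Guarded promotion (an assumption, not the project's fix): skip the
    // promotion while no token exists and report that nothing was sent.
    static Optional<String> promoteGuarded(SketchContainer c) {
        if (c.token == null) {
            return Optional.empty();
        }
        c.guaranteed = true;
        return Optional.of(c.token);
    }
}
```

In this sketch, calling `promoteUnguarded` on a container that was never pulled throws, while `promoteGuarded` leaves the container unpromoted until `pullByAm` has run. Another option would be to generate the token at promotion time.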

People

Assignee: Unassigned

Reporter: Haibo Chen

Votes: 0

Watchers: 2

Dates

Created: 11/Dec/18 22:38

Updated: 11/Dec/18 22:38