Details
- Bug
- Status: Closed
- Blocker
- Resolution: Fixed
- 0.23.0
- None
- Reviewed
- Fixed MR AM's ContainerLauncher to handle node-command timeouts correctly.
Description
Another collaboration with karams. The sort job hangs fairly often on a 350-node cluster. Found this in the AM logs:
Exception in thread "ContainerLauncher #60" org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
    at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:379)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
    at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
    at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
    ... 4 more
Exception in thread "ContainerLauncher #53" org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
    at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.sendContainerLaunchFailedMsg(ContainerLauncherImpl.java:405)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:330)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
    at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
    at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
    ... 5 more
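Both traces end in LinkedBlockingQueue.put() failing inside lockInterruptibly(), which is what happens when the calling thread already has its interrupt flag set, even if the queue has room. The sketch below reproduces only that JDK behaviour in isolation. It is not the ContainerLauncherImpl code or the committed fix: the class name, method names, and event strings are hypothetical, and the idea that the interrupt comes from a command-timeout mechanism is an assumption based on the release note.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo class; not part of Hadoop.
public class InterruptedPutDemo {

    // Returns true if put() throws because an interrupt was already pending.
    static boolean putFailsWhenInterruptPending(BlockingQueue<String> queue) {
        // Assumption: stands in for an interrupt delivered earlier,
        // e.g. by a timer that gave up on a slow node command.
        Thread.currentThread().interrupt();
        try {
            queue.put("CONTAINER_LAUNCH_FAILED"); // hypothetical event
            return false;
        } catch (InterruptedException e) {
            // The event was never enqueued; throwing also cleared the flag.
            return true;
        }
    }

    // Returns true if put() succeeds once the stale interrupt is cleared first.
    static boolean putSucceedsAfterClearingInterrupt(BlockingQueue<String> queue) {
        Thread.currentThread().interrupt(); // same stale interrupt as above
        Thread.interrupted();               // reads and clears the interrupt flag
        try {
            queue.put("CONTAINER_LAUNCH_FAILED"); // hypothetical event
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag for callers
            return false;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        System.out.println("put failed with pending interrupt: "
                + putFailsWhenInterruptPending(queue));
        System.out.println("put succeeded after clearing: "
                + putSucceedsAfterClearingInterrupt(queue));
        System.out.println("events enqueued: " + queue.size());
    }
}
```

If the event dropped in the first case is one the job is waiting on, nothing else will deliver it, which would be consistent with the hang described above. Whether clearing the flag is the right remedy depends on who sent the interrupt and why, so treat the second method as an illustration rather than a recommendation.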
Attachments
Issue Links
- is blocked by
- MAPREDUCE-3333 MR AM for sort-job going out of memory
- Closed