Details
- Type: Bug
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
- Hadoop Flags: Reviewed
Description
The exception stack trace is as follows:
2016-05-22 01:50:04,554 [Container Monitor] ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[Container Monitor,5,main] threw an Exception.
org.apache.hadoop.metrics2.MetricsException: Metrics source ContainerResource_container_1463840817638_14484_01_000010 already exists!
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135)
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.forContainer(ContainerMetrics.java:212)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.forContainer(ContainerMetrics.java:198)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:385)
After YARN-4906, there are multiple places that obtain the ContainerMetrics for a particular container. This can cause a race condition in which different threads register the same container metrics with DefaultMetricsSystem. Because the resulting MetricsException is not handled properly, it can bring down the ContainersMonitorImpl monitoring thread or even the whole NM.
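To illustrate the race described above, here is a minimal, self-contained sketch. It does not use the Hadoop API and is not the patch applied for this issue: `SourceRegistry`, `Metrics`, and `ContainerMetricsRaceSketch` are hypothetical stand-ins, with `SourceRegistry` mimicking a metrics system that rejects a duplicate source name. The sketch assumes one possible guard, an atomic `ConcurrentHashMap.computeIfAbsent`, so that concurrent callers register a container's source at most once.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; none of these classes exist in Hadoop.
final class ContainerMetricsRaceSketch {

    /** Stand-in for a metrics system that rejects duplicate source names. */
    public static final class SourceRegistry {
        private final Set<String> names = ConcurrentHashMap.newKeySet();

        public void register(String sourceName) {
            // Set.add returns false if the name is already present.
            if (!names.add(sourceName)) {
                throw new IllegalStateException(
                    "Metrics source " + sourceName + " already exists!");
            }
        }

        public int size() {
            return names.size();
        }
    }

    /** Stand-in for a per-container metrics object. */
    public static final class Metrics {
        public final String sourceName;

        Metrics(String sourceName) {
            this.sourceName = sourceName;
        }
    }

    private final SourceRegistry registry = new SourceRegistry();
    private final Map<String, Metrics> byContainer = new ConcurrentHashMap<>();

    /**
     * Returns the metrics for a container. computeIfAbsent runs the
     * mapping function atomically and at most once per key, so only one
     * thread ever reaches registry.register for a given container.
     */
    public Metrics forContainer(String containerId) {
        return byContainer.computeIfAbsent(containerId, id -> {
            String name = "ContainerResource_" + id;
            registry.register(name);
            return new Metrics(name);
        });
    }

    public int registeredSources() {
        return registry.size();
    }

    public static void main(String[] args) throws InterruptedException {
        ContainerMetricsRaceSketch sketch = new ContainerMetricsRaceSketch();
        String id = "container_1463840817638_14484_01_000010";

        // Several threads ask for the same container's metrics at once.
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> sketch.forContainer(id));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("registered sources: " + sketch.registeredSources());
    }
}
```

Without the atomic lookup, two threads could both find no existing entry and both call `register`, and the second call would throw, which is the shape of the failure in the stack trace. Catching the exception in the caller is another option, but it would not by itself prevent the duplicate registration attempt.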
Attachments
Issue Links
- is related to
  - YARN-5296 NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl (Resolved)
  - HADOOP-13362 DefaultMetricsSystem leaks the source name when a source unregisters (Closed)