Hadoop YARN / YARN-10354

Deadlock in ContainerMetrics and MetricsSystemImpl


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • None
    • None
    • Component/s: nodemanager
    • None
    • hadoop 3.1.2

    Description

      I could not retrieve JMX information from the NodeManager, and a thread dump showed a deadlock.

      The two deadlocked threads are shown below.

      "Timer for 'NodeManager' metrics system" - Thread t@42
         java.lang.Thread.State: BLOCKED
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.getMetrics(ContainerMetrics.java:235)
              - waiting to lock <7668d6f0> (a org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics) owned by "NM ContainerManager dispatcher" t@299
              at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:200)
              at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.snapshotMetrics(MetricsSystemImpl.java:419)
              at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.sampleMetrics(MetricsSystemImpl.java:406)
              - locked <3b956878> (a org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
              at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.onTimerEvent(MetricsSystemImpl.java:381)
              - locked <3b956878> (a org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
              at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$4.run(MetricsSystemImpl.java:368)
              at java.util.TimerThread.mainLoop(Timer.java:555)
              at java.util.TimerThread.run(Timer.java:505)

         Locked ownable synchronizers:
              - None
      
      
      
      "NM ContainerManager dispatcher" - Thread t@299
         java.lang.Thread.State: BLOCKED
              at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.unregisterSource(MetricsSystemImpl.java:247)
              - waiting to lock <3b956878> (a org.apache.hadoop.metrics2.impl.MetricsSystemImpl) owned by "Timer for 'NodeManager' metrics system" t@42
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.unregisterContainerMetrics(ContainerMetrics.java:228)
              - locked <4e31c3ec> (a java.lang.Class)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.finished(ContainerMetrics.java:255)
              - locked <7668d6f0> (a org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl.updateContainerMetrics(ContainersMonitorImpl.java:813)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl.onStopMonitoringContainer(ContainersMonitorImpl.java:935)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl.handle(ContainersMonitorImpl.java:900)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl.handle(ContainersMonitorImpl.java:57)
              at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
              at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
              at java.lang.Thread.run(Thread.java:745)

         Locked ownable synchronizers:
              - None
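
      The two traces form a lock-order cycle: the timer thread holds the MetricsSystemImpl monitor <3b956878> and waits for the ContainerMetrics monitor <7668d6f0>, while the dispatcher thread holds <7668d6f0> and waits for <3b956878>. The sketch below is only a minimal model of that pattern, not Hadoop code. The class LockOrderDemo, its methods, and the two plain Object locks are hypothetical stand-ins; the only real API used is the JDK's ThreadMXBean, which can report such a cycle from inside the JVM.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderDemo {

    // Starts a daemon thread that takes `first`, waits until both threads
    // hold their first lock, and then tries to take `second`.
    static void lockInOrder(String name, Object first, Object second,
                            CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // Never reached once the cycle has formed.
                }
            }
        }, name);
        t.setDaemon(true); // lets the JVM exit despite the stuck threads
        t.start();
    }

    // Reproduces the inverted lock order and returns the ids of the
    // deadlocked threads, or an empty array if none were found in time.
    static long[] reproduceAndDetect() {
        Object metricsSystem = new Object();    // stand-in for MetricsSystemImpl
        Object containerMetrics = new Object(); // stand-in for ContainerMetrics
        CountDownLatch latch = new CountDownLatch(2);

        // "timer" locks the metrics system first, then the source.
        lockInOrder("timer", metricsSystem, containerMetrics, latch);
        // "dispatcher" locks the source first, then the metrics system.
        lockInOrder("dispatcher", containerMetrics, metricsSystem, latch);

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        try {
            for (int i = 0; i < 100; i++) {
                long[] ids = mx.findDeadlockedThreads();
                if (ids != null) {
                    return ids;
                }
                Thread.sleep(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new long[0];
    }

    public static void main(String[] args) {
        long[] ids = reproduceAndDetect();
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) {
                System.out.println(info.getThreadName()
                        + " is blocked on a lock owned by "
                        + info.getLockOwnerName());
            }
        }
    }
}
```

      In this model the cycle disappears if both threads acquire the two locks in the same order, or if the second lock is requested only after the first has been released. Whether either approach is appropriate for ContainerMetrics is not established by this report.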
      
      
      

       

       

      Attachments

        1. full_thread_dump.txt
          1.54 MB
          Lee young gon


          People

            Assignee: Unassigned
            Reporter: Lee young gon (dasom)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated: