Uploaded image for project: 'Hadoop Map/Reduce'
  1. Hadoop Map/Reduce
  2. MAPREDUCE-2954

Deadlock in NM with threads racing for ApplicationAttemptId

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • 0.23.0, 2.0.0-alpha
    • 0.23.0
    • mrv2
    • None
    • Reviewed

    Description

      Found this:

      Java stack information for the threads listed above:
      ===================================================
      "Thread-45":
              at org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl.getApplicationId(ApplicationAttemptIdPBImpl.java:101)
              - waiting to lock <0xb6a43ba0> (a org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl)
              at org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl.compareTo(ApplicationAttemptIdPBImpl.java:144)
              - locked <0xb6a443a0> (a org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl)
              at org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl.compareTo(ApplicationAttemptIdPBImpl.java:31)
              at org.apache.hadoop.yarn.api.records.impl.pb.ContainerIdPBImpl.compareTo(ContainerIdPBImpl.java:215)
              at org.apache.hadoop.yarn.api.records.impl.pb.ContainerIdPBImpl.compareTo(ContainerIdPBImpl.java:34)
              at java.util.concurrent.ConcurrentSkipListMap.doGet(ConcurrentSkipListMap.java:797)
              at java.util.concurrent.ConcurrentSkipListMap.get(ConcurrentSkipListMap.java:1640)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:360)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:355)
              at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:113)
              at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
              at java.lang.Thread.run(Thread.java:619)
      "Thread-30":
              at org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl.getApplicationId(ApplicationAttemptIdPBImpl.java:101)
              - waiting to lock <0xb6a443a0> (a org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl)
              at org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl.compareTo(ApplicationAttemptIdPBImpl.java:144)
              - locked <0xb6a43ba0> (a org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl)
              at org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl.compareTo(ApplicationAttemptIdPBImpl.java:31)
              at org.apache.hadoop.yarn.api.records.impl.pb.ContainerIdPBImpl.compareTo(ContainerIdPBImpl.java:215)
              at org.apache.hadoop.yarn.api.records.impl.pb.ContainerIdPBImpl.compareTo(ContainerIdPBImpl.java:34)
              at java.util.concurrent.ConcurrentSkipListMap.doRemove(ConcurrentSkipListMap.java:1078)
              at java.util.concurrent.ConcurrentSkipListMap.remove(ConcurrentSkipListMap.java:1673)
              at java.util.concurrent.ConcurrentSkipListMap$Iter.remove(ConcurrentSkipListMap.java:2256)
              at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.getNodeStatus(NodeStatusUpdaterImpl.java:223)
              at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.access$300(NodeStatusUpdaterImpl.java:62)
              at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:262)
      Found 1 deadlock.
      

      Attachments

        1. MR2954_2.patch
          29 kB
          Siddharth Seth
        2. MR2954_1.patch
          29 kB
          Siddharth Seth
        3. MAPREDUCE-2954-20110909.txt
          29 kB
          Vinod Kumar Vavilapalli

        Activity

          People

            sseth Siddharth Seth
            vinodkv Vinod Kumar Vavilapalli
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: