Hadoop Common / HADOOP-1970

TaskTracker hangs in reduce: deadlock between main and Comm threads


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.14.1
    • Fix Version/s: 0.14.2
    • Component/s: None
    • Labels: None

    Description

      One reduce task was stuck in the copy phase.
      Running jstack on the process of the reduce task (task_200709272248_0001_r_000150_0) showed:

       
      Found one Java-level deadlock:
      =============================
      "Comm thread for task_200709272248_0001_r_000150_0":
        waiting to lock monitor 0x08144020 (object 0xd4e30aa8, a org.apache.hadoop.util.Progress),
        which is held by "main"
      "main":
        waiting to lock monitor 0x08144084 (object 0xd4e30958, a org.apache.hadoop.util.Progress),
        which is held by "Comm thread for task_200709272248_0001_r_000150_0"
      
      Java stack information for the threads listed above:
      ===================================================
      "Comm thread for task_200709272248_0001_r_000150_0":
              at org.apache.hadoop.util.Progress.toString(Progress.java:113)
              - waiting to lock <0xd4e30aa8> (a org.apache.hadoop.util.Progress)
              at org.apache.hadoop.util.Progress.toString(Progress.java:116)
              - locked <0xd4e30958> (a org.apache.hadoop.util.Progress)
              at org.apache.hadoop.util.Progress.toString(Progress.java:108)
              at org.apache.hadoop.mapred.Task$1.run(Task.java:268)
              at java.lang.Thread.run(Thread.java:619)
      "main":
              at org.apache.hadoop.util.Progress.startNextPhase(Progress.java:58)
              - waiting to lock <0xd4e30958> (a org.apache.hadoop.util.Progress)
              at org.apache.hadoop.util.Progress.complete(Progress.java:70)
              - locked <0xd4e30aa8> (a org.apache.hadoop.util.Progress)
              at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:253)
              at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1777)
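
      Judging from the call chains, this is a lock-ordering inversion. The Comm thread, inside Progress.toString, holds one Progress monitor (0xd4e30958) and recurses into a child phase whose monitor (0xd4e30aa8) it then waits for, so it locks parent before child. The main thread, inside Progress.complete, holds that child monitor (0xd4e30aa8) and calls startNextPhase on the parent, waiting for 0xd4e30958, so it locks child before parent. The Java sketch below uses a hypothetical, simplified PhaseProgress class (not Hadoop's actual Progress source) to show both lock orders and one common way to break the cycle: release the child's monitor before calling into the parent. It is an illustration under those assumptions, not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified stand-in for org.apache.hadoop.util.Progress
 * (not the real Hadoop source): a tree of phases in which each parent
 * tracks which child phase is current.
 */
class PhaseProgress {
    private final String status;
    private final PhaseProgress parent; // final, so reading it needs no lock
    private final List<PhaseProgress> phases = new ArrayList<>();
    private int currentPhase = 0;
    private float progress = 0.0f;

    PhaseProgress(String status) {
        this(status, null);
    }

    private PhaseProgress(String status, PhaseProgress parent) {
        this.status = status;
        this.parent = parent;
    }

    /** Adds a child phase; intended for setup, before other threads use the tree. */
    synchronized PhaseProgress addPhase(String childStatus) {
        PhaseProgress child = new PhaseProgress(childStatus, this);
        phases.add(child);
        return child;
    }

    synchronized void startNextPhase() {
        currentPhase++;
    }

    synchronized float get() {
        return progress;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        appendTo(sb);
        return sb.toString();
    }

    // Locks this node, then recursively the current child: parent -> child order,
    // like the Comm thread's Progress.toString frames in the dump.
    private synchronized void appendTo(StringBuilder sb) {
        sb.append(status);
        if (currentPhase < phases.size()) {
            sb.append(" > ");
            phases.get(currentPhase).appendTo(sb);
        }
    }

    // Deadlock-prone: holds this (child) lock while acquiring the parent's lock,
    // i.e. child -> parent order, the reverse of appendTo(). This mirrors the
    // main thread's complete() -> startNextPhase() frames.
    synchronized void completeUnsafe() {
        progress = 1.0f;
        if (parent != null) {
            parent.startNextPhase();
        }
    }

    // One way to break the cycle: update our own state under our lock, release
    // it, and only then call into the parent, so no thread ever holds a child
    // monitor while waiting for its parent's.
    void complete() {
        synchronized (this) {
            progress = 1.0f;
        }
        if (parent != null) {
            parent.startNextPhase();
        }
    }
}
```

      With complete(), the only nesting left is the parent -> child order in appendTo(), so no cycle of waiting threads can form. The trade-off is that a concurrent toString() may briefly report a phase that is already at 100% but that the parent has not yet advanced past, which is harmless for status reporting.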
      

      Attachments

        1. 1970_patch02 (1 kB, Vivek Ratan)
        2. 1970_patch01 (4 kB, Vivek Ratan)


          People

            Assignee: Vivek Ratan (vivekr)
            Reporter: Koji Noguchi (knoguchi)
            Votes: 0
            Watchers: 0
