  1. Hadoop Map/Reduce
  2. MAPREDUCE-1144

JT should not hold lock while writing user history logs to DFS

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.20.1
    • Fix Version/s: None
    • Component/s: jobtracker
    • Labels: None

    Description

      I've seen this a few times now: when the DFS is slow for one reason or another, the JT essentially locks up waiting on it, because one thread spends a long time trying to write history files out while holding the JT lock. The stack trace of the thread blocking everything is:

      Thread 210 (IPC Server handler 10 on 7277):
      State: WAITING
      Blocked count: 171424
      Waited count: 1209604
      Waiting on java.util.LinkedList@407dd154
      Stack:
      java.lang.Object.wait(Native Method)
      java.lang.Object.wait(Object.java:485)
      org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.flushInternal(DFSClient.java:3122)
      org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3202)
      org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3151)
      org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:67)
      org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
      sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:301)
      sun.nio.cs.StreamEncoder.close(StreamEncoder.java:130)
      java.io.OutputStreamWriter.close(OutputStreamWriter.java:216)
      java.io.BufferedWriter.close(BufferedWriter.java:248)
      java.io.PrintWriter.close(PrintWriter.java:295)
      org.apache.hadoop.mapred.JobHistory$JobInfo.logFinished(JobHistory.java:1349)
      org.apache.hadoop.mapred.JobInProgress.jobComplete(JobInProgress.java:2167)
      org.apache.hadoop.mapred.JobInProgress.completedTask(JobInProgress.java:2111)
      org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:873)
      org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:3598)
      org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:2792)
      org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2581)
      sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)

      We should try not to do external IO while holding the JT lock, and instead write the data to an in-memory buffer, drop the lock, and then write.
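
      The buffer-then-write idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the JobTracker's code or the attached patch: `HistoryLogger`, its constructor, and the record format are hypothetical, and a plain `OutputStream` stands in for the DFS output stream.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: format history records under the lock, write them after releasing it. */
final class HistoryLogger {
    private final Object jtLock;            // stands in for the JT lock (assumption)
    private final OutputStream slowSink;    // stands in for a DFS output stream (assumption)
    private final StringBuilder pending = new StringBuilder(); // guarded by jtLock

    HistoryLogger(Object jtLock, OutputStream slowSink) {
        this.jtLock = jtLock;
        this.slowSink = slowSink;
    }

    /** Records job completion; only memory is touched while jtLock is held. */
    void logFinished(String jobId, long finishTime) {
        byte[] snapshot;
        synchronized (jtLock) {
            // Made-up record format, for illustration only.
            pending.append("JOBID=\"").append(jobId)
                   .append("\" FINISH_TIME=\"").append(finishTime).append("\"\n");
            // Copy the buffered text out and reset the buffer before dropping the lock.
            snapshot = pending.toString().getBytes(StandardCharsets.UTF_8);
            pending.setLength(0);
        }
        // jtLock is released here, so a slow sink only delays this caller.
        synchronized (slowSink) {           // keeps records from interleaving mid-write
            try {
                slowSink.write(snapshot, 0, snapshot.length);
                slowSink.flush();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

      In this sketch a caller that already holds the lock (as the heartbeat path in the trace does) would still be holding it during the write, because Java monitors are reentrant. Two threads can also reach the sink in a different order than they took the lock. Handing the snapshot to a single background writer thread through a queue would address both, at the cost of more machinery.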

      Attachments

        1. MAPREDUCE-1144-branch-1.2.patch
          7 kB
          yunjiong zhao

            People

              Assignee: Unassigned
              Reporter: Todd Lipcon (tlipcon)
              Votes: 0
              Watchers: 14
