Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-82

JobTracker loses it: NoSuchElementException

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.1.0
    • Component/s: None
    • Labels:
      None

      Description

      On a number of occasions, JobTracker goes into a loop that it never recovers from. Over and over it prints the below to the jobtracker log.

      060304 124522 Server handler 5 on 8010 call error: java.io.IOException: java.util.NoSuchElementException
      java.io.IOException: java.util.NoSuchElementException
      at java.util.TreeMap.key(TreeMap.java:433)
      at java.util.TreeMap.firstKey(TreeMap.java:287)
      at java.util.TreeSet.first(TreeSet.java:407)
      at org.apache.hadoop.mapred.TaskInProgress.getTaskToRun(TaskInProgress.java:428Timed out.org.apache.hadoop.fs.ChecksumException: Checksum error:/2/hadoop/nara/data/tmp/task_r_m80hob/all.1 at 1554810368 at
      org.apache.hadoop.fs.FSDataInputStream$Checker.verifySum(FSDataInputStream.java:122)
      at org.apache.hadoop.fs.FSDataInputStream$Checker.read(FSDataInputStream.java:98)
      at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:158)
      at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at
      java.io.BufferedInputStream.read(BufferedInputStream.java:235) at
      org.apache.hadoop.fs.FSDataInputStream$Buffer.read(FSDataInputStream.java:210)
      at java.io.DataInputStream.readInt(DataInputStream.java:353) at
      org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:367) at
      org.apache.hadoop.io.SequenceFile$Sorter$SortPass.run(SequenceFile.java:557)
      at org.apache.hadoop.io.SequenceFile$Sorter.sortPass(SequenceFile.java:523)
      at org.apache.hadoop.io.SequenceFile$Sorter.sort(SequenceFile.java:511)
      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:254) at
      org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:666)

      I added debug to TIP#getTaskToRun so i could tell which TIP had empted its allotment of tasks. Below is extract from jobtracker log that shows sequence of events for TIP tip_fizr7m that lead up to JT losing it:

      060314 203637 Adding task 'task_m_4d6ht0' to tip tip_fizr7m, for tracker 'tracker_41791' on ia109314.archive.org
      060314 204758 Task 'task_m_4d6ht0' has been lost.
      060314 204811 Adding task 'task_m_fb0wf0' to tip tip_fizr7m, for tracker 'tracker_70065' on ia109314.archive.org
      060314 210118 Task 'task_m_fb0wf0' has been lost.
      060314 210119 Adding task 'task_m_irar47' to tip tip_fizr7m, for tracker 'tracker_82285' on ia109324.archive.org
      060314 211541 Taskid 'task_m_irar47' has finished successfully.
      060314 211541 Task 'task_m_irar47' has completed.
      060314 211543 Adding task 'task_m_qo1g69' to tip tip_fizr7m, for tracker 'tracker_97839' on ia109306.archive.org
      060314 213004 Taskid 'task_m_qo1g69' has finished successfully.
      060314 213004 Task 'task_m_qo1g69' has completed.
      060314 213005 Adding task 'task_m_t0lnzk' to tip tip_fizr7m, for tracker 'tracker_57273' on ia109314.archive.org
      060314 214118 Task 'task_m_t0lnzk' has been lost.

      So, we lose two, complete two, then lose a third.

      TIP should have been done on first completion.

      TIP accounting is off.

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              stack stack
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: