  1. Hadoop Map/Reduce
  2. MAPREDUCE-5313

JobTracker Creates Empty Mapper Task, and a Mapper Task with 2 FileSplits.

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.2.0
    • Fix Version/s: None
    • Component/s: jobtracker
    • Labels:
      None
    • Environment:

      Linux

      Description

      When reading an input text file, the JobTracker appears to assign the first two FileSplits to a single Mapper Task, then assigns an empty FileSplit (end of file) to another Mapper Task, which finishes almost instantly. This can unbalance the job, since one map task is now twice as large as the others.

      In "src/mapred/org/apache/hadoop/mapred/LineRecordReader.java", line 110, there is a comment about skipping the first line of the input file by default, since "next()" reads two lines anyway. This was not the behavior in 0.20.2, which did not have this problem.

      This seems to be related to:

      "HADOOP-4010. Change semantics for LineRecordReader to read an additional
      line per split- rather than moving back one character in the stream- to
      work with splittable compression codecs. (Abdul Qadeer via cdouglas)"

      It seems this change was not implemented properly and leads to the issue described above when the input is a text file.
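
      To make the "additional line per split" semantics concrete, here is a minimal sketch in plain Java. It is a simplified model written for this report, not the Hadoop source: the class SplitSemanticsSketch, the helpers readSplit and nextLineStart, the 15-byte sample input and the split size of 6 are all assumptions chosen for illustration. It does not reproduce the JobTracker assignment described above; it only shows how, under these semantics, an early split can pick up an extra line while the split at the end of the file comes up empty.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of "read one additional line per split".
// It works on an in-memory byte array and does not use any Hadoop classes.
class SplitSemanticsSketch {

    /** Lines a reader would emit for the byte range [start, end) of data. */
    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            // Not the first split: discard the first (possibly partial) line,
            // assuming the reader of the previous split has already consumed it.
            pos = nextLineStart(data, pos);
        }
        // "<=" rather than "<": a line that starts at or before 'end' is still
        // read by this split, which is the additional line per split.
        while (pos <= end && pos < data.length) {
            int next = nextLineStart(data, pos);
            boolean hasNewline = next > pos && data[next - 1] == '\n';
            int lineEnd = hasNewline ? next - 1 : next;
            records.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = next;
        }
        return records;
    }

    /** Offset of the first byte after the next '\n', capped at data.length. */
    static int nextLineStart(byte[] data, int from) {
        int i = from;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        return Math.min(i + 1, data.length);
    }

    public static void main(String[] args) {
        // Three 5-byte lines (15 bytes), cut into splits of 6 bytes.
        byte[] data = "aaaa\nbbbb\ncccc\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readSplit(data, 0, 6));   // prints [aaaa, bbbb]
        System.out.println(readSplit(data, 6, 12));  // prints [cccc]
        System.out.println(readSplit(data, 12, 15)); // prints []
    }
}
```

      In this toy input every line is still read exactly once, but the first split yields two records and the last split yields none, which resembles the imbalance reported here. Whether the same mechanism explains the behavior in 1.2.0 would need to be confirmed against the actual LineRecordReader code.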

            People

            • Assignee:
              Unassigned
              Reporter:
              noelcodella Noel C. F. Codella, Ph.D.
            • Votes:
              0
              Watchers:
              3
