Hadoop Map/Reduce: MAPREDUCE-2018

TeraSort example fails in trunk

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: None
    • Component/s: examples
    • Labels: None
    • Compile, build, and run the TeraSort example from trunk using several random files as input. TeraSort will fail.

    Description

      Exceptions are thrown while computing splits near the end of a file, typically when the number of bytes read is smaller than RECORD_LENGTH.

      10/08/17 22:44:17 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
      10/08/17 22:44:17 INFO input.FileInputFormat: Total input paths to process : 1
      Spent 19ms computing base-splits.
      Spent 2ms computing TeraScheduler splits.
      Computing input splits took 22ms
      Sampling 1 splits of 1
      Got an exception while reading splits java.io.EOFException: read past eof
      at org.apache.hadoop.examples.terasort.TeraInputFormat$TeraRecordReader.nextKeyValue(TeraInputFormat.java:267)
      at org.apache.hadoop.examples.terasort.TeraInputFormat$1.run(TeraInputFormat.java:181)

      I believe TeraInputFormat assumes that file sizes are exact multiples of RECORD_LENGTH.
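
      To illustrate the failure mode, here is a minimal standalone sketch of a fixed-length record reader that tolerates a short trailing fragment instead of throwing. It is not the attached patch and does not use the Hadoop API. The class name FixedLengthRecordSketch, the helper names, and the value RECORD_LENGTH = 100 are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not TeraInputFormat and not the attached patch.
class FixedLengthRecordSketch {
    // Assumed record size for the example; the report only names RECORD_LENGTH.
    static final int RECORD_LENGTH = 100;

    /** Fills buf as far as the stream allows; returns the number of bytes read (0 at EOF). */
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // end of stream reached before the buffer was full
            }
            total += n;
        }
        return total;
    }

    /** Returns all complete records; a short trailing fragment is skipped, not an error. */
    static List<byte[]> readRecords(InputStream in) throws IOException {
        List<byte[]> records = new ArrayList<>();
        while (true) {
            byte[] buf = new byte[RECORD_LENGTH];
            int n = readFully(in, buf);
            if (n < RECORD_LENGTH) {
                // n == 0 is a clean end of input; 0 < n < RECORD_LENGTH is a partial record.
                return records;
            }
            records.add(buf);
        }
    }

    public static void main(String[] args) throws IOException {
        // 250 bytes: two full records followed by a 50-byte fragment.
        InputStream in = new ByteArrayInputStream(new byte[250]);
        System.out.println(readRecords(in).size()); // prints 2
    }
}
```

      Whether a partial trailing record should be dropped silently, logged, or rejected up front is a design choice; the sketch only shows that a short read at end of file can be treated as end of input.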

      Attachments

        1. mapred-2018.patch
          2 kB
          Krishna Ramachandran

          People

            Assignee: Unassigned
            Reporter: Krishna Ramachandran (ramach)
            Votes: 0
            Watchers: 2