[MAPREDUCE-2018] TeraSort example fails in trunk - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.22.0
Fix Version/s: None
Component/s: examples
Labels:
None
Environment:

Build trunk and run the TeraSort example with several random files as input. TeraSort fails.

Description

Exceptions are thrown while reading splits near the end of a file, typically when the number of bytes read is smaller than RECORD_LENGTH.

10/08/17 22:44:17 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
10/08/17 22:44:17 INFO input.FileInputFormat: Total input paths to process : 1
Spent 19ms computing base-splits.
Spent 2ms computing TeraScheduler splits.
Computing input splits took 22ms
Sampling 1 splits of 1
Got an exception while reading splits java.io.EOFException: read past eof
at org.apache.hadoop.examples.terasort.TeraInputFormat$TeraRecordReader.nextKeyValue(TeraInputFormat.java:267)
at org.apache.hadoop.examples.terasort.TeraInputFormat$1.run(TeraInputFormat.java:181)

I believe TeraInputFormat assumes that file sizes are exact multiples of RECORD_LENGTH.
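To illustrate the failure mode, here is a minimal sketch of a fixed-length record reader that tells a clean end of file apart from a trailing partial record. It is not the attached patch and does not use the Hadoop API: the class `FixedRecordReaderSketch` and its methods `readRecord` and `countRecords` are hypothetical names, it reads from a plain `java.io.InputStream`, and it assumes the 100-byte TeraSort record size for `RECORD_LENGTH`.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration only; not the code in TeraInputFormat.
class FixedRecordReaderSketch {
    // Assumed record size: TeraSort records are 100 bytes (10-byte key, 90-byte value).
    static final int RECORD_LENGTH = 100;

    /**
     * Reads one record into buffer.
     * Returns true if a full record was read, false if the stream ended
     * exactly on a record boundary. Throws EOFException if the stream
     * ends in the middle of a record.
     */
    static boolean readRecord(InputStream in, byte[] buffer) throws IOException {
        int read = 0;
        while (read < RECORD_LENGTH) {
            int n = in.read(buffer, read, RECORD_LENGTH - read);
            if (n == -1) {
                if (read == 0) {
                    return false; // clean end of input, no partial record
                }
                throw new EOFException(
                    "partial record: got " + read + " of " + RECORD_LENGTH + " bytes");
            }
            read += n;
        }
        return true;
    }

    /**
     * Counts the full records in the stream. In lenient mode a trailing
     * partial record is ignored; otherwise the EOFException propagates.
     */
    static int countRecords(InputStream in, boolean lenient) throws IOException {
        byte[] buffer = new byte[RECORD_LENGTH];
        int count = 0;
        try {
            while (readRecord(in, buffer)) {
                count++;
            }
        } catch (EOFException e) {
            if (!lenient) {
                throw e;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // 250 bytes: two full records followed by a 50-byte fragment.
        InputStream in = new ByteArrayInputStream(new byte[250]);
        System.out.println(countRecords(in, true));
    }
}
```

With a 250-byte input, the lenient mode counts the two full records and skips the 50-byte tail, while the strict mode fails in the same way as the reported run. Whether a trailing fragment should be skipped or rejected is a design choice this sketch leaves open.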

Attachments


mapred-2018.patch
17/Aug/10 22:55
2 kB
Krishna Ramachandran

People

Assignee: Unassigned

Reporter: Krishna Ramachandran

Votes: 0

Watchers: 2

Dates

Created: 17/Aug/10 22:52

Updated: 30/Jul/14 23:31

Resolved: 30/Jul/14 23:31