Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Invalid
-
0.20.2
-
None
-
None
-
None
Description
CombineFileInputFormat creates splits based on blocks, regardless of whether the underlying FileInputFormat is splittable.
This means that we can have 2 or more splits for a compressed text file with TextInputFormat. For each of these splits, TextInputFormat.getRecordReader will return a RecordReader for the whole compressed file, thus causing duplicate input data.
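The duplication described above can be illustrated with a small standalone model. This is only a sketch and does not use the Hadoop API: the class `SplitDuplicationSketch`, its `Split` type, and the methods `splitByBlocks`, `splitRespectingFormat`, and `recordsRead` are hypothetical names invented for illustration, and the file size, block size, and record count are assumed values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the reported behavior; not Hadoop code.
class SplitDuplicationSketch {

    // A split is a byte range [offset, offset + length) of one file.
    static final class Split {
        final String path;
        final long offset;
        final long length;

        Split(String path, long offset, long length) {
            this.path = path;
            this.offset = offset;
            this.length = length;
        }
    }

    // Block-based splitting that ignores splittability: one split per block.
    static List<Split> splitByBlocks(String path, long fileLen, long blockSize) {
        List<Split> splits = new ArrayList<>();
        for (long off = 0; off < fileLen; off += blockSize) {
            splits.add(new Split(path, off, Math.min(blockSize, fileLen - off)));
        }
        return splits;
    }

    // Splittability-aware variant: a non-splittable file yields exactly one split.
    static List<Split> splitRespectingFormat(String path, long fileLen,
                                             long blockSize, boolean splittable) {
        if (!splittable) {
            return Collections.singletonList(new Split(path, 0, fileLen));
        }
        return splitByBlocks(path, fileLen, blockSize);
    }

    // Models a reader for a compressed file: it cannot seek to the split's
    // range, so every split reads all records in the file.
    static int recordsRead(List<Split> splits, int recordsInFile) {
        return splits.size() * recordsInFile;
    }

    public static void main(String[] args) {
        long fileLen = 150L << 20;   // assumed: 150 MiB compressed file
        long blockSize = 64L << 20;  // assumed: 64 MiB blocks
        int records = 1000;          // assumed record count in the file

        List<Split> naive = splitByBlocks("input.gz", fileLen, blockSize);
        List<Split> aware = splitRespectingFormat("input.gz", fileLen, blockSize, false);

        System.out.println("block-based splits: " + naive.size()
                + ", records read: " + recordsRead(naive, records));
        System.out.println("format-aware splits: " + aware.size()
                + ", records read: " + recordsRead(aware, records));
    }
}
```

With these assumed numbers, the block-based path produces three splits and reads every record three times, while the splittability-aware path produces one split and reads each record once.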
Attachments
Issue Links
- is duplicated by MAPREDUCE-1597 combinefileinputformat does not work with non-splittable files (Closed)