Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- None
- None
- None
- Hadoop Flags: Reviewed
Description
CombineFileInputFormat.getSplits() does not take into account whether a file is splittable.
This causes a problem for compressed text files: depending on the size of the compressed file, getSplits() may return more
than one split for it, and the record reader for each of those splits reads the complete file, so the same records
are processed more than once.
I ran into this problem while using Hive on Hadoop 0.20.
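The missing splittability check can be illustrated outside Hadoop with the sketch below. It is a minimal, self-contained Java example, not the actual CombineFileInputFormat code or the committed fix: `Chunk`, `isSplittable()` (which simply treats `.gz` and `.deflate` suffixes as non-splittable in place of a real codec lookup), the 64 MB chunk size and the 150 MB file size are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class CombineSplitSketch {

    /** A byte range of one file; a simplified stand-in for one chunk of a combined split. */
    record Chunk(String path, long offset, long length) {}

    /** Hypothetical codec check: treats gzip and deflate files as non-splittable. */
    static boolean isSplittable(String path) {
        return !(path.endsWith(".gz") || path.endsWith(".deflate"));
    }

    /**
     * Cuts a file into chunks of at most maxChunk bytes.
     * With respectSplittability == false this mimics the reported bug:
     * compressed files are chunked exactly like plain text.
     */
    static List<Chunk> chunkFile(String path, long fileLength, long maxChunk,
                                 boolean respectSplittability) {
        List<Chunk> chunks = new ArrayList<>();
        if (respectSplittability && !isSplittable(path)) {
            // A compressed stream must be decoded from its first byte,
            // so the whole file has to go into a single chunk.
            chunks.add(new Chunk(path, 0, fileLength));
            return chunks;
        }
        for (long offset = 0; offset < fileLength; offset += maxChunk) {
            chunks.add(new Chunk(path, offset, Math.min(maxChunk, fileLength - offset)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long maxChunk = 64 * mb; // assumed chunk (block) size
        long size = 150 * mb;    // assumed file size

        // Buggy behaviour: the .gz file is cut into 3 chunks.
        System.out.println(chunkFile("logs/part-0000.gz", size, maxChunk, false).size()); // prints 3
        // Fixed behaviour: the .gz file stays whole.
        System.out.println(chunkFile("logs/part-0000.gz", size, maxChunk, true).size());  // prints 1
        // Uncompressed text is still chunked normally.
        System.out.println(chunkFile("logs/part-0001.txt", size, maxChunk, true).size()); // prints 3
    }
}
```

With the check disabled, the `.gz` file yields three chunks; because a gzip stream can only be decompressed from its beginning, each of those chunks ends up reading the whole file, which is how duplicated records arise. With the check enabled, the compressed file stays in a single chunk while plain text is still divided.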
Attachments
Issue Links
- blocks
  - HIVE-11376 CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files (Closed)
- duplicates
  - MAPREDUCE-1649 Compressed files with TextInputFormat does not work with CombineFileInputFormat (Resolved)
- is related to
  - SQOOP-721 Duplicating rows on export when exporting from compressed files. (Resolved)
- supercedes
  - HIVE-2089 Add a new input format to be able to combine multiple .gz text files (Resolved)