Details
Improvement
Status: Closed
Major
Resolution: Fixed
None
None
Reviewed
Description
If a split consists of multiple files, the FileFormat should always be the same for all of them, whether RCFile or SequenceFile. Currently, CombineHiveInputSplit tries to get the inputFileFormat for each new file in the split, and each lookup is O(n), where n is the number of files in the split. Overall this is an O(n^2) operation, and it degrades performance badly when combining a large number of small files.
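
To illustrate the cost difference described above, here is a minimal sketch in Java. It is not Hive's actual code: the class `SplitFormatLookup`, the `Entry` type, the prefix-based linear scan, and the `comparisons` counter are all hypothetical stand-ins, assuming only that resolving a file's format requires a linear scan over roughly n entries. It contrasts resolving the format once per file with resolving it once per split and reusing the result.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Hive.
final class SplitFormatLookup {

    // Hypothetical mapping from a path prefix to an input format name.
    static final class Entry {
        final String pathPrefix;
        final String formatName;

        Entry(String pathPrefix, String formatName) {
            this.pathPrefix = pathPrefix;
            this.formatName = formatName;
        }
    }

    // Counts how many entries the linear scans have visited.
    static long comparisons = 0;

    // Linear scan: O(n) in the number of entries.
    static String lookup(List<Entry> entries, String path) {
        for (Entry e : entries) {
            comparisons++;
            if (path.startsWith(e.pathPrefix)) {
                return e.formatName;
            }
        }
        throw new IllegalArgumentException("No format registered for " + path);
    }

    // One lookup per file: O(n) lookups of O(n) each, so O(n^2) overall.
    static List<String> resolvePerFile(List<Entry> entries, List<String> files) {
        List<String> formats = new ArrayList<>();
        for (String file : files) {
            formats.add(lookup(entries, file));
        }
        return formats;
    }

    // One lookup per split: valid only under the assumption that every
    // file in the split shares the same format. Expects a non-empty split.
    static List<String> resolveOncePerSplit(List<Entry> entries, List<String> files) {
        String format = lookup(entries, files.get(0));
        List<String> formats = new ArrayList<>();
        for (int i = 0; i < files.size(); i++) {
            formats.add(format);
        }
        return formats;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            entries.add(new Entry("/warehouse/t/part=" + i + "/", "RCFile"));
            files.add("/warehouse/t/part=" + i + "/data");
        }

        comparisons = 0;
        resolvePerFile(entries, files);
        System.out.println("per-file comparisons:  " + comparisons);

        comparisons = 0;
        resolveOncePerSplit(entries, files);
        System.out.println("per-split comparisons: " + comparisons);
    }
}
```

Both methods return the same list of formats when all files in the split share one format; the per-split variant simply avoids repeating the scan for every file.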