Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
tez-branch
Avoid reading ORC footers for files where data and footer are in the same HDFS block
Description
OrcInputFormat::getSplits() fires off a SplitGenerator task for every input file.
For any file that occupies only one HDFS block, the footer and the data are on the same node. Such a file also never needs to be split further as long as its total size is < context.maxSize.
For these files, reading the footer locally in the task is faster than reading it during split generation and sending it from the AM.
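
A minimal sketch of the check described above, assuming the file length, the HDFS block size and the maximum split size are already known at split generation time. The class name, method name and parameters are hypothetical and are not taken from OrcInputFormat; the sketch only illustrates the two conditions (single block, size below the maximum split size) and is not the actual patch.

```java
// Hypothetical illustration only; names are not from the Hive code base.
final class FooterSkipSketch {
    private FooterSkipSketch() {}

    /**
     * Returns true when split generation could skip reading the ORC footer:
     * the file fits in a single HDFS block (footer and data are co-located)
     * and it is smaller than the maximum split size (no further split needed).
     *
     * @param fileLength   total file size in bytes
     * @param blockSize    HDFS block size of the file in bytes
     * @param maxSplitSize assumed stand-in for context.maxSize, in bytes
     */
    static boolean canSkipFooterRead(long fileLength, long blockSize, long maxSplitSize) {
        boolean singleBlock = fileLength <= blockSize;
        boolean noFurtherSplit = fileLength < maxSplitSize;
        return singleBlock && noFurtherSplit;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 64 MB file, 128 MB block, 256 MB max split: one block, no split needed.
        System.out.println(canSkipFooterRead(64 * mb, 128 * mb, 256 * mb));  // true
        // 200 MB file spans more than one 128 MB block.
        System.out.println(canSkipFooterRead(200 * mb, 128 * mb, 256 * mb)); // false
        // 100 MB file fits in one block but exceeds a 64 MB max split size.
        System.out.println(canSkipFooterRead(100 * mb, 128 * mb, 64 * mb));  // false
    }
}
```

When the check returns true, the split generator would emit a single split covering the whole file without opening it, and the task would read the footer itself from the local block.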