OrcInputFormat::getSplits() fires off a SplitGenerator task for every input file.
For any file that occupies only one HDFS block, the footer and the data are on the same node. Such a file also never needs a further split as long as its total size is < context.maxSize.
For these files, reading the footer locally in the task is faster than reading it during split generation and sending it from the AM.
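
The condition described above can be sketched roughly as follows. This is only an illustration of the check, not the code in the patch: the class name, the method name and its parameters are hypothetical, and maxSplitSize stands in for context.maxSize.

```java
// Hypothetical sketch of the check described in this issue.
// None of these names come from OrcInputFormat itself.
class FooterReadSketch {

    /**
     * Returns true when split generation could skip reading the ORC footer.
     *
     * @param fileLength   total length of the file in bytes
     * @param blockCount   number of HDFS blocks the file occupies
     * @param maxSplitSize stand-in for context.maxSize, in bytes
     */
    static boolean canSkipFooterRead(long fileLength, int blockCount, long maxSplitSize) {
        // One block means the footer and the data live on the same node.
        boolean singleBlock = blockCount == 1;
        // A file smaller than the maximum split size is never split further.
        boolean neverSplit = fileLength < maxSplitSize;
        return singleBlock && neverSplit;
    }
}
```

With a 256 MB maximum split size, for example, a 64 MB single-block file would qualify, while a file spanning two blocks, or one at or above the maximum split size, would still have its footer read during split generation.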
|Status||Open [ 1 ]||Patch Available [ 10002 ]|
|Release Note||Avoid reading ORC footers for files where data and footer are in the same HDFS block|
|Affects Version/s||tez-branch [ 12324744 ]|
|Affects Version/s||0.13.0 [ 12324986 ]|
|Fix Version/s||tez-branch [ 12324744 ]|
|Status||Patch Available [ 10002 ]||Resolved [ 5 ]|
|Resolution||Fixed [ 1 ]|