Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
Impala 4.0.0
-
None
-
Description
Currently, FOOTER_SIZE is a constant of 100KB in HdfsScanner::IssueFooterRanges:
https://github.com/apache/impala/blob/57982ef/be/src/exec/hdfs-scanner.cc#L832
Some scanner subclasses, such as HdfsOrcScanner, expect a much smaller footer of about 16KB. We should add footer_size as a parameter so that different file formats can request different footer range sizes.
Having a more precise initial range can also help reduce waste in the data cache:
https://github.com/apache/impala/blob/57982ef/be/src/runtime/io/data-cache.h#L73
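To illustrate the proposal, here is a minimal sketch of how a per-format footer_size could drive the initial footer range. It is not the actual Impala code: FooterRange, ComputeFooterRange, and the two constant names are hypothetical and exist only for this example. The 100KB and 16KB values are the ones mentioned in the description.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for a scan range: byte offset and length within a file.
struct FooterRange {
  int64_t offset;
  int64_t len;
};

// Sizes taken from the description; the constant names are made up for this sketch.
constexpr int64_t kDefaultFooterSize = 100 * 1024;  // current fixed FOOTER_SIZE
constexpr int64_t kOrcFooterSize = 16 * 1024;       // size the ORC scanner expects

// Returns the range covering the last `footer_size` bytes of the file,
// clamped so that it never starts before the beginning of the file.
inline FooterRange ComputeFooterRange(int64_t file_length, int64_t footer_size) {
  const int64_t len = std::min(footer_size, file_length);
  return FooterRange{file_length - len, len};
}
```

With this shape, a caller for a 1MB ORC file would pass kOrcFooterSize and issue a 16KB initial read instead of a 100KB one, while formats that do not override the value keep the existing behavior.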