Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- None
Description
In IcebergUtil.getIcebergDataFiles() we call scan.planFiles():
https://github.com/apache/impala/blob/7f1ce039be30d5b36a490e8b07728f82f5d4c3de/fe/src/main/java/org/apache/impala/util/IcebergUtil.java#L534
scan.planFiles() has to read the manifest files to return the list of files to scan. Unfortunately, this adds significant overhead to planning time for short-running queries.
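To make the per-query cost concrete, here is a minimal Java sketch that models planning under a simplified layout, where each manifest lists data files together with min/max values of one column. It is not the Iceberg API: `ManifestStore`, `DataFileEntry` and this `planFiles` are hypothetical stand-ins. Real Iceberg planning can also skip some manifests using the partition summaries in the manifest list, which this model ignores. The point it illustrates is that every planning call re-reads the manifests, even when consecutive queries target the same snapshot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of manifest-based planning (NOT the Iceberg API).
// It only shows that each planning call re-reads all manifests of a snapshot.
public class UncachedPlanningSketch {

    // A data file entry as a manifest might record it: path plus min/max of one column.
    record DataFileEntry(String path, long min, long max) {}

    // Simulated manifest storage: manifest path -> entries. Every read is counted.
    static final class ManifestStore {
        private final Map<String, List<DataFileEntry>> manifests;
        private int reads = 0;

        ManifestStore(Map<String, List<DataFileEntry>> manifests) {
            this.manifests = manifests;
        }

        List<DataFileEntry> read(String manifestPath) {
            reads++; // stands in for a (possibly remote) file read
            return manifests.get(manifestPath);
        }

        int reads() {
            return reads;
        }
    }

    // Plans a scan: reads every manifest and keeps the files whose
    // [min, max] range overlaps the query range [lo, hi].
    static List<String> planFiles(ManifestStore store, List<String> manifestList, long lo, long hi) {
        List<String> result = new ArrayList<>();
        for (String manifest : manifestList) {
            for (DataFileEntry e : store.read(manifest)) {
                if (e.max() >= lo && e.min() <= hi) {
                    result.add(e.path());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ManifestStore store = new ManifestStore(Map.of(
            "m1.avro", List.of(new DataFileEntry("f1.parquet", 0, 9),
                               new DataFileEntry("f2.parquet", 10, 19)),
            "m2.avro", List.of(new DataFileEntry("f3.parquet", 20, 29))));
        List<String> manifestList = List.of("m1.avro", "m2.avro");

        // Two short queries against the same, unchanged snapshot.
        System.out.println(planFiles(store, manifestList, 5, 12));  // [f1.parquet, f2.parquet]
        System.out.println(planFiles(store, manifestList, 25, 25)); // [f3.parquet]
        System.out.println("manifest reads: " + store.reads());     // manifest reads: 4
    }
}
```

In this model, Q queries over a snapshot with N manifests cost N × Q manifest reads, even though the manifests never change between the queries.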
Maybe we can do the following to mitigate this issue:
- Cache the result of TableScan.planFiles() without predicates, and use it instead of pushing predicates down to Iceberg. This would need logic to decide when to use the cached file list and when to push down predicates.
- Figure out whether it is possible to cache manifest files so we don't need to re-read them for each table scan.
- If this is not possible, we might need to contribute code to Iceberg.
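The first idea could look roughly like the following minimal Java sketch, assuming the predicate-free file list of a snapshot fits in memory and that predicates can be evaluated locally from per-file min/max values. `SnapshotPlanCacheSketch`, `FileInfo` and `fullPlanLoader` are hypothetical names, not Impala or Iceberg API; the loader stands in for a planFiles() call without a filter. Since an Iceberg snapshot is immutable, its ID is used as the cache key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongFunction;

// Hypothetical sketch: cache the predicate-free file list per snapshot and
// apply predicates locally instead of pushing them down on every query.
public class SnapshotPlanCacheSketch {

    // Minimal per-file metadata used for local pruning (illustrative only).
    record FileInfo(String path, long min, long max) {}

    // Produces the full, unfiltered file list of a snapshot; stands in for
    // an unfiltered planFiles() call.
    private final LongFunction<List<FileInfo>> fullPlanLoader;
    private final Map<Long, List<FileInfo>> cache = new HashMap<>(); // not thread-safe
    private int loads = 0;

    SnapshotPlanCacheSketch(LongFunction<List<FileInfo>> fullPlanLoader) {
        this.fullPlanLoader = fullPlanLoader;
    }

    // Returns files overlapping [lo, hi]; the full plan is loaded once per snapshot ID.
    List<String> planFiles(long snapshotId, long lo, long hi) {
        List<FileInfo> all = cache.computeIfAbsent(snapshotId, id -> {
            loads++;
            return fullPlanLoader.apply(id);
        });
        List<String> result = new ArrayList<>();
        for (FileInfo f : all) {
            if (f.max() >= lo && f.min() <= hi) { // local min/max pruning
                result.add(f.path());
            }
        }
        return result;
    }

    int loads() {
        return loads;
    }

    public static void main(String[] args) {
        SnapshotPlanCacheSketch planner = new SnapshotPlanCacheSketch(id -> List.of(
            new FileInfo("f1.parquet", 0, 9),
            new FileInfo("f2.parquet", 10, 19),
            new FileInfo("f3.parquet", 20, 29)));

        System.out.println(planner.planFiles(42L, 5, 12));           // [f1.parquet, f2.parquet]
        System.out.println(planner.planFiles(42L, 25, 25));          // [f3.parquet]
        System.out.println("full plans loaded: " + planner.loads()); // full plans loaded: 1
    }
}
```

Two queries on snapshot 42 trigger a single full plan. The sketch always prefers the cache; deciding when to push predicates down instead (for example, for very large tables whose full file list is expensive to keep in memory), as well as eviction and thread safety, is left open.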
Attachments
Issue Links
- is related to: IMPALA-11658 Implement Iceberg manifest caching configuration for Impala (Resolved)