Details
- Bug
- Status: Closed
- Blocker
- Resolution: Fixed
- 0.13.0
Description
The lazy-listing capability introduced in HUDI-4812 exposed an issue in Spark's design: predicates are pushed down into generic FileIndex implementations only during the execution phase.
This poses the following issues:
- HoodieFileIndex doesn't list the table until the `listFiles` method is invoked
- Listing is therefore performed only during actual execution, in the `FileSourceScanExec` node
- Since listing isn't performed until execution, table statistics are initialized with bogus values (of 1 byte), and Cost-based Optimizations (CBO) make incorrect decisions based on them
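
To illustrate the last point, here is a toy model in plain Python. It is not Hudi or Spark code: `LazyFileIndex`, `size_in_bytes` and `plan_join` are hypothetical names made up for this sketch. The choice of a broadcast-join decision as the affected optimization is also an assumption, used here only because it is a well-known size-based planning decision. The only real value is the 10 MiB threshold, which is Spark's default for `spark.sql.autoBroadcastJoinThreshold`.

```python
# Toy model only: these classes are hypothetical and are not Hudi or Spark APIs.
from dataclasses import dataclass, field
from typing import List, Optional

# Spark's default for spark.sql.autoBroadcastJoinThreshold (10 MiB)
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


@dataclass
class LazyFileIndex:
    """Defers listing until list_files() is called, as a lazy index would."""
    file_sizes_on_storage: List[int]
    _listed: Optional[List[int]] = field(default=None, init=False)

    def list_files(self) -> List[int]:
        # Listing happens on first call only (stands in for the execution phase)
        if self._listed is None:
            self._listed = list(self.file_sizes_on_storage)
        return self._listed

    def size_in_bytes(self) -> int:
        # Before listing nothing is known, so report the 1-byte placeholder
        # described in the ticket
        if self._listed is None:
            return 1
        return sum(self._listed)


def plan_join(index: LazyFileIndex) -> str:
    # Size-based decision made at planning time (assumed example of an
    # optimization that depends on table statistics)
    if index.size_in_bytes() <= BROADCAST_THRESHOLD_BYTES:
        return "broadcast"
    return "sort-merge"


index = LazyFileIndex([512 * 1024 * 1024] * 4)  # four 512 MiB files, 2 GiB total

planned_before_listing = plan_join(index)  # planner sees the 1-byte placeholder
index.list_files()                         # listing finally happens
planned_after_listing = plan_join(index)   # planner would now see the real size

print(planned_before_listing, planned_after_listing)  # prints "broadcast sort-merge"
```

In this model the planner picks a broadcast for a 2 GiB table because the decision is made before listing, which mirrors the ordering problem described above.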