Details
- Improvement
- Status: Closed
- Blocker
- Resolution: Fixed
Description
As part of this effort we plan to support (at the very least) a suite of standard Spark functions when evaluating data-filtering expressions within the Data Skipping flow. For example, when a user issues the following query:
SELECT ... WHERE date_format(ts, 'dd-mm-yyyy') > '01-01-2022'
we are able to relate such a query to our Column Stats Index appropriately, and can therefore perform Data Skipping not only on the "raw" columns but also on simple derivative expressions on top of them (such as standard function calls).
It is important to note that only transformations that preserve the ordering of the source column can be applied. Transformations that do not preserve the ordering render the Column Stats Index practically irrelevant, since no assumption can be made that the values derived by such transformations are ordered.
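To illustrate why the ordering requirement matters, here is a minimal sketch in plain Python. It is not Hudi's implementation: `FileStats` and `files_to_scan` are hypothetical names, and `datetime.strftime` stands in for Spark's `date_format`. The sketch assumes the index holds a per-file min/max for `ts` and that the predicate has the form `transform(ts) > literal`. A year-first pattern is order-preserving under string comparison, so applying it to the file's max value gives a safe upper bound. A day-first pattern is not, and pruning with it can drop a file that contains matching rows.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List


@dataclass
class FileStats:
    """Hypothetical per-file min/max entry for the `ts` column."""
    path: str
    min_ts: datetime
    max_ts: datetime


def files_to_scan(stats: List[FileStats],
                  transform: Callable[[datetime], str],
                  literal: str) -> List[str]:
    """Keep files that may contain rows with transform(ts) > literal.

    Sound only if `transform` is non-decreasing in ts: in that case
    transform(max_ts) is an upper bound for every row in the file.
    """
    return [s.path for s in stats if transform(s.max_ts) > literal]


def iso_day(ts: datetime) -> str:
    # Year-first pattern: string order matches chronological order.
    return ts.strftime("%Y-%m-%d")


def day_first(ts: datetime) -> str:
    # Day-first pattern: string order does NOT match chronological order.
    return ts.strftime("%d-%m-%Y")


stats = [
    FileStats("f1.parquet", datetime(2021, 11, 1), datetime(2021, 12, 31)),
    FileStats("f2.parquet", datetime(2021, 12, 15), datetime(2022, 2, 10)),
    FileStats("f3.parquet", datetime(2022, 3, 1), datetime(2022, 3, 20)),
]

# Order-preserving transform: f1 is skipped safely.
kept = files_to_scan(stats, iso_day, "2022-01-01")

# Non-order-preserving transform: the same pruning rule becomes unsound.
window = [FileStats("f4.parquet", datetime(2022, 1, 20), datetime(2022, 2, 1))]
unsafe = files_to_scan(window, day_first, "15-01-2022")
# The row at min_ts still satisfies the predicate, yet f4 was dropped.
row_matches = day_first(window[0].min_ts) > "15-01-2022"

print(kept, unsafe, row_matches)
```

In the second case the file is pruned because the formatted max value, "01-02-2022", sorts before the literal, even though the row at `min_ts` formats to "20-01-2022" and matches the predicate. That is the failure mode the ordering requirement rules out.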
Issue Links
- relates to HUDI-512 Support Logical Partitioning with Expression Index (Closed)