Apache Hudi / HUDI-3594

Support standard Spark functions in Filter Exprs in Data Skipping

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • None
    • 0.11.0
    • None

    Description

      As part of this effort, we plan to support (at the very least) a suite of standard Spark functions when evaluating data filtering expressions within the Data Skipping flow. For example, when a user issues the following query:

       

      SELECT ... WHERE date_format(ts, 'dd-MM-yyyy') > '01-01-2022'
      

      We should then be able to relate such a query to our Column Stats Index appropriately, and therefore perform Data Skipping not only on the "raw" columns but also on simple derivative expressions on top of them (such as standard function calls).
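      A minimal sketch of the idea, in Python rather than Hudi's actual code: per-file min/max statistics for `ts` are pushed through the same function used in the filter, and files whose transformed upper bound cannot satisfy the predicate are skipped. The names `FileStats`, `to_iso_date` and `files_to_scan` are hypothetical, and the year-first date format is an assumption chosen because its string order matches date order.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FileStats:
    """Hypothetical per-file entry of a column stats index for column `ts`."""
    path: str
    min_ts: datetime
    max_ts: datetime


def to_iso_date(ts: datetime) -> str:
    # Order-preserving (non-decreasing) transformation: year-first date
    # strings sort lexicographically in the same order as the timestamps.
    return ts.strftime("%Y-%m-%d")


def files_to_scan(stats, literal):
    """Keep files that may contain rows with to_iso_date(ts) > literal."""
    # Because to_iso_date is non-decreasing, to_iso_date(max_ts) is an upper
    # bound for every derived value in the file. If even that bound does not
    # exceed the literal, no row can match and the file is skipped.
    return [s.path for s in stats if to_iso_date(s.max_ts) > literal]


stats = [
    FileStats("f1.parquet", datetime(2021, 11, 1), datetime(2021, 12, 31)),
    FileStats("f2.parquet", datetime(2021, 12, 15), datetime(2022, 1, 10)),
    FileStats("f3.parquet", datetime(2022, 2, 1), datetime(2022, 3, 1)),
]

# f1 is skipped: its largest derived value is "2021-12-31".
selected = files_to_scan(stats, "2022-01-01")
```

      Only the upper bound is needed for a `>` predicate; a `<` predicate would use the transformed minimum in the same way.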

       

      Note that only transformations that preserve the ordering of the source column can be applied. Transformations that do not preserve the ordering render the Column Stats Index practically irrelevant, since no assumption can be made that the values of the column derived by such a transformation are ordered consistently with the source values.
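      The ordering requirement can be checked on a small sample. The sketch below is illustrative only: `preserves_order` is a hypothetical helper, and Python's `strftime` stands in for Spark's `date_format`. It compares a year-first format with a day-first one under plain string comparison.

```python
from datetime import datetime


def preserves_order(values, transform):
    """True if transform keeps an ascending sequence ascending (non-strictly)."""
    derived = [transform(v) for v in sorted(values)]
    return all(a <= b for a, b in zip(derived, derived[1:]))


timestamps = [datetime(2021, 12, 31), datetime(2022, 1, 1), datetime(2022, 2, 15)]

# Year-first strings compare in the same order as the timestamps.
iso_ok = preserves_order(timestamps, lambda t: t.strftime("%Y-%m-%d"))

# Day-first strings do not: "31-12-2021" > "01-01-2022" as strings, although
# 2021-12-31 is the earlier date, so the min/max of `ts` no longer bound
# the derived values.
day_first_ok = preserves_order(timestamps, lambda t: t.strftime("%d-%m-%Y"))
```

      Here `iso_ok` evaluates to true and `day_first_ok` to false, so only the first transformation would be safe to combine with min/max statistics for range predicates.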

            People

              alexey.kudinkin Alexey Kudinkin