  1. Spark
  2. SPARK-27692

Optimize evaluation of udf that is deterministic and has literal inputs


Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      A deterministic UDF is one for which the following holds: given a specific input, the UDF returns the same output no matter how many times it is executed.
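
      As a small illustration of the definition, here is a plain-Python sketch that contrasts a deterministic function with a non-deterministic one. The function names are hypothetical and are not part of Spark or of the PR. In Spark itself, a UDF is treated as deterministic unless it is marked otherwise with asNondeterministic().

```python
import random


def normalize_country(code: str) -> str:
    """Deterministic: the output depends only on the argument."""
    return code.strip().upper()


def with_random_suffix(code: str) -> str:
    """Non-deterministic: the output also depends on random state."""
    return f"{code}-{random.randint(0, 10**9)}"


# Repeated calls with the same input always agree for the deterministic function.
assert normalize_country(" us ") == normalize_country(" us ")
```

      Only a function of the first kind can safely be replaced by a value computed once.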

      When a UDF is deterministic and all of its inputs are literals, we can evaluate the UDF once and reuse that result instead of re-evaluating it for every row in the query.

      This is valid only if the UDF is deterministic and all of its inputs are literals. Otherwise, the optimization must not be applied.
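
      The idea can be sketched outside Spark as a rewrite over a tiny expression tree. This is not the Catalyst rule from the PR. The classes Literal, Column and Udf and the function fold_literal_udf are hypothetical stand-ins, and the sketch assumes determinism is a flag carried on the UDF node.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Literal:
    value: Any


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Udf:
    func: Callable[..., Any]
    args: Tuple[Any, ...]
    deterministic: bool = True


def fold_literal_udf(expr):
    """Replace a deterministic UDF whose inputs are all literals with its result."""
    if not isinstance(expr, Udf):
        return expr
    # Fold children first so nested all-literal calls collapse bottom-up.
    args = tuple(fold_literal_udf(a) for a in expr.args)
    if expr.deterministic and all(isinstance(a, Literal) for a in args):
        # Evaluate once, at planning time, and keep only the value.
        return Literal(expr.func(*(a.value for a in args)))
    return Udf(expr.func, args, expr.deterministic)


def evaluate(expr, row):
    """Evaluate an expression against one row (a dict of column values)."""
    if isinstance(expr, Literal):
        return expr.value
    if isinstance(expr, Column):
        return row[expr.name]
    return expr.func(*(evaluate(a, row) for a in expr.args))


calls = {"n": 0}


def expensive_upper(s):
    """Stand-in for an expensive UDF; counts how often it runs."""
    calls["n"] += 1
    return s.upper()


rows = [{"id": i} for i in range(1000)]

# The UDF runs once during folding; per-row evaluation only reads the literal.
plan = fold_literal_udf(Udf(expensive_upper, (Literal("spark"),)))
results = [evaluate(plan, row) for row in rows]
```

      In this sketch, a UDF that reads a Column, or one flagged as non-deterministic, is left in the plan and is still evaluated for every row.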

      Testing: 

      We have used this internally and have seen significant performance improvements for some very expensive UDFs (as expected).

      In the PR, I have added unit tests. 

      Credits: 

      Thanks to Guy Khazma (https://github.com/guykhazma) from the IBM Haifa Research Team for the idea and the original implementation.

            People

              Assignee: Unassigned
              Reporter: Sunitha Kambhampati (ksunitha)
              Votes: 0
              Watchers: 4
