Spark / SPARK-28478

Optimizer rule to remove unnecessary explicit null checks for null-intolerant expressions (e.g. if(x is null, x, f(x)))

    Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels:
      None

      Description

      I ran across a family of expressions like

      if(x is null, x, substring(x, 0, 1024))

      or 

      when($"x".isNull, $"x").otherwise(substring($"x", 0, 1024))

      that were written this way because the query author was unsure about whether substring would return null when its input string argument is null.

      This explicit null-handling is unnecessary and adds bloat to the generated code, especially if it's done via a CASE statement (which compiles down to a do-while loop).

      In another case I saw a query compiler which automatically generated this type of code.

      It would be cool if Spark could automatically optimize such queries to remove these redundant null checks. Here's a sketch of what such a rule might look like (assuming that SPARK-28477 has been implemented, so we only need to worry about the IF case):

      • In the pattern match, check the following three conditions in this order (to benefit from short-circuiting):
        • The IF condition is an explicit null check of a column c.
        • The true expression returns either c or null.
        • The false expression is a null-intolerant expression with c as a direct child.
      • If all three conditions match, replace the entire If with the false branch's expression.
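
The rule sketched above can be illustrated on a toy expression tree. This is only a minimal model, not Catalyst code: the types `Expr`, `Col`, `NullLit`, `IntLit`, `IsNull`, `If`, `Substring`, and `Coalesce`, the helper `isNullIntolerant`, and the object `RemoveRedundantNullCheck` are all hypothetical stand-ins defined here for illustration. A real rule would presumably work against Catalyst's own expression classes and its notion of null-intolerant expressions.

```scala
// Toy expression tree. These types are hypothetical stand-ins, not Catalyst classes.
sealed trait Expr { def children: Seq[Expr] }
case class Col(name: String) extends Expr { val children: Seq[Expr] = Nil }
case object NullLit extends Expr { val children: Seq[Expr] = Nil }
case class IntLit(value: Int) extends Expr { val children: Seq[Expr] = Nil }
case class IsNull(child: Expr) extends Expr { def children: Seq[Expr] = Seq(child) }
case class If(cond: Expr, whenTrue: Expr, whenFalse: Expr) extends Expr {
  def children: Seq[Expr] = Seq(cond, whenTrue, whenFalse)
}
// Modeled as null-intolerant: evaluates to null whenever any child is null.
case class Substring(str: Expr, pos: Expr, len: Expr) extends Expr {
  def children: Seq[Expr] = Seq(str, pos, len)
}
// Modeled as NOT null-intolerant: a null argument does not force a null result.
case class Coalesce(args: Seq[Expr]) extends Expr { def children: Seq[Expr] = args }

object RemoveRedundantNullCheck {
  // Assumption: in this toy model only Substring is marked null-intolerant.
  private def isNullIntolerant(e: Expr): Boolean = e match {
    case _: Substring => true
    case _            => false
  }

  // Applies the rewrite to a single node; the guard checks the three
  // conditions in the order listed above, so && short-circuits on the cheap ones.
  def rewriteOnce(e: Expr): Expr = e match {
    case If(IsNull(c: Col), t, f)          // 1. condition is a null check of column c
        if (t == c || t == NullLit) &&     // 2. true branch returns c or null
          isNullIntolerant(f) &&           // 3. false branch is null-intolerant ...
          f.children.contains(c) =>        //    ... with c as a direct child
      f
    case other => other
  }
}
```

For example, `RemoveRedundantNullCheck.rewriteOnce(If(IsNull(Col("x")), Col("x"), Substring(Col("x"), IntLit(0), IntLit(1024))))` returns just the `Substring` node, while an `If` whose false branch is a `Coalesce` over `x` is left unchanged because the null check there is not redundant.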

       

              People

              • Assignee: Unassigned
              • Reporter: Josh Rosen (joshrosen)
              • Votes: 0
              • Watchers: 4
