  1. Spark
  2. SPARK-28477

Rewrite `CASE WHEN cond THEN ifTrue ELSE ifFalse END` into `IF(cond, ifTrue, ifFalse)`

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: Optimizer, SQL
    • Labels: None

    Description

      Spark SQL has both CASE WHEN and IF expressions.

      I've seen many cases where end-users write

      when(x, ifTrue).otherwise(ifFalse)

      because Spark doesn't have an org.apache.spark.sql.functions._ method for the If expression.

      Unfortunately, CASE WHEN generates substantial code bloat because its codegen is implemented using a do-while loop. In some performance-critical frameworks, I've modified our code to construct the Catalyst If expression directly, but this is tedious and confusing for end users.
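The manual workaround described above might look roughly like the following sketch. It assumes Spark 2.x/3.x, where `Column` exposes `.expr` and can be constructed from a Catalyst `Expression`; the `IfColumn` object and `ifExpr` helper are hypothetical names for illustration, not part of Spark's API.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.If

object IfColumn {
  // Hypothetical helper (not part of Spark): wraps Catalyst's internal If
  // expression in a Column. Assumes Spark 2.x/3.x, where Column has a public
  // constructor taking an Expression and exposes the wrapped tree as `.expr`.
  def ifExpr(cond: Column, ifTrue: Column, ifFalse: Column): Column =
    new Column(If(cond.expr, ifTrue.expr, ifFalse.expr))
}
```

A call such as `df.select(IfColumn.ifExpr(col("x") > 0, lit("pos"), lit("non-pos")))` would then build an If node instead of a CaseWhen node. Because this depends on Catalyst internals, a SQL string via `functions.expr("if(x > 0, 'pos', 'non-pos')")` is likely a safer route where it is acceptable.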

      If a CASE WHEN has only two branches (a single WHEN plus an ELSE), as in the example given above, then Spark should automatically rewrite it into a simple IF expression.
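To make the proposed rewrite concrete, here is a self-contained toy model of it. The `Expr`, `Ref`, `NullLit`, `CaseWhen`, and `If` types below are simplified stand-ins defined only for this sketch; they are not Spark's Catalyst classes, and this is not the rule as implemented in Spark. The sketch also assumes that a missing ELSE is treated as NULL, which matches standard CASE semantics.

```scala
object CaseWhenToIf {
  // Toy expression tree; illustrative stand-ins, not Catalyst classes.
  sealed trait Expr
  final case class Ref(name: String) extends Expr
  case object NullLit extends Expr
  final case class CaseWhen(branches: Seq[(Expr, Expr)], elseValue: Option[Expr]) extends Expr
  final case class If(cond: Expr, ifTrue: Expr, ifFalse: Expr) extends Expr

  // Rewrites every CaseWhen that has exactly one WHEN branch into an If.
  def rewrite(e: Expr): Expr = e match {
    // Single WHEN branch: IF(cond, value, else), with NULL for a missing ELSE.
    case CaseWhen(Seq((cond, value)), elseValue) =>
      If(rewrite(cond), rewrite(value), elseValue.map(rewrite).getOrElse(NullLit))
    // Zero or several WHEN branches: keep the CaseWhen, rewrite its children.
    case CaseWhen(branches, elseValue) =>
      CaseWhen(branches.map { case (c, v) => (rewrite(c), rewrite(v)) }, elseValue.map(rewrite))
    case If(c, t, f) => If(rewrite(c), rewrite(t), rewrite(f))
    case leaf => leaf
  }
}
```

In this model, `rewrite(CaseWhen(Seq((Ref("cond"), Ref("ifTrue"))), Some(Ref("ifFalse"))))` yields `If(Ref("cond"), Ref("ifTrue"), Ref("ifFalse"))`, while a CaseWhen with two or more WHEN branches is left as a CaseWhen.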

            People

              Assignee: David Vrba (vrbad)
              Reporter: Josh Rosen (joshrosen)
              Votes: 0
              Watchers: 3