  1. Spark
  2. SPARK-28477

Rewrite `CASE WHEN cond THEN ifTrue ELSE ifFalse END` into `IF(cond, ifTrue, ifFalse)`

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: Optimizer, SQL
    • Labels: None

      Description

      Spark SQL has both CASE WHEN and IF expressions.

      I've seen many cases where end-users write

      when(x, ifTrue).otherwise(ifFalse)

      because Spark doesn't have an org.apache.spark.sql.functions._ method for the If expression.

      Unfortunately, CASE WHEN generates substantial code bloat because its codegen is implemented using a do-while loop. In some performance-critical frameworks, I've modified our code to construct the Catalyst If expression directly, but this is tedious and confusing to end-users.

      If a CASE WHEN has only two branches (a single WHEN plus an ELSE), like the example given above, then Spark should automatically rewrite it into a simple IF expression.
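
      A minimal sketch of the proposed rewrite, on a toy expression tree rather than Spark's real Catalyst classes. The types Expr, Ref, If and CaseWhen and the object SimplifyCaseWhen are hypothetical stand-ins chosen for illustration; this is not the change that was merged for this issue.

```scala
// Toy expression tree. These types are hypothetical stand-ins for
// illustration only; they are NOT Spark's Catalyst expression classes.
sealed trait Expr
final case class Ref(name: String) extends Expr
final case class If(cond: Expr, ifTrue: Expr, ifFalse: Expr) extends Expr
final case class CaseWhen(branches: Seq[(Expr, Expr)], elseValue: Option[Expr]) extends Expr

object SimplifyCaseWhen {
  // Walk the tree bottom-up. A CASE WHEN with exactly one WHEN branch and an
  // explicit ELSE becomes an If; every other shape is rebuilt unchanged.
  def rewrite(e: Expr): Expr = e match {
    case CaseWhen(branches, elseValue) =>
      val newBranches = branches.map { case (c, v) => (rewrite(c), rewrite(v)) }
      val newElse = elseValue.map(rewrite)
      (newBranches, newElse) match {
        case (Seq((cond, value)), Some(otherwise)) => If(cond, value, otherwise)
        case _                                     => CaseWhen(newBranches, newElse)
      }
    case If(c, t, f) => If(rewrite(c), rewrite(t), rewrite(f))
    case r: Ref      => r
  }
}
```

      In this sketch, a CASE WHEN with several WHEN branches or without an ELSE is left untouched, so only the when(x, ifTrue).otherwise(ifFalse) shape described above is affected.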

              People

              • Assignee: David Vrba (vrbad)
              • Reporter: Josh Rosen (joshrosen)
              • Votes: 0
              • Watchers: 3
