Spark / SPARK-28477

Rewrite `CASE WHEN cond THEN ifTrue ELSE ifFalse END` into `IF(cond, ifTrue, ifFalse)`


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: Optimizer, SQL
    • Labels: None

    Description

      Spark SQL has both CASE WHEN and IF expressions.

      I've seen many cases where end-users write

      when(x, ifTrue).otherwise(ifFalse)

      because Spark does not have an org.apache.spark.sql.functions._ method for the If expression.

      Unfortunately, CASE WHEN generates substantial code bloat because its codegen is implemented using a do-while loop. In some performance-critical frameworks I have modified our code to construct the Catalyst If expression directly, but this is tedious and confusing for end-users.

      If a CASE WHEN has only two branches (a single WHEN branch plus an ELSE/otherwise branch), like the example given above, then Spark should automatically rewrite it into a simple IF expression.
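      To make the proposed rewrite concrete, here is a minimal Scala sketch that does not depend on Spark. `Expr`, `Lit`, `Col`, `CaseWhen`, `If`, `eval` and `rewrite` are hypothetical stand-ins for Catalyst's classes and optimizer rule, not Spark's actual API, and the rule Spark ships may differ. The model assumes SQL semantics in which a NULL condition behaves like FALSE (falling through to ELSE in CASE WHEN, and to the false branch in IF). As an extra assumption beyond this issue, it also rewrites a single-branch CASE WHEN with no ELSE, using a NULL literal as the false branch.

```scala
// A toy model of the rewrite; these classes are hypothetical, not Spark's Catalyst API.
object CaseWhenToIf {
  sealed trait Expr
  // A literal value; None models SQL NULL.
  final case class Lit(value: Option[Any]) extends Expr
  // A reference to a named field of the input row.
  final case class Col(name: String) extends Expr
  // CASE WHEN c1 THEN v1 [WHEN c2 THEN v2 ...] [ELSE e] END
  final case class CaseWhen(branches: Seq[(Expr, Expr)], elseValue: Option[Expr]) extends Expr
  // IF(pred, ifTrue, ifFalse)
  final case class If(pred: Expr, ifTrue: Expr, ifFalse: Expr) extends Expr

  type Row = Map[String, Option[Any]]

  // Interpreter used only to check that the rewrite preserves results.
  def eval(e: Expr, row: Row): Option[Any] = e match {
    case Lit(v) => v
    case Col(n) => row.getOrElse(n, None)
    case If(p, t, f) =>
      // Only a TRUE predicate picks ifTrue; FALSE and NULL pick ifFalse.
      if (eval(p, row).contains(true)) eval(t, row) else eval(f, row)
    case CaseWhen(branches, elseValue) =>
      // The first branch whose condition is TRUE wins; otherwise ELSE (or NULL).
      branches.find { case (c, _) => eval(c, row).contains(true) } match {
        case Some((_, v)) => eval(v, row)
        case None         => elseValue.flatMap(eval(_, row))
      }
  }

  // Recursive rewrite: a CASE WHEN with exactly one WHEN branch becomes an If.
  def rewrite(e: Expr): Expr = e match {
    case CaseWhen(Seq((cond, value)), elseValue) =>
      // A missing ELSE yields NULL, so it becomes a NULL literal.
      If(rewrite(cond), rewrite(value), elseValue.map(rewrite).getOrElse(Lit(None)))
    case CaseWhen(branches, elseValue) =>
      // Two or more WHEN branches: keep the CASE WHEN, rewrite its children.
      CaseWhen(branches.map { case (c, v) => (rewrite(c), rewrite(v)) }, elseValue.map(rewrite))
    case If(p, t, f) => If(rewrite(p), rewrite(t), rewrite(f))
    case leaf => leaf
  }

  def main(args: Array[String]): Unit = {
    // Roughly the shape that when(col("x"), "yes").otherwise("no") builds.
    val caseWhen = CaseWhen(Seq(Col("x") -> Lit(Some("yes"))), Some(Lit(Some("no"))))
    val rewritten = rewrite(caseWhen)
    println(rewritten)
    // Both forms must agree when x is TRUE, FALSE or NULL.
    for (x <- Seq(Some(true), Some(false), None)) {
      val row: Row = Map("x" -> x)
      assert(eval(caseWhen, row) == eval(rewritten, row))
    }
  }
}
```

      Running `main` prints `If(Col(x),Lit(Some(yes)),Lit(Some(no)))` and checks that both forms agree for a TRUE, FALSE and NULL `x`. A CASE WHEN with two or more WHEN branches is left as CASE WHEN, matching the scope of this issue.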

          People

            Assignee: David Vrba
            Reporter: Josh Rosen
            Votes: 0
            Watchers: 3
