Description
In [1]: df = sqlCtx.createDataFrame([(1, "1"), (2, "2"), (1, "2"), (1, "2")], ["key", "value"])

In [2]: from pyspark.sql import functions as F

In [8]: df.select(df.key, F.when(df.key > 1, 0).when(df.key == 0, 2).otherwise(1)).show()
+---+---------------------------------+
|key|CASE WHEN (key = 0) THEN 2 ELSE 1|
+---+---------------------------------+
|  1|                                1|
|  2|                                1|
|  1|                                1|
|  1|                                1|
+---+---------------------------------+

Note that the first branch, CASE WHEN (key > 1) THEN 0, has been dropped from the generated expression.
In Scala, by contrast, I get the expected expression and behaviour:
scala> val df = sqlContext.createDataFrame(List((1, "1"), (2, "2"), (1, "2"), (1, "2"))).toDF("key", "value")

scala> import org.apache.spark.sql.functions._

scala> df.select(df("key"), when(df("key") > 1, 0).when(df("key") === 2, 2).otherwise(1)).show()
+---+-------------------------------------------------------+
|key|CASE WHEN (key > 1) THEN 0 WHEN (key = 2) THEN 2 ELSE 1|
+---+-------------------------------------------------------+
|  1|                                                      1|
|  2|                                                      0|
|  1|                                                      1|
|  1|                                                      1|
+---+-------------------------------------------------------+
The bug is in the definition of *when* on the Column class in "column.py": chaining a second *when* replaces the earlier branch instead of appending to it. A fix is coming.
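To illustrate the intended semantics, here is a minimal, self-contained sketch (an illustrative reimplementation, not the actual pyspark code or API): each chained .when() call should append a new (condition, value) branch to the existing CASE expression, with the first matching branch winning, rather than starting a fresh expression that discards earlier branches.

```python
class CaseWhen:
    """Toy model of a chained CASE WHEN expression (hypothetical, for illustration)."""

    def __init__(self, condition, value):
        # Branches accumulate in call order; conditions are plain predicates here.
        self.branches = [(condition, value)]
        self.default = None

    def when(self, condition, value):
        # Append a branch instead of replacing the earlier ones -- this is the
        # behaviour the buggy Column.when fails to implement.
        self.branches.append((condition, value))
        return self

    def otherwise(self, value):
        self.default = value
        return self

    def evaluate(self, key):
        # First matching condition wins, mirroring SQL CASE WHEN semantics.
        for condition, value in self.branches:
            if condition(key):
                return value
        return self.default

# Same logic as the report: key > 1 -> 0, key == 0 -> 2, else 1
expr = CaseWhen(lambda k: k > 1, 0).when(lambda k: k == 0, 2).otherwise(1)
print([expr.evaluate(k) for k in [1, 2, 1, 1]])  # [1, 0, 1, 1]
```

Applied to the keys [1, 2, 1, 1] from the example DataFrame, this produces [1, 0, 1, 1], matching the correct Scala output rather than the all-ones result PySpark currently returns.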