Spark / SPARK-8038

PySpark SQL when function is broken on Column


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.4.0
    • Component/s: PySpark, SQL
    • Labels: None
    • Environment: Spark 1.4.0 RC3

    Description

      In [1]: df = sqlCtx.createDataFrame([(1, "1"), (2, "2"), (1, "2"), (1, "2")], ["key", "value"])

      In [2]: from pyspark.sql import functions as F

      In [8]: df.select(df.key, F.when(df.key > 1, 0).when(df.key == 0, 2).otherwise(1)).show()

      +---+---------------------------------+
      |key|CASE WHEN (key = 0) THEN 2 ELSE 1|
      +---+---------------------------------+
      |  1|                                1|
      |  2|                                1|
      |  1|                                1|
      |  1|                                1|
      +---+---------------------------------+
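For reference, CASE WHEN tests its branches in order and returns the value of the first one that matches, falling back to ELSE. The plain-Python sketch below (not PySpark; the helper name `case_when` is hypothetical) applies those semantics to the report's key column to contrast what the query above should have returned with what the truncated expression returned.

```python
# Plain-Python model of CASE WHEN ... WHEN ... ELSE semantics (not PySpark).
# case_when is a hypothetical helper used only for illustration.

def case_when(row_value, branches, default):
    """Return the result of the first (predicate, result) pair whose
    predicate holds for row_value; otherwise return default."""
    for predicate, result in branches:
        if predicate(row_value):
            return result
    return default

keys = [1, 2, 1, 1]  # the "key" column from the report

# Intended: when(key > 1, 0).when(key == 0, 2).otherwise(1)
expected = [
    case_when(k, [(lambda x: x > 1, 0), (lambda x: x == 0, 2)], 1) for k in keys
]

# What the PySpark plan actually evaluated: CASE WHEN (key = 0) THEN 2 ELSE 1
observed = [case_when(k, [(lambda x: x == 0, 2)], 1) for k in keys]

print(expected)  # [1, 0, 1, 1]
print(observed)  # [1, 1, 1, 1]
```

The row with key 2 is the one that exposes the bug: the dropped `key > 1` branch should have mapped it to 0.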
      

      In Scala, a similar when/otherwise chain gives the expected expression and behaviour:

      scala> val df = sqlContext.createDataFrame(List((1, "1"), (2, "2"), (1, "2"), (1, "2"))).toDF("key", "value")

      scala> import org.apache.spark.sql.functions._

      scala> df.select(df("key"), when(df("key") > 1, 0).when(df("key") === 2, 2).otherwise(1)).show()

      +---+-------------------------------------------------------+
      |key|CASE WHEN (key > 1) THEN 0 WHEN (key = 2) THEN 2 ELSE 1|
      +---+-------------------------------------------------------+
      |  1|                                                      1|
      |  2|                                                      0|
      |  1|                                                      1|
      |  1|                                                      1|
      +---+-------------------------------------------------------+
      

      The problem comes from the definition of *when* in the Column class in "column.py"; a fix is coming.
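The report only locates the problem in Column.when in "column.py". One pattern that reproduces exactly the observed plan is a chained when() that starts a new CASE from its own arguments instead of extending the receiver, so only the last WHEN survives and otherwise() attaches to it. The toy model below illustrates that assumption; `CaseExpr`, `when_buggy` and `when_fixed` are hypothetical names, not PySpark API, and the real column.py code may differ.

```python
# Hypothetical toy model (not PySpark code) of a CASE expression builder,
# contrasting a chained when() that drops earlier branches with one that keeps them.

class CaseExpr:
    def __init__(self, branches=(), default=None):
        self.branches = tuple(branches)  # ordered (condition_sql, value) pairs
        self.default = default

    def when_buggy(self, cond, value):
        # Starts a fresh CASE, discarding self.branches (the suspected bug pattern).
        return CaseExpr([(cond, value)])

    def when_fixed(self, cond, value):
        # Extends the receiver, keeping every earlier branch in order.
        return CaseExpr(self.branches + ((cond, value),), self.default)

    def otherwise(self, value):
        # Attaches the ELSE value to whatever branches this expression holds.
        return CaseExpr(self.branches, value)

    def sql(self):
        parts = ["CASE"]
        for cond, value in self.branches:
            parts.append(f"WHEN ({cond}) THEN {value}")
        if self.default is not None:
            parts.append(f"ELSE {self.default}")
        return " ".join(parts)


def when(cond, value):
    # Module-level entry point that opens a new CASE with one branch.
    return CaseExpr([(cond, value)])


buggy = when("key > 1", 0).when_buggy("key = 0", 2).otherwise(1)
fixed = when("key > 1", 0).when_fixed("key = 0", 2).otherwise(1)
print(buggy.sql())  # CASE WHEN (key = 0) THEN 2 ELSE 1
print(fixed.sql())  # CASE WHEN (key > 1) THEN 0 WHEN (key = 0) THEN 2 ELSE 1
```

The buggy variant yields the same column header as the PySpark output above, while the fixed variant matches the shape of the Scala plan.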


          People

            Assignee: Olivier Girardot (ogirardot)
            Reporter: Olivier Girardot (ogirardot)
            Votes: 0
            Watchers: 2
