SPARK-29627: array_contains should allow column instances in PySpark


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      The Scala API works well with column instances:

      import org.apache.spark.sql.functions._
      val df = Seq(Array("a", "b", "c"), Array.empty[String]).toDF("data")
      df.select(array_contains($"data", lit("a"))).collect()
      
      Array[org.apache.spark.sql.Row] = Array([true], [false])
      

      However, it seems the PySpark one doesn't:

      from pyspark.sql.functions import array_contains, lit
      df = spark.createDataFrame([(["a", "b", "c"],), ([],)], ['data'])
      df.select(array_contains(df.data, lit("a"))).show()
      
      Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
       File "/.../spark/python/pyspark/sql/functions.py", line 1950, in array_contains
       return Column(sc._jvm.functions.array_contains(_to_java_column(col), value))
       File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1277, in __call__
       File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1241, in _build_args
       File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1228, in _get_args
       File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_collections.py", line 500, in convert
       File "/.../spark/python/pyspark/sql/column.py", line 344, in __iter__
       raise TypeError("Column is not iterable")
      TypeError: Column is not iterable
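
      For reference, the check itself works today when the value is passed as a plain Python literal or through a SQL expression string, so the limitation is only in how the Python wrapper forwards the value argument (possible workarounds, reusing the df defined above):

      from pyspark.sql.functions import array_contains, expr

      # Plain Python value: forwarded to the JVM as-is and wrapped into a literal there
      df.select(array_contains(df.data, "a")).show()

      # SQL expression string: the parser already accepts an arbitrary value expression
      df.select(expr("array_contains(data, 'a')")).show()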
      

      We should allow column instances in PySpark's array_contains as well.
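
      A minimal sketch of one way to do this in python/pyspark/sql/functions.py (assuming the existing _to_java_column helper; not necessarily the exact change that was merged) is to unwrap a Column into its underlying Java column before calling the JVM function, since the Scala side already accepts column values:

      from pyspark import SparkContext
      from pyspark.sql.column import Column, _to_java_column

      def array_contains(col, value):
          """Return whether the array column `col` contains `value`.

          `value` may be a plain Python literal or a Column instance.
          """
          sc = SparkContext._active_spark_context
          # A Python Column cannot be sent through py4j directly (hence "Column is not
          # iterable"), so pass its underlying Java column; literals go through unchanged.
          value = value._jc if isinstance(value, Column) else value
          return Column(sc._jvm.functions.array_contains(_to_java_column(col), value))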


    People

      Assignee: Unassigned
      Reporter: Hyukjin Kwon (gurwls223)
      Votes: 0
      Watchers: 0
