Spark / SPARK-27052

Using PySpark udf in transform yields NULL values

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: PySpark, SQL

    Description

      Steps to reproduce

      
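      from typing import Optional
      from pyspark.sql import SparkSession
      from pyspark.sql.functions import expr
      
      # Reuse the active session (e.g. in the pyspark shell) or create one.
      spark = SparkSession.builder.getOrCreate()
      
      def f(x: Optional[int]) -> Optional[int]:
          return x + 1 if x is not None else None
      
      # Register the Python function as a SQL UDF returning an integer.
      spark.udf.register('f', f, "integer")
      
      df = (spark
          .createDataFrame([(1, [1, 2, 3])], ("id", "xs"))
          .withColumn("xsinc", expr("transform(xs, x -> f(x))")))
      
      df.show()
      
      # Actual output: every element of xsinc is NULL (expected: [2, 3, 4]).
      # +---+---------+-----+
      # | id|       xs|xsinc|
      # +---+---------+-----+
      # |  1|[1, 2, 3]| [,,]|
      # +---+---------+-----+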

      Source: https://stackoverflow.com/a/53762650
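
      Possible workaround (not part of the original report): the sketch below avoids calling a Python UDF inside the `transform` lambda. It assumes the NULLs come from the UDF not being evaluated inside the higher-order function, which the report itself does not confirm. The names `f_all` and `demo` are hypothetical helpers introduced here for illustration only.

```python
from typing import List, Optional


def f(x: Optional[int]) -> Optional[int]:
    # Same function as in the steps to reproduce.
    return x + 1 if x is not None else None


def f_all(xs: Optional[List[Optional[int]]]) -> Optional[List[Optional[int]]]:
    # Hypothetical helper: apply f to every element in Python, so the
    # UDF receives the whole array and no SQL lambda calls a UDF.
    return [f(x) for x in xs] if xs is not None else None


def demo(spark) -> None:
    # `spark` is assumed to be an existing SparkSession.
    # The import is local so the helpers above run without PySpark.
    from pyspark.sql.functions import expr

    spark.udf.register("f_all", f_all, "array<int>")

    df = (spark
        .createDataFrame([(1, [1, 2, 3])], ("id", "xs"))
        # Option 1: a UDF over the whole array instead of per element.
        .withColumn("xsinc_udf", expr("f_all(xs)"))
        # Option 2: a native SQL expression inside the lambda, no UDF.
        .withColumn("xsinc_sql", expr("transform(xs, x -> x + 1)")))

    df.show()
```

      Calling `demo(spark)` from a PySpark shell adds both columns to the same one-row DataFrame used above. Option 2 only applies when the per-element logic can be written as a SQL expression; option 1 keeps the Python logic at the cost of serializing the whole array to the Python worker.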

          People

            Assignee: Unassigned
            Reporter: hejsgpuom62c
            Votes: 1
            Watchers: 8
