SPARK-27052 (project: Spark)

Using PySpark udf in transform yields NULL values


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: PySpark, SQL
    • Labels:

      Description

      Steps to reproduce

      
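      from typing import Optional

      from pyspark.sql import SparkSession
      from pyspark.sql.functions import expr

      spark = SparkSession.builder.getOrCreate()

      def f(x: Optional[int]) -> Optional[int]:
          return x + 1 if x is not None else None

      # Register the Python function as a SQL UDF returning an integer
      spark.udf.register('f', f, "integer")

      # Call the Python UDF from inside the SQL lambda passed to transform
      df = (spark
          .createDataFrame([(1, [1, 2, 3])], ("id", "xs"))
          .withColumn("xsinc", expr("transform(xs, x -> f(x))")))

      df.show()

      # Observed on 2.4.0: every element of xsinc is NULL
      # (expected xsinc = [2, 3, 4])
      # +---+---------+-----+
      # | id|       xs|xsinc|
      # +---+---------+-----+
      # |  1|[1, 2, 3]| [,,]|
      # +---+---------+-----+

      Source: https://stackoverflow.com/a/53762650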
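A possible workaround, sketched here under assumptions and not taken from the issue itself: instead of calling a Python UDF per element from inside a `transform` lambda, wrap the whole array in a single Python UDF that applies `f` element-wise in Python. The names `f_array` and `demo` are hypothetical, chosen for this sketch only; the PySpark calls used (`udf`, `col`, `ArrayType`, `IntegerType`) are standard API.

```python
from typing import List, Optional


def f(x: Optional[int]) -> Optional[int]:
    # Same element-wise function as in the reproduction above.
    return x + 1 if x is not None else None


def f_array(xs: Optional[List[Optional[int]]]) -> Optional[List[Optional[int]]]:
    # Hypothetical helper: apply f to every element in plain Python, so Spark
    # never has to invoke a Python UDF from inside a SQL lambda function.
    if xs is None:
        return None
    return [f(x) for x in xs]


def demo(spark):
    # Hypothetical driver for this sketch. PySpark imports are kept local so
    # the pure-Python helpers above can be used without PySpark installed.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import ArrayType, IntegerType

    # One UDF over the whole array column, declared to return array<int>.
    f_array_udf = udf(f_array, ArrayType(IntegerType()))
    return (spark
            .createDataFrame([(1, [1, 2, 3])], ("id", "xs"))
            .withColumn("xsinc", f_array_udf(col("xs"))))
```

With an active session, `demo(SparkSession.builder.getOrCreate()).show()` should give `[2, 3, 4]` in `xsinc`. The trade-off is that each row's entire array is serialized to the Python worker. When the per-element logic can be written in SQL, the built-in lambda `transform(xs, x -> x + 1)` avoids Python UDFs altogether.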


            People

            • Assignee: Unassigned
            • Reporter: hejsgpuom62c
            • Votes: 1
            • Watchers: 8
