SPARK-22980 (Spark project; sub-task of SPARK-22216, Improving PySpark/Pandas interoperability)

Using pandas_udf when inputs are not Pandas's Series or DataFrame


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: PySpark
    • Labels: None

    Description

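      # pandas_udf version. In the pyspark shell `spark` already exists;
      # getOrCreate() returns that session or starts a new one.
      from pyspark.sql import SparkSession
      from pyspark.sql.functions import pandas_udf
      from pyspark.sql.functions import col, lit
      from pyspark.sql.types import LongType
      spark = SparkSession.builder.getOrCreate()
      df = spark.range(3)
      # x and y arrive as pandas.Series covering one batch of rows,
      # so len(x) is the number of rows in the batch, not len('text').
      f = pandas_udf(lambda x, y: len(x) + y, LongType())
      df.select(f(lit('text'), col('id'))).show()

      # Plain udf version: x and y arrive as Python values, one row at a time,
      # so len(x) is len('text') == 4 and each result is 4 + id.
      from pyspark.sql.functions import udf
      from pyspark.sql.functions import col, lit
      from pyspark.sql.types import LongType
      df = spark.range(3)
      f = udf(lambda x, y: len(x) + y, LongType())
      df.select(f(lit('text'), col('id'))).show()
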

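      The results of pandas_udf differ from those of udf. With udf, the function is called once per row and x is the Python string 'text', so the output is 4, 5, 6 (len('text') + id). With pandas_udf, the literal lit('text') is also passed as a pandas.Series holding one copy of 'text' per row of the current batch, so len(x) is the batch size rather than the length of the string, and the output depends on how the rows are split into batches.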
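A minimal, dependency-free sketch of why the two outputs differ, assuming plain Python lists as stand-ins for pandas.Series and a made-up split of the rows into batches (Spark chooses the real split). The helpers run_row_at_a_time and run_batch_at_a_time are hypothetical and only mimic the two calling conventions; they are not PySpark APIs. The last case takes the length of each element, which is what pandas' Series.str.len() does, so its result no longer depends on batching.

```python
# Hypothetical simulation of udf vs. scalar pandas_udf calling conventions.
# Plain lists stand in for pandas.Series; the batch splits below are invented.
from typing import Callable, List


def run_row_at_a_time(f: Callable[[str, int], int], text: str, ids: List[int]) -> List[int]:
    # Like udf: f receives one Python value per column, once per row.
    return [f(text, i) for i in ids]


def run_batch_at_a_time(
    f: Callable[[List[str], List[int]], List[int]],
    text: str,
    id_batches: List[List[int]],
) -> List[int]:
    # Like a scalar pandas_udf: f receives a whole chunk of each column per call.
    # The literal is expanded to one copy per row of the batch.
    out: List[int] = []
    for ids in id_batches:
        out.extend(f([text] * len(ids), ids))
    return out


def row_len_plus_id(x: str, y: int) -> int:
    # Same body as the issue's lambda; x is a single string here.
    return len(x) + y


def naive_len_plus_id(x: List[str], y: List[int]) -> List[int]:
    # Mirrors `len(x) + y` on Series: len(x) is the batch length.
    return [len(x) + v for v in y]


def per_row_len_plus_id(x: List[str], y: List[int]) -> List[int]:
    # Mirrors `x.str.len() + y`: length of each element, batch-independent.
    return [len(s) + v for s, v in zip(x, y)]


row_result = run_row_at_a_time(row_len_plus_id, "text", [0, 1, 2])
one_batch = run_batch_at_a_time(naive_len_plus_id, "text", [[0, 1, 2]])
three_batches = run_batch_at_a_time(naive_len_plus_id, "text", [[0], [1], [2]])
fixed = run_batch_at_a_time(per_row_len_plus_id, "text", [[0], [1], [2]])

print(row_result)     # [4, 5, 6]
print(one_batch)      # [3, 4, 5]
print(three_batches)  # [1, 2, 3]
print(fixed)          # [4, 5, 6]
```

Inside a real scalar pandas_udf, the per-row variant would typically be written as `lambda x, y: x.str.len() + y`; this is a suggested workaround, not something stated in this issue.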

          People

            Assignee: Hyukjin Kwon (gurwls223)
            Reporter: Xiao Li (smilegator)
            Votes: 0
            Watchers: 7

            Dates

              Created:
              Updated:
              Resolved: