  1. Spark
  2. SPARK-22216 Improving PySpark/Pandas interoperability
  3. SPARK-22239

User-defined window functions with pandas udf (unbounded window)


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.4.0
    • Component/s: PySpark
    • Labels: None

    Description

      Window functions are another place where we can benefit from vectorized UDFs, and supporting them adds another useful function type to the pandas_udf suite.

      Example usage (preliminary):

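      from pyspark.sql import Window
      from pyspark.sql.functions import pandas_udf, PandasUDFType
      from pyspark.sql.types import DoubleType

      w = Window.partitionBy('id').rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)

      # GROUPED_AGG is the function type Spark 2.4 requires for window use
      @pandas_udf(DoubleType(), PandasUDFType.GROUPED_AGG)
      def mean_udf(v):
          # v is a pandas.Series holding the partition's values
          return v.mean()

      df.withColumn('v_mean', mean_udf(df.v1).over(w))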
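
      To make the intended semantics concrete, here is a minimal sketch in plain Python (no Spark or pandas required). It assumes that, for an unbounded window, the aggregate is evaluated once per partition and its result is attached to every row of that partition. The helper name apply_unbounded_window and the sample rows are hypothetical, chosen only for illustration, and statistics.mean stands in for the pandas-based mean_udf; this is not how Spark implements the feature.

```python
from collections import defaultdict
from statistics import mean


def apply_unbounded_window(rows, key, value, agg, out):
    """Model an unbounded-window aggregate: one agg call per partition,
    with the result attached to every row of that partition."""
    # Collect the value column of each partition (the "window").
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row[value])
    # Evaluate the aggregate once per partition, not once per row.
    results = {k: agg(vals) for k, vals in partitions.items()}
    # Attach each partition's result to its rows, keeping row order
    # and leaving the input rows unmodified.
    return [dict(row, **{out: results[row[key]]}) for row in rows]


# Hypothetical sample data, made up for illustration.
rows = [
    {"id": 1, "v1": 1.0},
    {"id": 1, "v1": 3.0},
    {"id": 2, "v1": 10.0},
]
result = apply_unbounded_window(rows, key="id", value="v1", agg=mean, out="v_mean")
```

      Unlike a groupBy aggregation, the row count is unchanged: every input row is returned with one extra column, which is what distinguishes the window case from a grouped aggregate.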
      

          People

            Assignee: icexelloss Li Jin
            Reporter: icexelloss Li Jin
