Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-28264

Revisiting Python / pandas UDF

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Blocker
    • Resolution: Fixed
    • 3.0.0
    • 3.0.0
    • PySpark, SQL

    Description

      In the past two years, the pandas UDFs are perhaps the most important changes to Spark for Python data science. However, these functionalities have evolved organically, leading to some inconsistencies and confusions among users. This document revisits UDF definition and naming, as a result of discussions among Xiangrui, Li Jin, Hyukjin, and Reynold.

      See document here: https://docs.google.com/document/d/10Pkl-rqygGao2xQf6sddt0b-4FYK4g8qr_bXLKTL65A/edit#

       New proposal: https://docs.google.com/document/d/1-kV0FS_LF2zvaRh_GhkV32Uqksm_Sq8SvnBBmRyxm30/edit?usp=sharing

      Attachments

        Issue Links

          Activity

            People

              hyukjin.kwon Hyukjin Kwon
              rxin Reynold Xin
              Votes:
              1 Vote for this issue
              Watchers:
              19 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: