Spark / SPARK-28198

Add mapPartitionsInPandas to allow an iterator of DataFrames

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: PySpark, SQL
    • Labels:
      None

      Description

      SPARK-26412 added a new type of Pandas UDF called Scalar Iter. It would be good to be able to use this without the length limitation, that is, without requiring the output to have the same number of rows as the input.

      This JIRA aims to add mapPartitionsInPandas, which leverages this Pandas UDF and the Arrow / Pandas integration in Spark.
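As a rough illustration of the "iterator of DataFrames" idea, the sketch below shows the shape of function such an API would accept: it consumes an iterator of pandas DataFrames and yields DataFrames whose lengths need not match the input. The function name `keep_adults`, the column names, and the sample data are hypothetical, and the batches are fed from a plain Python list rather than by Spark, so this only demonstrates the contract, not the actual implementation.

```python
from typing import Iterator

import pandas as pd


def keep_adults(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    """Yield, for each input batch, only the rows where age >= 18.

    Each output batch may contain fewer rows than its input batch,
    which is the freedom the Scalar Iter UDF does not offer.
    """
    for batch in batches:
        yield batch[batch["age"] >= 18]


# Hypothetical stand-in for the batches Spark would pass for one partition.
partition = [
    pd.DataFrame({"id": [1, 2, 3], "age": [15, 21, 30]}),
    pd.DataFrame({"id": [4, 5], "age": [10, 40]}),
]

# Drive the function locally and stitch the output batches together.
result = pd.concat(list(keep_adults(iter(partition))), ignore_index=True)

rows_in = sum(len(b) for b in partition)  # rows fed in
rows_out = len(result)                    # rows that survived the filter
```

With Spark available, a function of this shape would presumably be passed to the new DataFrame method together with an output schema. To my knowledge the method was released in Spark 3.0 under the name `mapInPandas`, for example `df.mapInPandas(keep_adults, schema=df.schema)`, but treat that call as an assumption to check against the PySpark documentation for your version.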

              People

              • Assignee: Hyukjin Kwon (hyukjin.kwon)
              • Reporter: Hyukjin Kwon (hyukjin.kwon)
