Spark / SPARK-26759 Arrow optimization in SparkR's interoperability / SPARK-26858

Vectorized gapplyCollect, Arrow optimization in native R function execution


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Later
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: SparkR, SQL
    • Labels: None

    Description

      Unlike gapply, gapplyCollect requires additional serialization/deserialization (ser/de) steps because it can omit the schema, so Spark SQL does not know the return type before execution actually happens.

      In the original code path, this is handled by using a binary schema. Once gapply is done (SPARK-26761), we can mimic this approach in vectorized gapply to support gapplyCollect.
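
      For context, the difference between the two calls can be sketched as follows. This is a minimal illustration using the public SparkR API and is not taken from the patch itself. The local master, the mtcars dataset, and the helper name mean_mpg are assumptions chosen for the example, and the Arrow setting is left at its default.

```r
library(SparkR)

# Assumption: a local single-threaded session, purely for illustration.
sparkR.session(master = "local[1]")

# mtcars ships with base R; "cyl" is a numeric (double) column.
df <- createDataFrame(mtcars)

# Hypothetical per-group function: receives the grouping key and the
# group's rows as a local R data.frame, and returns an R data.frame.
mean_mpg <- function(key, x) {
  data.frame(cyl = key[[1]], mean_mpg = mean(x$mpg))
}

# gapply: the output schema is declared up front, so Spark SQL knows the
# return type before the R function runs.
schema <- structType(structField("cyl", "double"),
                     structField("mean_mpg", "double"))
result_df <- gapply(df, "cyl", mean_mpg, schema)
head(collect(result_df))

# gapplyCollect: no schema argument. The result comes back as a local
# R data.frame, so the return type is only known after execution.
local_df <- gapplyCollect(df, "cyl", mean_mpg)
head(local_df)

sparkR.session.stop()
```

      Because gapplyCollect takes no schema, the result has to travel back to the driver in a form that Spark SQL does not need to interpret, which is the extra ser/de step described above.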

          People

            Assignee: Hyukjin Kwon (gurwls223)
            Reporter: Hyukjin Kwon (gurwls223)