  1. Spark
  2. SPARK-26759 Arrow optimization in SparkR's interoperability
  3. SPARK-26858

Vectorized gapplyCollect, Arrow optimization in native R function execution


    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Later
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: SparkR, SQL
    • Labels:
      None

      Description

      Unlike gapply, gapplyCollect requires additional serialization/deserialization (ser/de) steps because it can omit the schema, so Spark SQL doesn't know the return type before execution actually happens.

      In the original code path, this is done by using a binary schema. Once gapply is done (SPARK-26761), we can mimic this approach in vectorized gapply to support gapplyCollect.
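
The binary-schema idea described above can be illustrated with a rough, language-agnostic sketch in plain Python. This is not SparkR or Spark code: the function names `apply_per_group` and `apply_and_collect` are hypothetical, and `pickle` merely stands in for R's serialization. The assumption being illustrated is that, when the result schema is unknown up front, each group's output travels as one opaque binary value and is only decoded after collection.

```python
import pickle


def apply_per_group(rows, key_fn, func):
    """Group rows by key, apply func, and return one bytes payload per group.

    Hypothetical stand-in for a schema-less grouped apply: the shape of
    func's output is not known in advance, so every group's result is
    serialized into a single binary value instead of typed columns.
    """
    groups = {}
    for row in rows:
        groups.setdefault(key_fn(row), []).append(row)
    return [pickle.dumps(func(key, grp)) for key, grp in groups.items()]


def apply_and_collect(rows, key_fn, func):
    """Collect the binary payloads, then deserialize and concatenate them.

    Hypothetical stand-in for the extra ser/de step needed on the
    collecting side when no schema was supplied.
    """
    payloads = apply_per_group(rows, key_fn, func)
    result = []
    for payload in payloads:
        result.extend(pickle.loads(payload))
    return result


rows = [
    {"k": "a", "v": 1},
    {"k": "a", "v": 3},
    {"k": "b", "v": 10},
]

# The user function returns a list of records whose fields are not declared anywhere.
summary = apply_and_collect(
    rows,
    lambda r: r["k"],
    lambda key, grp: [{"k": key, "total": sum(r["v"] for r in grp)}],
)
```

The trade-off this sketch highlights is the one the description points at: skipping the schema is convenient for the caller, but it costs an extra serialization pass per group and a decode-and-combine pass after collection.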

            People

            • Assignee:
              hyukjin.kwon Hyukjin Kwon
              Reporter:
              hyukjin.kwon Hyukjin Kwon
            • Votes:
              0
              Watchers:
              3
