SPARK-43789: Use 'spark.sql.execution.arrow.maxRecordsPerBatch' in R createDataFrame with Arrow by default


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.5.0
    • Fix Version/s: 3.5.0
    • Component/s: SparkR
    • Labels: None

    Description

      Currently, createDataFrame uses `1` for numPartitions by default, which isn't realistic; it should use a larger number of partitions by default.

      In PySpark, the input data is chunked into batches of 'spark.sql.execution.arrow.maxRecordsPerBatch' records. SparkR should follow the same approach, as sketched below.
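
      A minimal sketch of how this would look from the SparkR side, assuming Arrow conversion is enabled via 'spark.sql.execution.arrow.sparkr.enabled'. The partition count at the end illustrates the proposed chunking behavior, not an observed result:

        library(SparkR)

        # Start a session with Arrow enabled for SparkR and a smaller batch size.
        sparkR.session(sparkConfig = list(
          "spark.sql.execution.arrow.sparkr.enabled" = "true",
          "spark.sql.execution.arrow.maxRecordsPerBatch" = "1000"
        ))

        # A local data.frame with 5,000 rows; with the proposed change, createDataFrame
        # would chunk it into ceiling(5000 / 1000) = 5 Arrow batches instead of a single one.
        local_df <- data.frame(id = seq_len(5000), value = runif(5000))
        sdf <- createDataFrame(local_df)

        # Inspect how many partitions the resulting SparkDataFrame ended up with.
        getNumPartitions(sdf)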


    People

      Assignee: gurwls223 Hyukjin Kwon
      Reporter: gurwls223 Hyukjin Kwon
