SPARK-43789

Use 'spark.sql.execution.arrow.maxRecordsPerBatch' in R createDataFrame with Arrow by default

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.5.0
    • Fix Version/s: 3.5.0
    • Component/s: SparkR
    • Labels: None

    Description

      Currently, createDataFrame in SparkR uses `1` for numPartitions by default, which isn't realistic; it should use a larger default number of partitions.

      In PySpark, the input data is chunked into batches of 'spark.sql.execution.arrow.maxRecordsPerBatch' records, and SparkR should follow the same approach (see the sketch below).
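      To make the intent concrete, here is a minimal SparkR sketch of the requested behavior. The batch size, row counts, and expected partition count below are illustrative assumptions, not values taken from the issue:

          library(SparkR)

          # Start a session with Arrow enabled for SparkR, and chunk Arrow
          # record batches at 5000 rows (illustrative; the config's default
          # is 10000).
          sparkR.session(sparkConfig = list(
            "spark.sql.execution.arrow.sparkr.enabled" = "true",
            "spark.sql.execution.arrow.maxRecordsPerBatch" = "5000"
          ))

          # A local data.frame with 20000 rows. With the requested behavior,
          # createDataFrame should split it into roughly 20000 / 5000 = 4
          # partitions instead of a single one.
          local_df <- data.frame(id = seq_len(20000), value = rnorm(20000))
          sdf <- createDataFrame(local_df)

          # Check how many partitions the input was split into.
          getNumPartitions(sdf)

      Before this change, getNumPartitions(sdf) would report 1 regardless of input size; afterwards the partition count should track the configured batch size, matching PySpark's behavior.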


        Activity

          Hyukjin Kwon (gurwls223) added a comment:

          Issue resolved by pull request 41307
          https://github.com/apache/spark/pull/41307


          People

            Assignee: Hyukjin Kwon (gurwls223)
            Reporter: Hyukjin Kwon (gurwls223)
            Votes: 0
            Watchers: 1
