SPARK-27834: Make separate PySpark/SparkR vectorization configurations


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: PySpark, SparkR, SQL
    • Labels: None

      Description

      spark.sql.execution.arrow.enabled was added when we introduced the PySpark Arrow optimization.
      Later, in the current master, the SparkR Arrow optimization was added, and it is controlled by the same configuration, spark.sql.execution.arrow.enabled.
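
      For context, a minimal PySpark sketch of the status quo (requires pandas and pyarrow installed); note that the single SQL conf set below also governs SparkR's Arrow optimization on the same JVM:

      {code:python}
      from pyspark.sql import SparkSession
      import pandas as pd

      # The single shared flag: flipping it turns Arrow on (or off) for
      # PySpark *and* SparkR in the same JVM.
      spark = (
          SparkSession.builder
          .appName("shared-arrow-flag")
          .config("spark.sql.execution.arrow.enabled", "true")
          .getOrCreate()
      )

      # createDataFrame()/toPandas() route through Arrow while the flag is on.
      df = spark.createDataFrame(pd.DataFrame({"x": range(3)}))
      print(df.toPandas())
      {code}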

      There appear to be two issues with this:

      1. spark.sql.execution.arrow.enabled has existed in PySpark since 2.3.0, whereas the SparkR optimization was only added in 3.0.0. Their maturity differs, so it is problematic to change the default value for just one of the two optimizations.

      2. Suppose users share one JVM between PySpark and SparkR. They are currently forced to enable the optimization for both or for neither (see the sketch after this list).
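
      A sketch of what separate configurations enable, assuming the per-language flag names the fix introduced in Spark 3.0 (spark.sql.execution.arrow.pyspark.enabled and spark.sql.execution.arrow.sparkr.enabled): each language binding can be toggled independently, even on a shared JVM.

      {code:python}
      from pyspark.sql import SparkSession
      import pandas as pd

      # Per-language flags: PySpark's Arrow path is enabled here while the
      # SparkR flag is left off, which the old shared flag could not express.
      spark = (
          SparkSession.builder
          .appName("split-arrow-flags")
          .config("spark.sql.execution.arrow.pyspark.enabled", "true")
          .config("spark.sql.execution.arrow.sparkr.enabled", "false")
          .getOrCreate()
      )

      df = spark.createDataFrame(pd.DataFrame({"x": range(3)}))
      print(df.toPandas())  # this conversion uses Arrow: the PySpark flag is on
      {code}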


              People

              • Assignee: hyukjin.kwon (Hyukjin Kwon)
              • Reporter: hyukjin.kwon (Hyukjin Kwon)
              • Votes: 0
              • Watchers: 1
