Spark / SPARK-27834

Make separate PySpark/SparkR vectorization configurations


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: PySpark, SparkR, SQL
    • Labels: None

    Description

      spark.sql.execution.arrow.enabled was added when PySpark Arrow optimization was introduced. Later, on the current master, SparkR Arrow optimization was added, and it is controlled by the same configuration, spark.sql.execution.arrow.enabled.
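
      For concreteness, a minimal PySpark sketch of today's behavior: the one shared flag gates Arrow-backed conversion, for example in DataFrame.toPandas().

          from pyspark.sql import SparkSession

          spark = SparkSession.builder.getOrCreate()

          # The single shared flag controls Arrow optimization for PySpark
          # (e.g. DataFrame.toPandas()) and, on the current master, for
          # SparkR as well.
          spark.conf.set("spark.sql.execution.arrow.enabled", "true")

          df = spark.range(10)
          pdf = df.toPandas()  # Arrow-based conversion when the flag is on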

      There appear to be two issues with this:

      1. spark.sql.execution.arrow.enabled was added to PySpark in 2.3.0, whereas the SparkR optimization arrives in 3.0.0. The two features differ in maturity, so sharing one flag is problematic if we want to change the default value for one optimization before the other.

      2. Suppose users share one JVM between PySpark and SparkR. They are currently forced to enable the optimization for both or for neither (a sketch of separated flags follows this list).
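
      A sketch of what separation could look like for issue 2, assuming hypothetical per-language keys spark.sql.execution.arrow.pyspark.enabled and spark.sql.execution.arrow.sparkr.enabled; these names are an assumption for illustration, not taken from this description.

          from pyspark.sql import SparkSession

          spark = SparkSession.builder.getOrCreate()

          # Assumed per-language flags (names hypothetical here): each
          # frontend sharing the JVM opts in independently instead of
          # all-or-nothing.
          spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
          spark.conf.set("spark.sql.execution.arrow.sparkr.enabled", "false")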


    People

      Assignee: Hyukjin Kwon (gurwls223)
      Reporter: Hyukjin Kwon (gurwls223)
      Votes: 0
      Watchers: 0
