SPARK-27834: Make separate PySpark/SparkR vectorization configurations


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: PySpark, SparkR, SQL
    • Labels: None

      Description

      spark.sql.execution.arrow.enabled was added when we introduced the PySpark Arrow optimization.
      Later, in the current master, the SparkR Arrow optimization was added, and it is controlled by the same configuration, spark.sql.execution.arrow.enabled.
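
      For context, a minimal PySpark sketch of the status quo (requires pandas and pyarrow installed); note that the single SQL conf set below also governs SparkR's Arrow optimization on the same JVM:

      {code:python}
      from pyspark.sql import SparkSession
      import pandas as pd

      # The single shared flag: flipping it turns Arrow on (or off) for
      # PySpark *and* SparkR in the same JVM.
      spark = (
          SparkSession.builder
          .appName("shared-arrow-flag")
          .config("spark.sql.execution.arrow.enabled", "true")
          .getOrCreate()
      )

      # createDataFrame()/toPandas() route through Arrow while the flag is on.
      df = spark.createDataFrame(pd.DataFrame({"x": range(3)}))
      print(df.toPandas())
      {code}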

      There appear to be two issues with this:

      1. spark.sql.execution.arrow.enabled has existed in PySpark since 2.3.0, whereas the SparkR optimization was only added in 3.0.0. Their maturity differs, so it is problematic to change the default value for just one of the two optimizations.

      2. Suppose users share one JVM between PySpark and SparkR. They are currently forced to enable the optimization for both or for neither (see the sketch after this list).
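
      A sketch of what separate configurations enable, assuming the per-language flag names the fix introduced in Spark 3.0 (spark.sql.execution.arrow.pyspark.enabled and spark.sql.execution.arrow.sparkr.enabled): each language binding can be toggled independently, even on a shared JVM.

      {code:python}
      from pyspark.sql import SparkSession
      import pandas as pd

      # Per-language flags: PySpark's Arrow path is enabled here while the
      # SparkR flag is left off, which the old shared flag could not express.
      spark = (
          SparkSession.builder
          .appName("split-arrow-flags")
          .config("spark.sql.execution.arrow.pyspark.enabled", "true")
          .config("spark.sql.execution.arrow.sparkr.enabled", "false")
          .getOrCreate()
      )

      df = spark.createDataFrame(pd.DataFrame({"x": range(3)}))
      print(df.toPandas())  # this conversion uses Arrow: the PySpark flag is on
      {code}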


              People

              • Assignee: hyukjin.kwon (Hyukjin Kwon)
              • Reporter: hyukjin.kwon (Hyukjin Kwon)
              • Votes: 0
              • Watchers: 1
