[SPARK-36559] Allow column pruning on distributed sequence index (pandas API on Spark) - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.2.0
Fix Version/s: 3.2.0
Component/s: PySpark, SQL
Labels: None

Description

SPARK-36338 (https://issues.apache.org/jira/browse/SPARK-36338) implemented the distributed sequence index on the JVM side. However, that implementation creates an RDD directly, which truncates the SQL plan and prevents the Spark SQL engine from optimizing across it.

We should move the logic into a proper SQL plan to leverage other optimizations such as column pruning.
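To illustrate why column pruning is safe here, below is a toy, pure-Python model of a distributed sequence index. It is only a sketch under stated assumptions, not the code from the pull request: partitions are modeled as plain lists of dicts, and the names `attach_distributed_sequence`, `columns`, and `__index__` are hypothetical. The point it shows is that the index depends only on per-partition row counts, so unused columns can be dropped without changing the index values.

```python
from itertools import accumulate
from typing import Dict, List, Optional, Sequence

Row = Dict[str, object]


def attach_distributed_sequence(
    partitions: Sequence[Sequence[Row]],
    columns: Optional[Sequence[str]] = None,  # hypothetical: columns the query needs
    index_name: str = "__index__",            # hypothetical index column name
) -> List[Row]:
    """Attach a 0-based sequential index across partitions (toy model)."""
    # Pass 1: count the rows in each partition.
    counts = [len(part) for part in partitions]
    # Starting offset of each partition is the running total of earlier counts.
    offsets = [0] + list(accumulate(counts))[:-1]

    # Pass 2: index = partition offset + position within the partition.
    result: List[Row] = []
    for offset, part in zip(offsets, partitions):
        for position, row in enumerate(part):
            # "Column pruning": carry only the requested columns, if given.
            kept = row if columns is None else {c: row[c] for c in columns}
            result.append({index_name: offset + position, **kept})
    return result


partitions = [
    [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}],
    [{"a": 3, "b": "z"}],
]
full = attach_distributed_sequence(partitions)
pruned = attach_distributed_sequence(partitions, columns=["a"])
```

In this model `pruned` carries the same index values as `full` while never materializing column `b`, which is the kind of optimization an opaque RDD step would block.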

Issue Links

relates to: SPARK-34849 SPIP: Support pandas API layer on PySpark (Resolved)

links to: [Github] Pull Request #33807 (HyukjinKwon)

People

Assignee: Hyukjin Kwon

Reporter: Hyukjin Kwon

Votes: 0

Watchers: 1

Dates

Created: 23/Aug/21 06:18

Updated: 12/Dec/22 18:10

Resolved: 25/Aug/21 01:03