Description
https://issues.apache.org/jira/browse/SPARK-36338 implemented the distributed sequence on the JVM side. However, that implementation cannot leverage the Spark SQL engine, because it creates an RDD directly and thereby truncates the SQL plan.
We should move the logic into a proper SQL plan to leverage other optimizations such as column pruning.
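For readers unfamiliar with the term, the sketch below illustrates the general idea behind a distributed sequence: give every row a globally consecutive index without collecting the rows in one place. It is plain Python rather than Spark code, and the function name `attach_distributed_sequence` and the list-of-lists stand-in for partitions are assumptions made for illustration only; it does not reproduce the implementation from SPARK-36338 or the SQL plan proposed here.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def attach_distributed_sequence(
    partitions: Sequence[Sequence[str]],
) -> List[List[Tuple[int, str]]]:
    """Attach a globally consecutive, 0-based index to rows split across partitions.

    Hypothetical helper: each inner sequence stands in for one partition.
    """
    # Pass 1: count the rows in each partition (cheap, one number per partition).
    counts = [len(part) for part in partitions]

    # Exclusive prefix sum: the starting index of each partition.
    offsets = [0] + list(accumulate(counts))[:-1]

    # Pass 2: each partition numbers its own rows independently,
    # starting from its offset.
    return [
        [(offset + i, row) for i, row in enumerate(part)]
        for offset, part in zip(offsets, partitions)
    ]


if __name__ == "__main__":
    indexed = attach_distributed_sequence([["a", "b"], [], ["c"], ["d", "e"]])
    for part in indexed:
        print(part)
```

Only the per-partition counts need to be gathered centrally; the rows themselves stay where they are. Expressing this step as a node in the SQL plan, rather than as a direct RDD operation, is what would let the optimizer apply rules such as column pruning to the plan underneath it.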
Issue Links
- relates to SPARK-34849 SPIP: Support pandas API layer on PySpark (Resolved)