Description
Currently there are some performance issues in the pandas-on-Spark layer.
One of them is that the layer accesses the Java DataFrame and runs the analysis phase too many times, often just to retrieve the current column names or data types.
We should reduce the number of these unnecessary accesses.
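One possible way to cut down on repeated access is to fetch the schema once per frame and derive column names and data types from the cached copy. The sketch below only illustrates that idea and is not the actual pandas-on-Spark implementation: `FakeJavaDataFrame` and `CachedFrame` are hypothetical names, and the stand-in class simply counts how often its `schema()` method is called instead of talking to a real JVM.

```python
from functools import cached_property


class FakeJavaDataFrame:
    """Hypothetical stand-in for the JVM-side DataFrame; counts schema lookups."""

    def __init__(self, fields):
        self._fields = list(fields)
        self.schema_calls = 0

    def schema(self):
        # In a real setup this would cross into the JVM and could trigger analysis.
        self.schema_calls += 1
        return list(self._fields)


class CachedFrame:
    """Hypothetical wrapper that retrieves the schema at most once."""

    def __init__(self, jdf):
        self._jdf = jdf

    @cached_property
    def _schema(self):
        # Evaluated on first use only; later reads hit the cached value.
        return self._jdf.schema()

    @property
    def columns(self):
        return [name for name, _ in self._schema]

    @property
    def dtypes(self):
        return [dtype for _, dtype in self._schema]


jdf = FakeJavaDataFrame([("id", "bigint"), ("name", "string")])
frame = CachedFrame(jdf)

# Repeated metadata reads reuse the cached schema.
for _ in range(3):
    frame.columns
    frame.dtypes

print(jdf.schema_calls)
```

Caching like this is only safe if the wrapped frame is treated as immutable; any operation that produces a new frame would need its own cache.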
Issue Links
- relates to SPARK-34849 SPIP: Support pandas API layer on PySpark (Resolved)