Description
1. Some plots depend on MLlib and do not work with Spark Connect. We need to reimplement the ML-based plots with Spark SQL so that they are compatible with Spark Connect.
2. Further computation optimizations, e.g.:
- compute all the metrics a plot needs in a single pass over the whole dataset to improve performance;
- optimize the existing sampling algorithm.
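To illustrate the single-pass idea, here is a minimal sketch, not the actual implementation. It builds one Spark SQL query that aggregates the box-plot-style metrics for several columns in a single scan, using only the built-in SQL aggregates `min`, `max`, `avg` and `percentile_approx`. The helper name `box_plot_stats_sql`, the output column aliases, and the choice of metrics are assumptions made for illustration.

```python
from typing import Sequence


def box_plot_stats_sql(table: str, columns: Sequence[str]) -> str:
    """Build one Spark SQL query that aggregates every column in a single scan.

    Hypothetical helper: the name, aliases, and metric set are illustrative only.
    """
    if not columns:
        raise ValueError("at least one column is required")

    select_items = []
    for col in columns:
        quoted = f"`{col}`"  # backticks guard against unusual column names
        select_items.extend([
            f"min({quoted}) AS `{col}_min`",
            f"max({quoted}) AS `{col}_max`",
            f"avg({quoted}) AS `{col}_mean`",
            # One call returns the three quartiles as an array.
            f"percentile_approx({quoted}, array(0.25, 0.5, 0.75)) AS `{col}_quartiles`",
        ])

    # All aggregates share one SELECT, so the table is read once
    # instead of once per metric or per column.
    return "SELECT " + ", ".join(select_items) + f" FROM {table}"


if __name__ == "__main__":
    print(box_plot_stats_sql("sales", ["price", "quantity"]))
```

Given an active session, the resulting string could be passed to `spark.sql(...)`. Because the query is plain Spark SQL with no MLlib dependency, the same approach should also be usable from a Spark Connect client.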
Issue Links
- relates to SPARK-49530 Introducing PySpark Plotting API (Resolved)