[SPARK-23109] ML 2.3 QA: API: Python API coverage - ASF JIRA

For new public APIs added to MLlib (spark.ml only), we need to check the generated HTML doc and compare the Scala & Python versions.

We need to track:

- Inconsistency: Do class/method/parameter names match?
- Docs: Is the Python doc missing or just a stub? We want the Python doc to be as complete as the Scala doc.
- API breaking changes: These should be very rare but are occasionally either necessary (intentional) or accidental. These must be recorded and added to the Migration Guide for this release.
  - Note: If the API change is for an Alpha/Experimental/DeveloperApi component, please note that as well.
- Missing classes/methods/parameters: We should create to-do JIRAs for functionality missing from Python, to be added in the next release cycle. Please use a separate JIRA (linked below as "requires") for this list of to-do items.
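
Part of the comparison above can be scripted. The following is a minimal sketch, not an official Spark QA tool: the helper names (`public_members`, `missing_docs`, `compare_api`) and the `ToyEstimator` stand-in class are hypothetical, the Scala member names are assumed to be collected by hand from the generated Scaladoc, and the 20-character threshold for a "stub" docstring is an arbitrary assumption.

```python
import inspect


def public_members(obj):
    """Return the set of public (non-underscore) attribute names of obj."""
    return {name for name in dir(obj) if not name.startswith("_")}


def missing_docs(obj, min_length=20):
    """Return public callables whose docstring is absent or shorter than min_length."""
    flagged = []
    for name in sorted(public_members(obj)):
        member = getattr(obj, name)
        if callable(member):
            doc = inspect.getdoc(member) or ""
            if len(doc) < min_length:  # assumption: very short docs are stubs
                flagged.append(name)
    return flagged


def compare_api(scala_names, python_obj):
    """Compare hand-collected Scala member names against a Python class."""
    python_names = public_members(python_obj)
    scala_names = set(scala_names)
    return {
        "missing_in_python": sorted(scala_names - python_names),
        "python_only": sorted(python_names - scala_names),
        "stub_docs": missing_docs(python_obj),
    }


# Hypothetical stand-in for a PySpark ML class, so the sketch runs without Spark.
class ToyEstimator:
    def setMaxIter(self, value):
        """Set the maximum number of iterations for the optimizer."""
        return self

    def getMaxIter(self):
        """TODO"""
        return 10

    def fit(self, dataset):
        return self


# Illustrative Scala-side names; in practice these come from the Scaladoc.
scala_side = ["setMaxIter", "getMaxIter", "fit", "setFeatureSubsetStrategy"]
report = compare_api(scala_side, ToyEstimator)
print(report)
```

With PySpark installed, the same function could be pointed at a real class such as `pyspark.ml.classification.GBTClassifier`. Inherited helper methods will show up under `python_only`, so the output is a starting point for manual review rather than a verdict.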

requires

SPARK-22005 CrossValidator, TrainValidationSplit dump sub models to disk when fitting: Python API

SPARK-22796 Add multiple column support to PySpark QuantileDiscretizer

SPARK-22797 Add multiple column support to PySpark Bucketizer

SPARK-21741 Python API for DataFrame-based multivariate summarizer

SPARK-23161 Add missing APIs to Python GBTClassifier

SPARK-23162 PySpark ML LinearRegressionSummary missing r2adj

SPARK-23256 Add columnSchema method to PySpark image reader

SPARK-23163 Sync Python ML API docs with Scala
