Details
- Sub-task
- Status: Resolved
- Blocker
- Resolution: Done
- None
- None
Description
For new public APIs added to MLlib, we need to check the generated HTML doc and compare the Scala & Python versions. We need to track:
- Inconsistency: Do class/method/parameter names match?
- Docs: Is the Python doc missing or just a stub? We want the Python doc to be as complete as the Scala doc.
- API breaking changes: These should be very rare but are occasionally either necessary (intentional) or accidental. These must be recorded and added to the Migration Guide for this release.
  - Note: If the API change is for an Alpha/Experimental/DeveloperApi component, please note that as well.
- Missing classes/methods/parameters: We should create to-do JIRAs for functionality missing from Python, to be added in the next release cycle. Please use a separate JIRA (linked below as "requires") for this list of to-do items.
  - NOTE: These missing features should be added in the next release. This work is just to generate a list of to-do items for the future.
UPDATE: This only needs to cover spark.ml since spark.mllib is going into maintenance mode.
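The name and doc checks above can be partly mechanized. The following is a minimal sketch, not the process actually used for this audit: it assumes the Scala member names have been copied by hand from the generated Scaladoc, and it uses a hypothetical stand-in class (`FakeIDFModel`) in place of a real PySpark class so that it runs without Spark installed. The helper names `public_names`, `parity_report`, and `stub_docs` are illustrative only.

```python
import inspect


def public_names(cls):
    """Return the public (no leading underscore) attribute names of a class."""
    return {name for name, _ in inspect.getmembers(cls) if not name.startswith("_")}


def parity_report(scala_names, python_names):
    """Compare two collections of member names and list the differences."""
    scala, python = set(scala_names), set(python_names)
    return {
        "missing_in_python": sorted(scala - python),
        "python_only": sorted(python - scala),
    }


def stub_docs(cls, min_len=20):
    """List public callables whose docstring is missing or shorter than min_len."""
    return sorted(
        name
        for name, member in inspect.getmembers(cls, callable)
        if not name.startswith("_")
        and len((member.__doc__ or "").strip()) < min_len
    )


# Hypothetical stand-in for a Python wrapper class; not the real pyspark class.
class FakeIDFModel:
    def transform(self, dataset):
        """Rescale term-frequency vectors using inverse document frequency."""

    def save(self, path):
        pass  # deliberately undocumented, to show up in the doc check


# Assumed list of Scala-side members, copied by hand from the Scaladoc.
scala_members = ["transform", "save", "idf"]

report = parity_report(scala_members, public_names(FakeIDFModel))
undocumented = stub_docs(FakeIDFModel)
print(report)
print(undocumented)
```

Against a real class, `public_names` would be pointed at the imported PySpark class instead of the stand-in. A gap reported under `missing_in_python` would become a to-do JIRA of the "requires" kind, and an entry from `stub_docs` would become a doc JIRA. The script only flags candidates; the generated HTML docs still need to be compared by eye.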
Issue Links
- contains
  - SPARK-15623 2.0 python coverage ml.feature (Resolved)
  - SPARK-15628 pyspark.ml.evaluation module (Resolved)
  - SPARK-15630 2.0 python coverage ml root module (Closed)
- is cloned by
  - SPARK-16486 Python API parity issues from 2.0 QA (Resolved)
- requires
  - SPARK-11938 Expose numFeatures in all ML PredictionModel for PySpark (Resolved)
  - SPARK-15316 PySpark GeneralizedLinearRegression missing linkPredictionCol param (Resolved)
  - SPARK-8516 ML attribute API in PySpark (Resolved)
  - SPARK-15181 Python API for Generalized Linear Regression Summary (Resolved)
  - SPARK-14894 Python GaussianMixture summary (Resolved)
  - SPARK-14978 PySpark TrainValidationSplitModel should support validationMetrics (Resolved)
  - SPARK-15402 PySpark ml.evaluation should support save/load (Resolved)
  - SPARK-15113 Add missing numFeatures & numClasses to wrapped JavaClassificationModel (Resolved)
  - SPARK-15130 PySpark shared params should include default values to match Scala (Resolved)
  - SPARK-15136 Linkify ML PyDoc (Resolved)
  - SPARK-15139 PySpark TreeEnsemble missing methods (Resolved)
  - SPARK-15194 Add Python ML API for MultivariateGaussian (Resolved)
  - SPARK-15442 PySpark QuantileDiscretizer missing "relativeError" param (Resolved)
  - SPARK-15106 Add package documentation for ML and remove BETA from Scala & Java for ML pipeline API. (Resolved)
  - SPARK-15162 Update PySpark LogisticRegression threshold PyDoc to be as complete as Scaladoc (Resolved)
  - SPARK-15163 Mark experimental algorithms experimental in PySpark (Resolved)
  - SPARK-15168 Add missing params to Python's MultilayerPerceptronClassifier (Resolved)
  - SPARK-15188 PySpark NaiveBayes is missing Thresholds param (Resolved)
  - SPARK-15189 ml.Evaluation pydoc issues (Resolved)
  - SPARK-15195 Improve PyDoc for ml.tuning (Resolved)
  - SPARK-15281 PySpark ML GBTRegressor lacks impurity param (Resolved)
  - SPARK-15412 Improve linear & isotonic regression methods PyDocs (Resolved)
  - SPARK-15788 PySpark IDFModel missing "idf" property (Resolved)
  - SPARK-15500 Remove defaults in storage level param doc in ALS (Resolved)