Description
API
- Check API compliance using java-compliance-checker (SPARK-7458)
- Audit new public APIs (from the generated html doc)
  - Scala (do not forget to check the object doc) (SPARK-7537)
  - Java compatibility (SPARK-7529)
  - Python API coverage (SPARK-7536)
- audit Pipeline APIs (SPARK-7535)
- graduate spark.ml from alpha (SPARK-7748)
  - remove AlphaComponent annotations
  - remove mima excludes for spark.ml
  - mark concrete classes final wherever reasonable
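The Python API coverage audit listed above boils down to a set difference between the public names a Scala class exposes and those its Python wrapper exposes. A minimal sketch, assuming the Scala method names have been collected by hand from the generated docs; `FakeLogisticRegression` and the name list are hypothetical stand-ins for illustration, not real pyspark classes:

```python
def missing_public_names(reference_names, candidate):
    """Return reference names that `candidate` does not expose publicly."""
    # Public names are those that do not start with an underscore.
    public = {name for name in dir(candidate) if not name.startswith("_")}
    return sorted(set(reference_names) - public)


# Hypothetical stand-in for a Python wrapper class under audit.
class FakeLogisticRegression:
    def setMaxIter(self, value):
        return self

    def setRegParam(self, value):
        return self


# Hypothetical list of method names read off the Scala docs.
scala_names = ["setMaxIter", "setRegParam", "setElasticNetParam"]
missing = missing_public_names(scala_names, FakeLogisticRegression)
```

Each name left in `missing` would be a candidate for a follow-up JIRA, after checking whether the omission is intentional.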
Algorithms and performance
Performance
- List any other missing performance tests from spark-perf here
  - LDA online/EM (SPARK-7455)
  - ElasticNet for linear regression and logistic regression (SPARK-7456)
  - Bernoulli naive Bayes (SPARK-7453)
  - PIC (SPARK-7454)
  - ALS.recommendAll (SPARK-7457)
  - perf-tests in Python (SPARK-7539)
Correctness
- PMML
  - scoring using PMML evaluator vs. MLlib models (SPARK-7540)
- model save/load (SPARK-7541)
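Both correctness checks above (PMML evaluator vs. MLlib scoring, and a model before vs. after save/load) reduce to comparing two sequences of predictions for the same inputs. A minimal sketch of that comparison step, assuming the predictions have already been collected into plain Python lists of floats; the function name and tolerances are hypothetical choices, not part of any Spark API:

```python
import math


def mismatched_indices(expected, actual, rel_tol=1e-9, abs_tol=1e-12):
    """Return the indices where two prediction lists disagree beyond tolerance."""
    if len(expected) != len(actual):
        raise ValueError("prediction lists differ in length")
    return [
        i
        for i, (e, a) in enumerate(zip(expected, actual))
        if not math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol)
    ]


# Hypothetical predictions from two scorers on the same three inputs.
reference_scores = [0.25, 1.0, 2.0]
candidate_scores = [0.25, 1.0, 2.5]
bad = mismatched_indices(reference_scores, candidate_scores)
```

A tolerance-based comparison is used here because two scoring paths can differ in the last few bits of floating-point output; exact equality would flag those as failures.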
Documentation and example code
- Create JIRAs for the user guide to each new algorithm and assign them to the corresponding author. Link here as "requires"
- Now that we have algorithms in spark.ml which are not in spark.mllib, we should start making subsections for the spark.ml API as needed. We can follow the structure of the spark.mllib user guide.
- The spark.ml user guide can provide: (a) code examples and (b) info on algorithms which do not exist in spark.mllib.
- We should not duplicate info in the spark.ml guides. Since spark.mllib is still the primary API, we should provide links to the corresponding algorithms in the spark.mllib user guide for more info.
- Create example code for major components. Link here as "requires"
  - cross validation in python (SPARK-7387)
  - pipeline with complex feature transformations (scala/java/python) (SPARK-7546)
  - elastic-net (possibly with cross validation) (SPARK-7547)
  - kernel density (SPARK-7707)
- Update Programming Guide for 1.4 (towards end of QA) (SPARK-7715)
Issue Links
- requires
  - SPARK-7546 Example code for ML Pipelines feature transformations (Resolved)
  - SPARK-7547 Example code for ElasticNet (Resolved)
  - SPARK-7387 CrossValidator example code in Python (Resolved)
  - SPARK-6013 Add more Python ML examples for spark.ml (Resolved)
  - SPARK-7272 User guide update for PMML model export (Resolved)
  - SPARK-7555 User guide update for ElasticNet (Resolved)
  - SPARK-7556 User guide update for feature transformer: Binarizer (Resolved)
  - SPARK-7557 User guide update for feature transformer: HashingTF, Tokenizer (Resolved)
  - SPARK-7574 User guide update for OneVsRest (Resolved)
  - SPARK-7575 Example code for OneVsRest (Resolved)
  - SPARK-7576 User guide update for spark.ml ElementwiseProduct (Resolved)
  - SPARK-7577 User guide update for Bucketizer (Resolved)
  - SPARK-7578 User guide update for spark.ml IDF, Normalizer, StandardScaler (Resolved)
  - SPARK-7579 User guide update for OneHotEncoder (Resolved)
  - SPARK-7581 User guide update for PolynomialExpansion (Resolved)
  - SPARK-7582 User guide update for StringIndexer (Resolved)
  - SPARK-7583 User guide update for RegexTokenizer (Resolved)
  - SPARK-7584 User guide update for VectorAssembler (Resolved)
  - SPARK-7585 User guide update for VectorIndexer (Resolved)
  - SPARK-7586 User guide update for spark.ml Word2Vec (Resolved)
  - SPARK-7707 User guide and example code for KernelDensity (Resolved)
  - SPARK-7772 User guide for spark.ml trees and ensembles (Resolved)
  - SPARK-7459 Add Java example for ElementwiseProduct in programming guide (Resolved)
  - SPARK-7496 User guide update for Online LDA (Closed)