Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Won't Fix
Description
Many machine learning pipeline frameworks provide an API for easily assembling transformers.
One example would be:
val model = new Pipeline("tokenizer", "countvectorizer", "lda").fit(data).
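For comparison, here is a sketch of the same three-stage pipeline written against the existing spark.ml API, where each input and output column is wired by hand. The column names (`text`, `words`, `features`), the topic count, and the `ExplicitPipeline` wrapper are illustrative assumptions; `data` is assumed to be a DataFrame with a string column named `text`.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel, PipelineStage}
import org.apache.spark.ml.clustering.LDA
import org.apache.spark.ml.feature.{CountVectorizer, Tokenizer}
import org.apache.spark.sql.DataFrame

// Illustrative wrapper; the name is an assumption, not a Spark class.
object ExplicitPipeline {
  def fit(data: DataFrame): PipelineModel = {
    // Each stage must be told which column to read and which to write.
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val vectorizer = new CountVectorizer().setInputCol("words").setOutputCol("features")
    // LDA reads its term-count vectors from the features column.
    val lda = new LDA().setFeaturesCol("features").setK(10)

    new Pipeline()
      .setStages(Array[PipelineStage](tokenizer, vectorizer, lda))
      .fit(data)
  }
}
```

The column plumbing between stages is the part the proposed one-line constructor would hide.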
Overall, the feature would:
1. Allow people (especially beginners) to create an ML application in one simple line of code.
2. Be handy for users, since they would not have to set the input and output columns.
3. Going further, code may no longer be needed to build a Spark ML application, as it could be done through configuration:
"ml.pipeline.input": "hdfs://path.svm"
"ml.pipeline": "tokenizer", "hashingTF", "lda"
"ml.tokenizer.toLowercase": "false"
...
This could be quite efficient for tuning on a cluster.
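One way the proposal could be approximated on top of the current API is a small helper that maps stage names to stages and chains each stage's output column into the next stage's input. This is only a sketch under assumptions: `NamedPipeline`, `build`, and `fromConfig` are hypothetical names that do not exist in Spark, the generated column names are arbitrary, and the `ml.pipeline` value is assumed to be a single comma-separated string.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.clustering.LDA
import org.apache.spark.ml.feature.{CountVectorizer, HashingTF, Tokenizer}

// Hypothetical helper, not part of Spark.
object NamedPipeline {

  // Builds a Pipeline from stage names, feeding each stage's output
  // column into the next stage as its input column.
  def build(inputCol: String, stageNames: Seq[String]): Pipeline = {
    val start = (inputCol, Vector.empty[PipelineStage])
    val (_, stages) = stageNames.zipWithIndex.foldLeft(start) {
      case ((in, acc), (name, i)) =>
        // Generated column name, e.g. "tokenizer_0" (arbitrary convention).
        val out = s"${name}_$i"
        val stage: PipelineStage = name.toLowerCase match {
          case "tokenizer"       => new Tokenizer().setInputCol(in).setOutputCol(out)
          case "countvectorizer" => new CountVectorizer().setInputCol(in).setOutputCol(out)
          case "hashingtf"       => new HashingTF().setInputCol(in).setOutputCol(out)
          case "lda"             => new LDA().setFeaturesCol(in).setTopicDistributionCol(out)
          case other             => throw new IllegalArgumentException(s"Unknown stage: $other")
        }
        (out, acc :+ stage)
    }
    new Pipeline().setStages(stages.toArray)
  }

  // Assumed config format: "ml.pipeline" -> "tokenizer,hashingTF,lda".
  def fromConfig(conf: Map[String, String], inputCol: String): Pipeline =
    build(inputCol, conf("ml.pipeline").split(",").map(_.trim).toSeq)
}
```

With such a helper, `NamedPipeline.build("text", Seq("tokenizer", "countvectorizer", "lda")).fit(data)` would come close to the one-line form in the description, assuming `data` has a string column named `text`. Per-stage options such as `ml.tokenizer.toLowercase` are not handled here and would need their own mapping.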
Feedback and suggestions are appreciated.