[SPARK-17094] provide simplified API for ML pipeline - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: None
Component/s: ML
Labels: None

Description

Many machine learning pipeline frameworks provide an API for easily assembling transformers.

One example would be:

val model = new Pipeline("tokenizer", "countvectorizer", "lda").fit(data)
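
As a rough illustration of how such a name-based constructor might be layered on the existing API, here is a minimal sketch. `NamedPipeline`, its `inputCol` argument, and the generated column names are hypothetical and not part of Spark; only `Pipeline`, `Tokenizer`, `CountVectorizer`, `LDA` and their setters are existing Spark ML calls. It assumes `spark-mllib` is on the classpath.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.clustering.LDA
import org.apache.spark.ml.feature.{CountVectorizer, Tokenizer}

// Hypothetical helper, not part of Spark: builds a Pipeline from stage names
// and feeds each stage's output column into the next stage's input column.
object NamedPipeline {
  def apply(inputCol: String, names: String*): Pipeline = {
    // Fold over the names, carrying the current column and the stages built so far.
    val (_, stages) =
      names.zipWithIndex.foldLeft((inputCol, Vector.empty[PipelineStage])) {
        case ((in, built), (name, i)) =>
          val out = s"${name}_$i" // generated output column name (an assumption)
          val stage: PipelineStage = name.toLowerCase match {
            case "tokenizer" =>
              new Tokenizer().setInputCol(in).setOutputCol(out)
            case "countvectorizer" =>
              new CountVectorizer().setInputCol(in).setOutputCol(out)
            case "lda" =>
              new LDA().setFeaturesCol(in).setTopicDistributionCol(out)
            case other =>
              throw new IllegalArgumentException(s"Unknown stage: $other")
          }
          (out, built :+ stage)
      }
    new Pipeline().setStages(stages.toArray)
  }
}
```

With a DataFrame `data` that has a string column named `text`, this could be used as `NamedPipeline("text", "tokenizer", "countvectorizer", "lda").fit(data)`. A real implementation would need a registry covering all stages rather than a hard-coded match.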

Overall, the feature would:
1. Allow people (especially beginners) to create an ML application in one simple line of code.
2. Be handy for users, as they would not have to set the input and output columns.
3. Thinking further, possibly remove the need to write code for a Spark ML application, since the pipeline could be defined by configuration:

"ml.pipeline.input": "hdfs://path.svm"
"ml.pipeline": "tokenizer", "hashingTF", "lda"
"ml.tokenizer.toLowercase": "false"
...

This can be quite efficient for tuning on a cluster.
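
To make the configuration idea concrete, the following Spark-free sketch parses flat `ml.*` settings into a pipeline description. `PipelineConfig`, `PipelineSpec` and `StageSpec` are hypothetical names, and the sketch assumes the `ml.pipeline` value arrives as a single comma-separated string; turning the resulting spec into actual Spark stages is left out.

```scala
// Hypothetical types describing a pipeline read from configuration.
final case class StageSpec(name: String, params: Map[String, String])
final case class PipelineSpec(input: String, stages: List[StageSpec])

object PipelineConfig {
  private val InputKey = "ml.pipeline.input"
  private val StagesKey = "ml.pipeline"

  // Parse flat key/value settings into an ordered list of stages with their parameters.
  def parse(conf: Map[String, String]): PipelineSpec = {
    val input = conf.getOrElse(InputKey, sys.error(s"Missing key: $InputKey"))
    val stageNames = conf
      .getOrElse(StagesKey, sys.error(s"Missing key: $StagesKey"))
      .split(",")
      .map(_.trim)
      .filter(_.nonEmpty)
      .toList

    val stages = stageNames.map { name =>
      val prefix = s"ml.$name."
      // Collect every "ml.<stage>.<param>" entry as a parameter of that stage.
      val params = conf.collect {
        case (key, value) if key.startsWith(prefix) => key.stripPrefix(prefix) -> value
      }
      StageSpec(name, params)
    }
    PipelineSpec(input, stages)
  }
}
```

Keeping the parsed spec separate from stage construction would let the same configuration be validated before any Spark job is started.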

Feedback and suggestions are appreciated.

People

Assignee: Unassigned
Reporter: yuhao yang
Votes: 0
Watchers: 5

Dates

Created: 16/Aug/16 21:35
Updated: 04/Oct/16 09:36
Resolved: 04/Oct/16 09:36