[SPARK-7610] Design clustering abstractions for Pipelines API - ASF JIRA

Details

Type: Brainstorming
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: None
Fix Version/s: None
Component/s: ML
Labels:
- bulk-closed

Description

We will soon start adding clustering algorithms to the Pipelines API. We should discuss which abstractions to define for them:

- Are there standard APIs for clustering?
- What clustering algorithms might not fit in the normal APIs?
- How can the APIs benefit (a) users and (b) developers?

Regarding the benefits to users and developers, I'm envisioning:

- Traits for users
  - These should help standardize the API, but we'll have to plan ahead carefully.
- Abstract classes with some boilerplate code pre-implemented for developers (similar to the current Prediction developer APIs)
  - These will be available if helpful, but developers should not need to use them.
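
To make the trait versus abstract class split concrete, here is a minimal sketch in plain Scala with no Spark dependency. Every name in it (`ClusteringModelLike`, `CentroidBasedModel`, `EuclideanCentroidModel`, `ConstantModel`) is hypothetical and chosen only for illustration; none of them is an existing or proposed spark.ml class, and the real abstractions would operate on DataFrames and Params rather than raw arrays.

```scala
// Hypothetical sketch only: these names are assumptions, not spark.ml APIs.

// User-facing trait: the contract every fitted clustering model exposes.
trait ClusteringModelLike {
  /** Number of clusters. */
  def k: Int

  /** Hard assignment of a feature vector to a cluster index in [0, k). */
  def predict(features: Array[Double]): Int
}

// Developer-facing abstract class: optional boilerplate for centroid-based
// models. Subclasses only have to supply a distance function.
abstract class CentroidBasedModel(val centers: Array[Array[Double]])
    extends ClusteringModelLike {
  require(centers.nonEmpty, "at least one center is required")

  override def k: Int = centers.length

  protected def distance(a: Array[Double], b: Array[Double]): Double

  // Assign to the nearest center under the subclass's distance.
  override def predict(features: Array[Double]): Int =
    centers.indices.minBy(i => distance(centers(i), features))
}

// A developer who opts into the boilerplate writes very little code.
final class EuclideanCentroidModel(initialCenters: Array[Array[Double]])
    extends CentroidBasedModel(initialCenters) {
  override protected def distance(a: Array[Double], b: Array[Double]): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)
}

// A developer can also skip the abstract class and implement the trait directly.
final class ConstantModel extends ClusteringModelLike {
  override def k: Int = 1
  override def predict(features: Array[Double]): Int = 0
}
```

In this sketch, user code would depend only on `ClusteringModelLike`, so both `EuclideanCentroidModel` and `ConstantModel` are interchangeable from the user's side, while the abstract class remains an optional convenience for developers.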

Issue Links

Is contained by
- SPARK-10817 ML abstraction umbrella (Resolved)

relates to
- SPARK-7879 KMeans API for spark.ml Pipelines (Resolved)
- SPARK-5565 LDA wrapper for spark.ml package (Resolved)

People

Assignee: Unassigned

Reporter: Joseph K. Bradley

Votes: 0

Watchers: 4

Dates

Created: 13/May/15 18:50

Updated: 21/May/19 04:33

Resolved: 21/May/19 04:33