Details
- Type: Brainstorming
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
Description
We will soon start adding clustering algorithms to the Pipelines API. We should discuss which abstractions to define for clustering algorithms:
- Are there standard APIs for clustering?
- What clustering algorithms might not fit in the normal APIs?
- How can the APIs benefit (a) users and (b) developers?
W.r.t. benefiting users and developers, I'm envisioning:
- traits for users
  - These should help standardize the API, but we'll have to plan ahead carefully.
- abstract classes with some boilerplate code pre-implemented for developers (similar to the current Prediction developer APIs)
  - These will be available if helpful, but developers should not need to use them.
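To make the two layers above concrete, here is a minimal sketch in plain Scala with no Spark dependency. Every name in it (`ClusteringModelLike`, `CentroidModelBase`, `SquaredEuclideanModel`, `ThresholdModel`) is hypothetical and chosen only for illustration; none of them is an existing or proposed spark.ml class. It assumes a hard-assignment model that maps one point to one cluster index, which may well not fit every algorithm under discussion.

```scala
// Hypothetical names throughout; this is not the spark.ml API.

// User-facing trait: the small contract every fitted clustering model exposes.
trait ClusteringModelLike {
  def k: Int                              // number of clusters
  def predict(point: Array[Double]): Int  // index of the assigned cluster
}

// Optional developer-facing base class with boilerplate pre-implemented.
// Subclasses only supply a distance function.
abstract class CentroidModelBase(val centers: Array[Array[Double]])
    extends ClusteringModelLike {

  override def k: Int = centers.length

  protected def distance(a: Array[Double], b: Array[Double]): Double

  // Assign the point to the nearest center under the subclass's distance.
  override def predict(point: Array[Double]): Int =
    centers.indices.minBy(i => distance(point, centers(i)))
}

// A developer who finds the base class helpful writes very little code.
final class SquaredEuclideanModel(cs: Array[Array[Double]])
    extends CentroidModelBase(cs) {
  override protected def distance(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum
}

// A developer who does not need the base class implements the trait directly.
final class ThresholdModel(threshold: Double) extends ClusteringModelLike {
  override def k: Int = 2
  override def predict(point: Array[Double]): Int =
    if (point(0) < threshold) 0 else 1
}
```

Code written against `ClusteringModelLike` works with either model, which is the standardization benefit for users, while `ThresholdModel` shows that the abstract class stays optional for developers.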
Issue Links
- Is contained by
  - SPARK-10817 ML abstraction umbrella (Resolved)
- relates to
  - SPARK-7879 KMeans API for spark.ml Pipelines (Resolved)
  - SPARK-5565 LDA wrapper for spark.ml package (Resolved)