Spark / SPARK-7610

Design clustering abstractions for Pipelines API

Details

    • Type: Brainstorming
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ML

    Description

      We will soon start adding clustering algorithms to the Pipelines API. We should discuss what abstractions we should define for clustering algorithms:

      • Are there standard APIs for clustering?
      • What clustering algorithms might not fit in the normal APIs?
      • How can the APIs benefit (a) users and (b) developers?

      With respect to benefiting users and developers, I'm envisioning:

      • traits for users
        • These should help standardize the API, but we'll have to plan ahead carefully.
      • abstract classes with some boilerplate code pre-implemented for developers (similar to the current Prediction developer APIs)
        • These will be available if helpful, but developers should not need to use them.
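
To make the trait / abstract-class split above concrete, here is a minimal sketch of how it might look. It is written in plain Scala with no Spark dependency, and every name in it (`ClusteringModel`, `HasClusterCenters`, `CenterBasedModel`, `EuclideanModel`) is hypothetical and chosen for illustration only; none of them is an existing or proposed Spark API. Feature vectors are modeled as `Array[Double]` purely to keep the example self-contained.

```scala
// Hypothetical sketch: these names are illustrative and are NOT part of Spark.

// User-facing trait: the minimum every clustering model would expose.
trait ClusteringModel {
  /** Number of clusters in the fitted model. */
  def k: Int

  /** Index in [0, k) of the cluster assigned to `features`. */
  def predict(features: Array[Double]): Int
}

// Optional user-facing trait for models that have explicit centers.
// Algorithms without centers would simply not mix this in.
trait HasClusterCenters {
  def clusterCenters: Array[Array[Double]]
}

// Developer-facing abstract class: boilerplate is pre-implemented, and a
// developer only supplies the distance function. Using it is optional;
// a model could implement the traits above directly instead.
abstract class CenterBasedModel(val clusterCenters: Array[Array[Double]])
    extends ClusteringModel with HasClusterCenters {

  require(clusterCenters.nonEmpty, "at least one cluster center is required")

  /** Distance between two points; the only method a subclass must define. */
  protected def distance(a: Array[Double], b: Array[Double]): Double

  override def k: Int = clusterCenters.length

  // Assign the point to the index of the nearest center.
  override def predict(features: Array[Double]): Int =
    clusterCenters.indices.minBy(i => distance(features, clusterCenters(i)))
}

// Example concrete model built on the developer-facing class.
class EuclideanModel(centers: Array[Array[Double]])
    extends CenterBasedModel(centers) {

  override protected def distance(a: Array[Double], b: Array[Double]): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)
}
```

Under these assumptions, user code would depend only on `ClusteringModel` (and `HasClusterCenters` where relevant), while a developer adding a new center-based algorithm would write just the `distance` method. An algorithm that does not fit this mold could skip the abstract class and implement the traits itself.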

            People

              Assignee: Unassigned
              Reporter: Joseph K. Bradley (josephkb)
              Votes: 0
              Watchers: 4
