SPARK-7461 (sub-task of SPARK-5874: How to improve the current ML pipeline API?)

Remove spark.ml Model, and have all Transformers have parent


    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ML
    • Labels: None

      Description

      A recent PR (https://github.com/apache/spark/pull/5980) raised an issue with the Model abstraction: some transformers can be either Transformers (created directly by a user) or Models (produced by an Estimator). This is the first instance, but more such transformers will appear in the future.

      Some possible fixes are:

      • Create two separate classes, one extending Transformer and one extending Model. They would be essentially the same and could share code (or one could wrap the other), but this would bloat the API.
      • Just use Model, with a possibly null parent. There is precedent: meta-algorithms like RandomForest produce weak-hypothesis Models with no parent.
      • Change Transformer to have a parent which may be null.
        • --> Unless there is strong disagreement, I think we should go with this last option.
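
      The last option can be illustrated with a minimal, self-contained sketch. It is not the spark.ml API: Estimator and Transformer below are simplified stand-ins, and Scaler and MaxAbsEstimator are hypothetical names invented for this example. The sketch only assumes that a parent is a reference to the Estimator that produced the transformer, and that it is absent when a user builds the transformer directly.

```python
from dataclasses import dataclass
from typing import List, Optional


class Estimator:
    """Simplified stand-in for an Estimator: fit() produces a Transformer."""

    def fit(self, data: List[float]) -> "Transformer":
        raise NotImplementedError


@dataclass
class Transformer:
    """Every Transformer carries a parent, which may be None."""

    parent: Optional[Estimator] = None

    def transform(self, data: List[float]) -> List[float]:
        raise NotImplementedError


@dataclass
class Scaler(Transformer):
    """Hypothetical transformer usable both standalone and as a fitted result."""

    factor: float = 1.0

    def transform(self, data: List[float]) -> List[float]:
        return [x * self.factor for x in data]


class MaxAbsEstimator(Estimator):
    """Hypothetical Estimator that learns a factor scaling data into [-1, 1]."""

    def fit(self, data: List[float]) -> Scaler:
        largest = max(abs(x) for x in data)
        # Record this Estimator as the parent of the Scaler it produces.
        return Scaler(parent=self, factor=1.0 / largest if largest else 1.0)


# Created by a user: no Estimator was involved, so parent stays None.
user_built = Scaler(factor=2.0)

# Created by an Estimator: the same class, with parent set.
estimator = MaxAbsEstimator()
fitted = estimator.fit([-4.0, 2.0])
```

      In this sketch a single Scaler class covers both cases, so no parallel Transformer/Model pair is needed. The cost is that callers must handle a missing parent.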

            People

            • Assignee: Unassigned
            • Reporter: Joseph K. Bradley (josephkb)