Spark / SPARK-7461 (sub-task of SPARK-5874: How to improve the current ML pipeline API?)

Remove spark.ml Model, and have all Transformers have parent

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ML
    • Labels: None

    Description

      A recent PR, https://github.com/apache/spark/pull/5980, exposed a problem with the Model abstraction: some transformers can be either plain Transformers (created directly by a user) or Models (produced by an Estimator). This is the first such case, but there will be more in the future.

      Some possible fixes are:

      • Create two separate classes, one extending Transformer and one extending Model. They would be essentially the same and could share code (or one could wrap the other), but this would bloat the API.
      • Use only Model, with a possibly null parent. There is precedent: meta-algorithms like RandomForest produce weak-hypothesis Models with no parent.
      • Change Transformer to have a parent, which may be null.
        • Unless there is strong disagreement, I think we should go with this last option.
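
      The last option can be illustrated with a minimal Scala sketch. The traits and classes below are simplified stand-ins written for this example; they are not the real spark.ml classes, and names such as Scaler and MaxAbsScalerEstimator, along with the Seq[Double] data type, are assumptions chosen to keep the example self-contained. The point is only that one Transformer class can serve both roles once it carries a nullable parent.

```scala
// Simplified stand-ins for illustration only; NOT the real spark.ml API.
trait PipelineStage { def uid: String }

// Hypothetical Estimator: fitting produces a Transformer whose parent is set.
abstract class Estimator extends PipelineStage {
  def fit(data: Seq[Double]): Transformer
}

// Every Transformer carries a parent; it stays null when a user
// constructs the Transformer directly instead of fitting an Estimator.
abstract class Transformer extends PipelineStage {
  var parent: Estimator = null
  def hasParent: Boolean = parent != null
  def setParent(p: Estimator): this.type = { parent = p; this }
  def transform(data: Seq[Double]): Seq[Double]
}

// A single class that works both as a user-built Transformer
// and as the fitted output of an Estimator.
class Scaler(val uid: String, val factor: Double) extends Transformer {
  def transform(data: Seq[Double]): Seq[Double] = data.map(_ * factor)
}

// Hypothetical Estimator that learns the scaling factor from the data.
class MaxAbsScalerEstimator(val uid: String) extends Estimator {
  def fit(data: Seq[Double]): Transformer = {
    require(data.exists(_ != 0.0), "data must contain a non-zero value")
    val maxAbs = data.map(x => math.abs(x)).max
    new Scaler(uid + "_model", 1.0 / maxAbs).setParent(this)
  }
}
```

      Under these assumptions, `new Scaler("scaler_1", 2.0)` has no parent, while the result of `new MaxAbsScalerEstimator("est_1").fit(...)` reports the estimator as its parent, so no separate Model subclass is needed to tell the two cases apart. The cost is that callers must check `hasParent` before using `parent`.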

People

    • Assignee: Unassigned
    • Reporter: Joseph K. Bradley (josephkb)
    • Votes: 0
    • Watchers: 4