Spark / SPARK-5844

Optimize Pipeline.fit for ParamGrid

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 1.3.0
    • Fix Version/s: None
    • Component/s: ML

    Description

      This issue was brought up by prudenko in a linked JIRA issue.

      *Proposal*:
      When Pipeline.fit is given an array of ParamMaps, it should operate incrementally:

      • For each set of parameters applicable to the first PipelineStage,
        • Fit/transform that stage using that set of parameters.
        • For each set of parameters applicable to the second PipelineStage,
          • etc.

      This is essentially a depth-first search over the parameters: each level of the search tree is a PipelineStage, and each node's children correspond to the ParamMaps applicable to the stage at the next level.
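
The depth-first traversal described above can be sketched in plain Python. This is only an illustration under stated assumptions, not Spark's implementation: `ToyStage`, `fit_transform`, and `fit_depth_first` are hypothetical names, the "data" is an ordinary list standing in for an RDD, and each stage's grid is assumed to be already split out per stage.

```python
class ToyStage:
    """Hypothetical stand-in for a PipelineStage (not Spark's API)."""

    def __init__(self, name):
        self.name = name
        self.fit_calls = 0  # counts how often this stage is fit

    def fit_transform(self, data, params):
        # Pretend to fit on `data` with `params`, then return the
        # transformed data as a new list (the input is not mutated).
        self.fit_calls += 1
        return data + [(self.name, tuple(sorted(params.items())))]


def fit_depth_first(stages, grids, data, depth=0, chosen=()):
    """Yield (params_per_stage, final_data) for every grid combination.

    grids[i] is the list of parameter dicts applicable to stages[i].
    A stage is fit once per distinct choice of parameters for the
    stages up to and including it; its output is then shared by all
    parameter settings of the later stages.
    """
    if depth == len(stages):
        yield chosen, data
        return
    for params in grids[depth]:
        transformed = stages[depth].fit_transform(data, params)
        yield from fit_depth_first(
            stages, grids, transformed, depth + 1, chosen + (params,)
        )


# Example: 1 x 2 x 3 = 6 full parameter combinations.
stages = [ToyStage("tokenizer"), ToyStage("hashingTF"), ToyStage("lr")]
grids = [
    [{}],
    [{"numFeatures": 10}, {"numFeatures": 100}],
    [{"regParam": 0.1}, {"regParam": 0.01}, {"regParam": 1.0}],
]
results = list(fit_depth_first(stages, grids, data=[]))
print(len(results))                    # 6
print([s.fit_calls for s in stages])   # [1, 2, 6]
```

In this toy run the first stage is fit once and the second twice, even though six full combinations are produced. Only the last stage is fit once per combination.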

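      This will avoid recomputing intermediate RDDs during model search, because each stage's output is computed once and reused by every parameter setting of the later stages.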
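
To get a rough sense of the savings, the number of stage fits can be counted for both strategies. The sketch below rests on assumptions not stated in the issue: the baseline is taken to refit every stage for every full parameter combination with no caching, and `grid_sizes[i]` (a hypothetical name) is the number of parameter settings for stage `i`.

```python
from math import prod


def naive_stage_fits(grid_sizes):
    # Assumed baseline: every full combination refits every stage.
    return len(grid_sizes) * prod(grid_sizes)


def incremental_stage_fits(grid_sizes):
    # Depth-first: stage i is fit once per combination of settings
    # for stages 0..i, i.e. once per node at level i of the tree.
    total = 0
    prefixes = 1
    for size in grid_sizes:
        prefixes *= size
        total += prefixes
    return total


print(naive_stage_fits([1, 2, 3]))        # 18
print(incremental_stage_fits([1, 2, 3]))  # 9
```

The gap widens when the larger grids belong to later stages, since the earlier stages are then shared by more combinations.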

            People

              Assignee: Unassigned
              Reporter: Joseph K. Bradley (josephkb)
              Votes: 1
              Watchers: 7
