Spark / SPARK-14084

Parallel training jobs in model selection

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.0.0
    • Fix Version/s: None
    • Component/s: ML
    • Labels: None

    Description

      In CrossValidator and TrainValidationSplit, we run training jobs one by one. If users have a big cluster, they might see speed-ups if we parallelize job submission on the driver. The trade-off is that we might need to keep multiple copies of the training data, which could be expensive. It is worth testing to figure out the best way to implement this.
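      A minimal sketch of the idea, not a proposed implementation: the driver submits one training job per (parameter setting, fold) pair from a thread pool instead of looping over them sequentially. Everything here is an assumption made for illustration. `fit_and_score` is a hypothetical stand-in for a real training job (a closed-form 1-D ridge fit), `parallelism` is a hypothetical knob for the number of concurrent submissions, and the toy data replaces a real dataset. In Spark the threads would only submit jobs; the actual work would run on the cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def fit_and_score(params, train, validation):
    """Hypothetical stand-in for one training job.

    Fits the slope w of y ~ w * x with ridge penalty params["reg"],
    then returns the summed squared error on the validation split.
    """
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    w = sxy / (sxx + params["reg"])
    return sum((y - w * x) ** 2 for x, y in validation)


def parallel_model_selection(param_grid, folds, parallelism):
    """Submit every (params, fold) job from a thread pool on the driver.

    Returns the best parameter setting and the average validation
    error for each setting, in the order of param_grid.
    """
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # Submit all jobs up front so up to `parallelism` run at once.
        futures = [
            (i, pool.submit(fit_and_score, params, train, validation))
            for i, params in enumerate(param_grid)
            for train, validation in folds
        ]
        totals = [0.0] * len(param_grid)
        for i, future in futures:
            totals[i] += future.result()  # blocks until that job finishes

    averages = [total / len(folds) for total in totals]
    best_index = min(range(len(param_grid)), key=averages.__getitem__)
    return param_grid[best_index], averages


# Toy data on the line y = 2x, split into two folds (train, validation).
first = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
second = [(4.0, 8.0), (5.0, 10.0), (6.0, 12.0)]
folds = [(first, second), (second, first)]
param_grid = [{"reg": 0.0}, {"reg": 1.0}, {"reg": 10.0}]

best_params, averages = parallel_model_selection(param_grid, folds, parallelism=3)
print(best_params, averages)
```

      Note that both folds here share the same in-memory lists. The data-copy concern above would arise in a real setting if each concurrent job needed its own cached copy of its training split, so the level of parallelism would likely need to be bounded by available memory.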

            People

              Assignee: Unassigned
              Reporter: Xiangrui Meng (mengxr)
              Votes: 2
              Watchers: 12