Details
- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 2.0.0
- Fix Version/s: None
- Component/s: None
Description
In CrossValidator and TrainValidationSplit, we run training jobs one by one. If users have a big cluster, they might see speed-ups if we parallelize the job submission on the driver. The trade-off is that we might need to make multiple copies of the training data, which could be expensive. It is worth testing to figure out the best way to implement it.
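The parallel job submission described above can be sketched with a driver-side thread pool. This is a toy model of the idea, not Spark's actual API: `fit_and_score` and `parallel_cross_validate` are hypothetical names, and the "training job" is a placeholder computation.

```python
# Sketch of the proposal: submit the per-fold training jobs from the driver
# in parallel instead of sequentially. Spark job submission from multiple
# driver threads is supported, so a simple thread pool suffices to overlap
# the jobs. All names here are illustrative, not Spark's API.
from concurrent.futures import ThreadPoolExecutor

def fit_and_score(params, fold):
    # Placeholder for: estimator.fit(trainSplit, params) followed by
    # evaluator.evaluate(model.transform(validationSplit)).
    return (params, fold, params["regParam"] * (fold + 1))

def parallel_cross_validate(param_grid, num_folds, parallelism=4):
    # Each (params, fold) pair is an independent training job; submit
    # them all and let the pool bound the number of concurrent jobs.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = [
            pool.submit(fit_and_score, params, fold)
            for params in param_grid
            for fold in range(num_folds)
        ]
        return [f.result() for f in futures]

scores = parallel_cross_validate(
    [{"regParam": 0.1}, {"regParam": 0.01}], num_folds=3)
```

Regarding the data-copy trade-off mentioned above: in practice the concurrent jobs can share one cached copy of the training data rather than materializing a copy per job, which bounds the extra memory cost.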
Attachments
Issue Links
- is superseded by
  - SPARK-19071 Optimizations for ML Pipeline Tuning — Resolved