Details
- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 2.0.0
- Fix Version/s: None
- Component/s: None
Description
In CrossValidator and TrainValidationSplit, we run training jobs one by one. If users have a big cluster, they might see speed-ups if we parallelize the job submission on the driver. The trade-off is that we might need to make multiple copies of the training data, which could be expensive. It is worth testing to figure out the best way to implement it.
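The parallel job submission described above can be sketched with a driver-side thread pool. This is a toy model of the idea, not Spark's actual API: `fit_and_score` and `parallel_cross_validate` are hypothetical names, and the "training job" is a placeholder computation.

```python
# Sketch of the proposal: submit the per-fold training jobs from the driver
# in parallel instead of sequentially. Spark job submission from multiple
# driver threads is supported, so a simple thread pool suffices to overlap
# the jobs. All names here are illustrative, not Spark's API.
from concurrent.futures import ThreadPoolExecutor

def fit_and_score(params, fold):
    # Placeholder for: estimator.fit(trainSplit, params) followed by
    # evaluator.evaluate(model.transform(validationSplit)).
    return (params, fold, params["regParam"] * (fold + 1))

def parallel_cross_validate(param_grid, num_folds, parallelism=4):
    # Each (params, fold) pair is an independent training job; submit
    # them all and let the pool bound the number of concurrent jobs.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = [
            pool.submit(fit_and_score, params, fold)
            for params in param_grid
            for fold in range(num_folds)
        ]
        return [f.result() for f in futures]

scores = parallel_cross_validate(
    [{"regParam": 0.1}, {"regParam": 0.01}], num_folds=3)
```

Regarding the data-copy trade-off mentioned above: in practice the concurrent jobs can share one cached copy of the training data rather than materializing a copy per job, which bounds the extra memory cost.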
Attachments
Issue Links
- is superseded by
  - SPARK-19071 Optimizations for ML Pipeline Tuning — Resolved