Description
This is the first step of the parent task, Optimizations for ML Pipeline Tuning, to perform model evaluation in parallel. A simple approach is to naively evaluate models in parallel, with a parameter to control the level of parallelism (see the sketch after this list). There are two main concerns with this approach:
- excessive caching of datasets
- what to set as the default level of parallelism: a value of 1 evaluates all models serially, as is done currently, while higher values could lead to excessive caching.
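For illustration, here is a minimal sketch of what such a knob could look like on CrossValidator, assuming a `parallelism` param is added as this issue proposes. The `setParallelism(2)` call and the toy dataset are assumptions for this sketch, not a committed API.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.sql.SparkSession

object ParallelTuningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("ParallelTuningSketch")
      .getOrCreate()
    import spark.implicits._

    // Tiny toy dataset: (label, features).
    val training = Seq(
      (0.0, Vectors.dense(0.0, 1.1)),
      (1.0, Vectors.dense(2.0, 1.0)),
      (0.0, Vectors.dense(0.1, 1.2)),
      (1.0, Vectors.dense(2.2, 0.9)),
      (0.0, Vectors.dense(0.2, 1.0)),
      (1.0, Vectors.dense(1.9, 1.1)),
      (0.0, Vectors.dense(0.3, 1.3)),
      (1.0, Vectors.dense(2.1, 0.8))
    ).toDF("label", "features")

    val lr = new LogisticRegression()
    val grid = new ParamGridBuilder()
      .addGrid(lr.regParam, Array(0.01, 0.1))
      .addGrid(lr.maxIter, Array(10, 50))
      .build()

    val cv = new CrossValidator()
      .setEstimator(lr)
      .setEvaluator(new BinaryClassificationEvaluator())
      .setEstimatorParamMaps(grid)
      .setNumFolds(2)
      // Proposed knob: how many models to fit and evaluate concurrently.
      // 1 preserves the current serial behavior; higher values can speed up
      // tuning but may cache more copies of the training data at once.
      .setParallelism(2)

    val model = cv.fit(training)
    println(s"Best average metric: ${model.avgMetrics.max}")
    spark.stop()
  }
}
```

With parallelism set to 1 this degenerates to today's serial loop over the param grid, which is why 1 is the conservative default candidate.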
Issue Links
- is related to
  - SPARK-21911 Parallel Model Evaluation for ML Tuning: PySpark (Resolved)
  - SPARK-22126 Fix model-specific optimization support for ML tuning (Resolved)
- relates to
  - SPARK-21027 Parallel One vs. Rest Classifier (Resolved)
  - SPARK-19979 [MLLIB] Multiple Estimators/Pipelines In CrossValidator (Resolved)