Status: In Progress
Affects Version/s: 2.3.0
Fix Version/s: None
Fix model-specific optimization support for ML tuning. This is discussed in the design doc and gist linked above.
Anyone who's following might want to scan the design doc (in the links above); the latest API proposal follows.
I've copied the discussion from the gist here:
I propose to design the API as follows:
Let me use an example to explain the API:
It could be possible to keep the current parallelism and still allow for model-specific optimizations. For example, suppose we are doing cross validation with a param grid of regParam = (0.1, 0.3) and maxIter = (5, 10). Let's say the cross validator knows that maxIter is optimized for the model being evaluated (e.g. via a new method in Estimator that returns such params). It would then be straightforward for the cross validator to remove maxIter from the param map that is parallelized over and use it to create 2 arrays of paramMaps: ((regParam=0.1, maxIter=5), (regParam=0.1, maxIter=10)) and ((regParam=0.3, maxIter=5), (regParam=0.3, maxIter=10)).
In this example, the models for ((regParam=0.1, maxIter=5), (regParam=0.1, maxIter=10)) can only be computed in one thread, and the models for ((regParam=0.3, maxIter=5), (regParam=0.3, maxIter=10)) in another. There are 4 paramMaps, but we can generate at most two threads to compute the models for them.
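To make that grouping concrete, here is a self-contained sketch using plain Java collections; the paramMap representation and the groupByNonOptimizedParams helper are illustrative stand-ins, not Spark classes:

```java
import java.util.*;
import java.util.stream.*;

public class ParamGrouping {
    // A paramMap is modeled here as a simple Map<String, Object>.
    static Map<String, Object> paramMap(double regParam, int maxIter) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("regParam", regParam);
        m.put("maxIter", maxIter);
        return m;
    }

    // Group paramMaps so that maps differing only in the model-optimized
    // param ("maxIter" here) land in the same group: one group per thread.
    static Collection<List<Map<String, Object>>> groupByNonOptimizedParams(
            List<Map<String, Object>> paramMaps, String optimizedParam) {
        return paramMaps.stream()
                .collect(Collectors.groupingBy(m -> {
                    Map<String, Object> key = new LinkedHashMap<>(m);
                    key.remove(optimizedParam);  // parallelize only over the rest
                    return key;
                }))
                .values();
    }

    public static void main(String[] args) {
        List<Map<String, Object>> grid = Arrays.asList(
                paramMap(0.1, 5), paramMap(0.1, 10),
                paramMap(0.3, 5), paramMap(0.3, 10));
        // The 4 paramMaps collapse into 2 groups -> at most 2 fitting threads.
        System.out.println(groupByNonOptimizedParams(grid, "maxIter").size());  // 2
    }
}
```

Here the four paramMaps collapse into two groups keyed by regParam alone, matching the two-thread limit described above.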
The API above allows "callable.call()" to return multiple models; the return type is a map whose integer key marks the paramMap index of the corresponding model. Using the example above, there are 4 paramMaps but only 2 callable objects are returned: one callable object for ((regParam=0.1, maxIter=5), (regParam=0.1, maxIter=10)), and another one for ((regParam=0.3, maxIter=5), (regParam=0.3, maxIter=10)).
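As an illustration of that shape, here is a hypothetical callable that fits several models in one call and indexes them by paramMap position; Model, makeCallable, and fitGroup are stand-ins, not the proposed Spark API:

```java
import java.util.*;
import java.util.concurrent.Callable;

public class MultiModelCallable {
    // Stand-in for a fitted model: remembers the params it was fit with.
    static class Model {
        final double regParam;
        final int maxIter;
        Model(double regParam, int maxIter) { this.regParam = regParam; this.maxIter = maxIter; }
    }

    // One callable covers every maxIter value for a fixed regParam; the map
    // key is the index of the corresponding paramMap in the full grid.
    static Callable<Map<Integer, Model>> makeCallable(
            double regParam, int[] maxIters, int[] paramMapIndices) {
        return () -> {
            Map<Integer, Model> models = new LinkedHashMap<>();
            for (int i = 0; i < maxIters.length; i++) {
                // A real estimator would continue training incrementally here
                // instead of refitting from scratch for each maxIter.
                models.put(paramMapIndices[i], new Model(regParam, maxIters[i]));
            }
            return models;
        };
    }

    // Convenience wrapper that hides Callable's checked exception.
    static Map<Integer, Model> fitGroup(double regParam, int[] maxIters, int[] paramMapIndices) {
        try {
            return makeCallable(regParam, maxIters, paramMapIndices).call();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // paramMap indices 0 and 1 correspond to (regParam=0.1, maxIter=5/10).
        System.out.println(fitGroup(0.1, new int[]{5, 10}, new int[]{0, 1}).keySet());  // [0, 1]
    }
}
```

One call thus yields two models, and the integer keys let the caller line each model up with its paramMap in the original grid.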
And the default "fitCallables/fit with paramMaps" can be implemented as follows:
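The original snippet is not reproduced here, but a default of this kind presumably wraps each single-paramMap fit in its own callable. A self-contained sketch with stand-in types (not the actual Spark Estimator signature):

```java
import java.util.*;
import java.util.concurrent.Callable;

public class DefaultFitCallables {
    // Stand-in for a fitted model.
    static class Model {
        final Map<String, Object> params;
        Model(Map<String, Object> params) { this.params = params; }
    }

    // Stand-in for Estimator.fit(dataset, paramMap).
    static Model fit(Map<String, Object> paramMap) { return new Model(paramMap); }

    // Default behavior: no model-specific optimization, so each paramMap
    // becomes its own callable returning a singleton index -> model map.
    static List<Callable<Map<Integer, Model>>> fitCallables(List<Map<String, Object>> paramMaps) {
        List<Callable<Map<Integer, Model>>> callables = new ArrayList<>();
        for (int i = 0; i < paramMaps.size(); i++) {
            final int index = i;
            final Map<String, Object> pm = paramMaps.get(i);
            callables.add(() -> Collections.singletonMap(index, fit(pm)));
        }
        return callables;
    }

    // Convenience for running all callables sequentially.
    static int totalModels(List<Callable<Map<Integer, Model>>> callables) {
        try {
            int n = 0;
            for (Callable<Map<Integer, Model>> c : callables) n += c.call().size();
            return n;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        List<Map<String, Object>> grid = Arrays.asList(
                Collections.<String, Object>singletonMap("regParam", 0.1),
                Collections.<String, Object>singletonMap("regParam", 0.3));
        System.out.println(fitCallables(grid).size());  // 2: one callable per paramMap
    }
}
```

With this default, estimators that cannot share work across paramMaps behave exactly as today, while optimized estimators override the grouping.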
If we use the API proposed above, the code in CrossValidator can be changed to:
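The changed CrossValidator code is likewise not reproduced here; a hedged sketch of the consuming side, with stand-in types rather than the actual CrossValidator internals, could look like:

```java
import java.util.*;
import java.util.concurrent.*;

public class FitWithCallables {
    // Stand-in for a fitted model.
    static class Model {
        final int id;
        Model(int id) { this.id = id; }
    }

    // Submit every callable to a thread pool and merge the returned
    // index -> model maps, so models[i] lines up with paramMaps[i]
    // regardless of how the paramMaps were grouped into callables.
    static Model[] fitAll(List<Callable<Map<Integer, Model>>> callables, int numModels) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.min(callables.size(), 4));
        try {
            Model[] models = new Model[numModels];
            for (Future<Map<Integer, Model>> f : pool.invokeAll(callables)) {
                for (Map.Entry<Integer, Model> e : f.get().entrySet()) {
                    models[e.getKey()] = e.getValue();
                }
            }
            return models;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Two callables covering four paramMap indices, as in the example above.
        Callable<Map<Integer, Model>> group1 = () -> {
            Map<Integer, Model> m = new HashMap<>();
            m.put(0, new Model(0)); m.put(1, new Model(1));
            return m;
        };
        Callable<Map<Integer, Model>> group2 = () -> {
            Map<Integer, Model> m = new HashMap<>();
            m.put(2, new Model(2)); m.put(3, new Model(3));
            return m;
        };
        System.out.println(fitAll(Arrays.asList(group1, group2), 4).length);  // 4
    }
}
```

The evaluator then scores models[i] against paramMaps[i] as before; only the fitting step changes from one task per paramMap to one task per callable.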