Nice design doc! I have some experience with the parameter part. It would be great to have constraints on individual parameters and at the Params level. For example, learning_rate must be greater than 0, and regularization must be one of "l1" or "l2". Parameter checking is something that every learning algorithm does, so some support at parameter definition time would make code more concise.
abstract class ParamConstraint[T] extends Serializable {
  def isValid(value: T): Boolean
  def invalidMessage(value: T): String
}

class IntRangeConstraint(min: Int, max: Int) extends ParamConstraint[Int] {
  def isValid(value: Int): Boolean = value >= min && value <= max
  def invalidMessage(value: Int): String = "..."
}

class Param[T](..., constraints: List[ParamConstraint[T]] = List())
// constraints is a list because more than one kind of constraint may apply to the same Param
At definition time, we can write:

val maxIter: Param[Int] = new Param(id, "maxIter", "max number of iterations", 100, List(new IntRangeConstraint(1, 500)))
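To cover the regularization example above ("l1" or "l2"), a set-membership constraint would work the same way. A minimal, self-contained sketch; the name AllowedValuesConstraint and the message wording are my assumptions, not part of the proposal:

```scala
abstract class ParamConstraint[T] extends Serializable {
  def isValid(value: T): Boolean
  def invalidMessage(value: T): String
}

// Hypothetical constraint: the value must belong to a fixed set of choices.
class AllowedValuesConstraint[T](allowed: Set[T]) extends ParamConstraint[T] {
  def isValid(value: T): Boolean = allowed.contains(value)
  def invalidMessage(value: T): String =
    s"$value is not one of ${allowed.mkString(", ")}"
}

object ConstraintDemo extends App {
  val regConstraint = new AllowedValuesConstraint(Set("l1", "l2"))
  println(regConstraint.isValid("l1"))  // true
  println(regConstraint.isValid("l0"))  // false
}
```

Such a class would sit alongside IntRangeConstraint in the library's list of commonly used constraints.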
There shouldn't be too many types of constraints, so ml could provide a set of commonly used constraint classes. Keeping a parameter's definition and its constraints on the same line also improves readability. The Params trait could use a similar structure to check constraints that span multiple parameters, though that is less common in practice. In the end, the default implementation of validateParams on Params would just call isValid on every member Param.
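The default validateParams described above could look roughly like this. This is a sketch under my own assumptions: the helper method validate, the params registry, and returning violation messages as a List[String] are all illustrative choices, not part of the proposal:

```scala
abstract class ParamConstraint[T] extends Serializable {
  def isValid(value: T): Boolean
  def invalidMessage(value: T): String
}

class IntRangeConstraint(min: Int, max: Int) extends ParamConstraint[Int] {
  def isValid(value: Int): Boolean = value >= min && value <= max
  def invalidMessage(value: Int): String = s"$value is not in [$min, $max]"
}

// Simplified Param: real Param would carry id, doc, etc.
class Param[T](val name: String, val doc: String, var value: T,
               val constraints: List[ParamConstraint[T]] = List()) {
  // Messages of all violated constraints; empty list means the value is valid.
  def validate(): List[String] =
    constraints.filterNot(_.isValid(value)).map(_.invalidMessage(value))
}

trait Params {
  // Subclasses register their Params here (illustrative registration mechanism).
  def params: List[Param[_]]
  // Default implementation: just collect constraint violations from all members.
  def validateParams(): List[String] = params.flatMap(_.validate())
}

// Usage: an algorithm only declares its Params; validation comes for free.
class MyEstimator extends Params {
  val maxIter = new Param("maxIter", "max number of iterations", 100,
    List(new IntRangeConstraint(1, 500)))
  def params = List(maxIter)
}
```

With this shape, an algorithm that needs a cross-parameter check can still override validateParams and call super to keep the per-Param checks.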