Description
SparkR's mllib.R is getting bigger as we add more ML wrappers. I'd like to split it into multiple files to make it easier to maintain:
- mllibClassification.R
- mllibRegression.R
- mllibClustering.R
- mllibFeature.R
or:
- mllib/classification.R
- mllib/regression.R
- mllib/clustering.R
- mllib/features.R
By R convention, the first layout is preferred, and I'm not sure whether R supports the second (subdirectory) layout; I will check later. Please let me know your preference. I think the start of a new release cycle is a good time to do this, since it will cause fewer conflicts. If this proposal is approved, I can work on it.