While reviewing the MLlib documentation, I found some additional issues.
Important issues that affect the binary signatures:
- GBTClassificationModel: all the setters should be overridden
- LogisticRegressionModel: setThreshold(s)
- RandomForestClassificationModel: all the setters should be overridden
- org.apache.spark.ml.stat.distribution.MultivariateGaussian is exposed, but most of its methods are private[ml]. Do we need to expose this class for now?
- GeneralizedLinearRegressionModel: linkObj, familyObj, familyAndLink should not be exposed
- sqlDataTypes: the name does not follow naming conventions. Do we need to expose it?
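The report does not spell out how the setters should be overridden. As a rough illustration only, the Scala sketch below shows what overriding an inherited setter in a concrete model class can look like. `BaseModel`, `TreeModel`, and `featuresCol` are hypothetical stand-ins, not Spark's actual classes or parameter plumbing.

```scala
// Hypothetical stand-ins for a model hierarchy; NOT Spark's real classes.
abstract class BaseModel[M <: BaseModel[M]] {
  protected var featuresCol: String = "features"

  def getFeaturesCol: String = featuresCol

  // Inherited setter: its declared return type is the type parameter M,
  // which erases to BaseModel in the compiled method signature.
  def setFeaturesCol(value: String): M = {
    featuresCol = value
    this.asInstanceOf[M]
  }
}

final class TreeModel extends BaseModel[TreeModel] {
  // Explicit override: TreeModel now declares its own setFeaturesCol
  // returning TreeModel (scalac also emits a bridge returning BaseModel).
  override def setFeaturesCol(value: String): TreeModel =
    super.setFeaturesCol(value)
}
```

Without the override, `setFeaturesCol` is declared only on `BaseModel`, so it does not appear among `TreeModel`'s own declared methods. With it, the concrete class carries the setter in its own signature.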
Issues that involve only documentation:
- Evaluators: inconsistent documentation between evaluate and isLargerBetter
- MinMaxScaler: the math in the doc is not rendered correctly
- GeneralizedLinearRegressionSummary: aic doc is incorrect
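For reference when fixing the MinMaxScaler rendering, this is the rescaling the doc describes, as we understand it. Each feature value e_i is mapped linearly into the target range [min, max] using the column minimum E_min and maximum E_max:

$$
\mathrm{Rescaled}(e_i) = \frac{e_i - E_{\min}}{E_{\max} - E_{\min}} \cdot (\max - \min) + \min
$$

When E_max = E_min, the doc defines the result as the midpoint of the target range:

$$
\mathrm{Rescaled}(e_i) = 0.5 \cdot (\max + \min)
$$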
The reference documentation used for this review was: