Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
0.9.0
Description
Standard Deviation (SD) is used for dataset normalization, which is useful in the training process of Lasso, etc. Current implementation of SD is using the second-order expectations equation E^2( x )-E(x^2), which is not a stable algorithm facing with floating point computing.
Instead of that, the first-order equation performs better.
Moreover, MLutils is not a right place to hold standard statistics methods, It is more suitable that put it in the VectorRDDFunctions. Some other affected machine learning algorithms should also be refined.