This is a bug that appears when fitting a Logistic Regression model with `.setStandardization(false)` and `.setFitIntercept(false)`. If the data matrix has a column in which every entry has the same value, the resulting model is incorrect. Specifically, that constant column always gets a weight of 0 because of a special-case check inside the code. However, the correct solution, which is unique because the L2-regularized logistic regression objective is strictly convex, usually has a non-zero weight for that column.
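A short derivation shows why pinning the constant column's weight to 0 is generally wrong. It assumes Spark minimizes the unweighted objective below (elasticNetParam = 0, no intercept, no standardization), with labels `y_i` in {0, 1}, `σ` the logistic function, `λ` the regParam, and `j` the index of the column whose entries all equal some `c ≠ 0`:

```latex
\begin{aligned}
f(w) &= \frac{1}{n}\sum_{i=1}^{n} \log\!\bigl(1 + e^{-(2y_i - 1)\, w^\top x_i}\bigr) + \frac{\lambda}{2}\lVert w\rVert_2^2, \\
\nabla f(w) &= \frac{1}{n}\sum_{i=1}^{n} \bigl(\sigma(w^\top x_i) - y_i\bigr)\, x_i + \lambda w, \\
\frac{\partial f}{\partial w_j}(w) &= \frac{c}{n}\sum_{i=1}^{n} \bigl(\sigma(w^\top x_i) - y_i\bigr) + \lambda w_j .
\end{aligned}
```

At the minimizer `w*` this partial derivative must vanish, so `w*_j = 0` is possible only if the sum of `σ(w*ᵀx_i)` equals the sum of `y_i`, i.e. the average predicted probability exactly equals the fraction of positive labels. That holds only by coincidence, so in general the correct `w*_j` is non-zero.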
I used the heart_scale data (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html) and manually augmented the data matrix with a column of ones (the augmented data is available in the PR). I ran the resulting data with reg=1.0, max_iter=1000, tol=1e-9 on the following tools:
(Note that libsvm and scikit-learn use a slightly different formulation, so their regularization parameter is set to the equivalent value 1/270.)
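The value 1/270 follows from rescaling the objective. This assumes libsvm and scikit-learn minimize the C-weighted form `g` below (the form scikit-learn documents for L2-penalized logistic regression without an intercept), that Spark minimizes `f` with `λ` = regParam, and uses the fact that heart_scale has n = 270 instances:

```latex
\begin{aligned}
g(w) &= \frac{1}{2}\lVert w\rVert_2^2 + C \sum_{i=1}^{n} \log\!\bigl(1 + e^{-(2y_i - 1)\, w^\top x_i}\bigr), \\
f(w) &= \frac{1}{n}\sum_{i=1}^{n} \log\!\bigl(1 + e^{-(2y_i - 1)\, w^\top x_i}\bigr) + \frac{\lambda}{2}\lVert w\rVert_2^2, \\
\frac{f(w)}{\lambda} &= \frac{1}{\lambda n}\sum_{i=1}^{n} \log\!\bigl(1 + e^{-(2y_i - 1)\, w^\top x_i}\bigr) + \frac{1}{2}\lVert w\rVert_2^2 = g(w) \quad \text{when } C = \frac{1}{\lambda n}, \\
C &= \frac{1}{1.0 \times 270} = \frac{1}{270}.
\end{aligned}
```

Since `f` and `g` then differ only by the positive factor `λ`, they have the same minimizer; with `λ = 1` the two functions coincide.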
The first two give an objective value of 0.7275 and the solution vector:
[0.03007516959304916, 0.09054186091216457, 0.09540306114820495, 0.02436266296315414, 0.01739437315700921, -0.0006404006623321454, 0.06367837291956932, -0.0589096636263823, 0.1382458934368336, 0.06653302996539669, 0.07988499067852513, 0.1197789052423401, 0.1801661775839843, -0.01248615347419409].
Spark gives an objective value of 0.7278 and the solution vector:
Note that the last element of the weight vector, which corresponds to the added column of ones, is 0.
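The objective gap can be reproduced in miniature. The sketch below is an illustration under stated assumptions, not Spark's solver: it uses made-up toy data (one feature plus an appended column of ones), minimizes an L2-regularized logistic objective of the same form with plain gradient descent instead of Spark's optimizer, and emulates the special check by pinning the constant column's weight to 0 through a hypothetical `frozen` argument. The helper names `objective`, `gradient`, and `fit` are likewise hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log1pexp(z):
    # Numerically stable log(1 + exp(z)).
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def objective(w, X, y, lam):
    # f(w) = (1/n) * sum_i log(1 + exp(-s_i * w.x_i)) + (lam / 2) * ||w||^2,
    # where labels y_i in {0, 1} are mapped to s_i = 2 * y_i - 1 in {-1, +1}.
    loss = sum(log1pexp(-(2 * yi - 1) * dot(w, xi)) for xi, yi in zip(X, y))
    return loss / len(X) + 0.5 * lam * dot(w, w)


def gradient(w, X, y, lam):
    # grad f(w) = (1/n) * sum_i (sigmoid(w.x_i) - y_i) * x_i + lam * w
    n = len(X)
    g = [lam * wj for wj in w]
    for xi, yi in zip(X, y):
        r = sigmoid(dot(w, xi)) - yi
        for j, xj in enumerate(xi):
            g[j] += r * xj / n
    return g


def fit(X, y, lam, frozen=(), steps=2000):
    # Plain gradient descent with step 1/L, where L = max_i ||x_i||^2 / 4 + lam
    # bounds the Hessian. Indices in the hypothetical `frozen` set stay at 0,
    # emulating a solver that forces those weights to zero.
    L = max(dot(xi, xi) for xi in X) / 4.0 + lam
    w = [0.0] * len(X[0])
    for _ in range(steps):
        g = gradient(w, X, y, lam)
        w = [0.0 if j in frozen else wj - gj / L
             for j, (wj, gj) in enumerate(zip(w, g))]
    return w


# Made-up toy data: one feature plus an appended column of ones (index 1).
features = [0.5, -0.3, 0.8, 0.1, -0.6, 0.9, 0.2, 0.4]
X = [[x, 1.0] for x in features]
y = [1, 1, 1, 1, 0, 1, 1, 0]
lam = 1.0

free = fit(X, y, lam)                # all weights optimized
pinned = fit(X, y, lam, frozen={1})  # constant column's weight forced to 0

print("free weights:  ", free, objective(free, X, y, lam))
print("pinned weights:", pinned, objective(pinned, X, y, lam))
```

With the column pinned, the solver can only reach a strictly higher objective, qualitatively mirroring the 0.7278 versus 0.7275 gap above. The free fit puts a positive weight on the column of ones because 6 of the 8 toy labels are positive.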
An even simpler example is:
Training on the same data with the current solver also gives a result that differs from the correct one; see the unit test in the PR.