josephkb, after looking at the source code of org.apache.spark.ml.classification.LogisticRegressionSummary and org.apache.spark.ml.classification.LogisticRegressionTrainingSummary,
and after running a sample GLM in R, which produces the following output:
Call:
glm(formula = mpg ~ wt + hp + gear, family = gaussian(), data = mtcars)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.3712  -1.9017  -0.3444   0.9883   6.0655

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 32.013657   4.632264   6.911 1.64e-07 ***
wt          -3.197811   0.846546  -3.777 0.000761 ***
hp          -0.036786   0.009891  -3.719 0.000888 ***
gear         1.019981   0.851408   1.198 0.240963
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 6.626347)

    Null deviance: 1126.05  on 31  degrees of freedom
Residual deviance:  185.54  on 28  degrees of freedom
AIC: 157.05

Number of Fisher Scoring iterations: 2
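When porting this printout, note that several of its numbers are derived from others. The Scala check below recomputes two of them from the printed values only: the t value as Estimate / Std. Error, and the gaussian-family dispersion as residual deviance / residual degrees of freedom. It assumes n = 32 rows (implied by the 31 null-deviance degrees of freedom) and p = 4 coefficients; the object name is just for illustration.

```scala
// Recompute two derived numbers of the R glm() summary from its printed values.
object GlmSummaryArithmetic {
  // t value = Estimate / Std. Error (Wald statistic), here for (Intercept)
  val interceptEstimate = 32.013657
  val interceptStdError = 4.632264
  val interceptT: Double = interceptEstimate / interceptStdError

  // Residual df = n - p: n = 31 + 1 = 32 rows (from the null deviance df),
  // p = 4 estimated coefficients (intercept, wt, hp, gear).
  val n = 31 + 1
  val p = 4
  val residualDf: Int = n - p

  // Gaussian dispersion = residual deviance / residual df.
  // The printed deviance (185.54) is rounded, so this only approximates 6.626347.
  val dispersion: Double = 185.54 / residualDf

  def main(args: Array[String]): Unit = {
    println(f"t value (Intercept): $interceptT%.3f")
    println(s"residual df: $residualDf")
    println(f"dispersion: $dispersion%.3f")
  }
}
```

The dispersion comes out near 6.626 rather than exactly 6.626347 because R divides the unrounded deviance.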
I have the following comments:
1. I think we should add the following members to LogisticRegressionSummary: coefficients and residuals.
2. The toString method should be overridden in the following classes:
org.apache.spark.ml.classification.BinaryLogisticRegressionSummary and org.apache.spark.ml.classification.BinaryLogisticRegressionTrainingSummary
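Both proposals can be sketched outside Spark. The Scala below is a minimal, hypothetical illustration: CoefficientRow, SketchRegressionSummary and residualQuantiles are made-up names, not members of LogisticRegressionSummary. It stores coefficients and residuals, summarizes the residuals with the same interpolation rule as R's default quantile (type 7), and overrides toString to print an R-like layout. p-values and significance stars are left out because they need a t-distribution CDF.

```scala
// Hypothetical sketch of proposals 1 and 2 (not Spark's API): a summary that
// holds coefficients and residuals and overrides toString with an R-like layout.
final case class CoefficientRow(name: String, estimate: Double, stdError: Double) {
  // Wald t value, as in R's "t value" column
  def tValue: Double = estimate / stdError
}

final class SketchRegressionSummary(
    val coefficients: Seq[CoefficientRow],
    val residuals: Array[Double]) {
  require(residuals.nonEmpty, "need at least one residual")

  // Linear interpolation between order statistics,
  // the same rule as R's default quantile(type = 7).
  private def quantile(sorted: Array[Double], prob: Double): Double = {
    val h = (sorted.length - 1) * prob
    val lo = math.floor(h).toInt
    val hi = math.min(lo + 1, sorted.length - 1)
    sorted(lo) + (h - lo) * (sorted(hi) - sorted(lo))
  }

  // Min, 1Q, Median, 3Q, Max, as in R's "Deviance Residuals" block
  def residualQuantiles: Seq[Double] = {
    val sorted = residuals.clone()
    java.util.Arrays.sort(sorted) // sort a copy; the stored residuals stay untouched
    Seq(0.0, 0.25, 0.5, 0.75, 1.0).map(quantile(sorted, _))
  }

  override def toString: String = {
    val sb = new StringBuilder
    sb.append("Deviance Residuals:\n")
    sb.append(Seq("Min", "1Q", "Median", "3Q", "Max").map(label => f"$label%10s").mkString)
    sb.append('\n')
    sb.append(residualQuantiles.map(q => f"$q%10.4f").mkString)
    sb.append("\n\nCoefficients:\n")
    sb.append(" " * 12 + f"${"Estimate"}%12s${"Std. Error"}%12s${"t value"}%9s\n")
    for (c <- coefficients)
      sb.append(f"${c.name}%-12s${c.estimate}%12.6f${c.stdError}%12.6f${c.tValue}%9.3f\n")
    sb.toString
  }
}

object SketchRegressionSummaryDemo {
  def main(args: Array[String]): Unit = {
    // Coefficients copied from the R output above; residuals are made up for illustration.
    val summary = new SketchRegressionSummary(
      Seq(
        CoefficientRow("(Intercept)", 32.013657, 4.632264),
        CoefficientRow("wt", -3.197811, 0.846546)),
      Array(3.0, -2.0, 0.0, 1.0, -1.0))
    println(summary) // goes through the overridden toString
  }
}
```

Keeping the raw residuals as a member (proposal 1) is what lets toString derive the quantile line on demand instead of storing it separately.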
Any other suggestions? Please correct me if I have missed something.
It sounds reasonable to provide the same printed summary in Scala, Java, and Python as in R. Perhaps it can be provided as a toString method for the LogisticRegressionModel.summary member?