# Fix wrong AIC calculation in Binomial GLM

#### Details

• Bug
• Status: Resolved
• Major
• Resolution: Fixed
• 2.0.2
• Important

#### Description

The AIC calculation in the Binomial GLM appears to be wrong when observations are weighted: the result differs from the one R produces.

The current implementation is:

```
-2.0 * predictions.map { case (y: Double, mu: Double, weight: Double) =>
  weight * dist.Binomial(1, mu).logProbabilityOf(math.round(y).toInt)
}.sum()
```

I suggest changing this to:

```
-2.0 * predictions.map { case (y: Double, mu: Double, weight: Double) =>
  val wt = math.round(weight).toInt
  if (wt == 0) {
    0.0
  } else {
    dist.Binomial(wt, mu).logProbabilityOf(math.round(y * weight).toInt)
  }
}.sum()
```
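To make the difference between the two snippets above concrete, here is a minimal, dependency-free sketch that evaluates both per-observation terms for a single record. The object name `BinomialAicSketch`, its helper functions, and the fitted mean `mu = 0.3` are hypothetical and chosen only for illustration; the hand-written binomial log-pmf stands in for Breeze's `dist.Binomial(...).logProbabilityOf` and is not the library's implementation.

```scala
object BinomialAicSketch {
  // log(n!) as a sum of logs; adequate for the small counts used here.
  def logFactorial(n: Int): Double = (1 to n).map(i => math.log(i.toDouble)).sum

  // Log-probability of k successes in n trials with success probability p
  // (assumes 0 < p < 1).
  def binomialLogPmf(n: Int, p: Double, k: Int): Double =
    logFactorial(n) - logFactorial(k) - logFactorial(n - k) +
      k * math.log(p) + (n - k) * math.log(1.0 - p)

  // Current formula: one trial on the rounded label, scaled by the weight.
  def currentTerm(y: Double, mu: Double, weight: Double): Double =
    weight * binomialLogPmf(1, mu, math.round(y).toInt)

  // Proposed formula: round(weight) trials with round(y * weight) successes.
  def proposedTerm(y: Double, mu: Double, weight: Double): Double = {
    val wt = math.round(weight).toInt
    if (wt == 0) 0.0
    else binomialLogPmf(wt, mu, math.round(y * weight).toInt)
  }

  def main(args: Array[String]): Unit = {
    // y = 0.5 with weight = 1.5, as in the example data; mu = 0.3 is made up.
    val (y, mu, weight) = (0.5, 0.3, 1.5)
    println(s"current:  ${currentTerm(y, mu, weight)}")  // 1.5 * log(mu)
    println(s"proposed: ${proposedTerm(y, mu, weight)}") // log(2 * mu * (1 - mu))
  }
}
```

For this record the current formula rounds the label 0.5 up to 1 and treats the observation as a single success scaled by 1.5, whereas the proposed formula treats it as 1 success out of 2 trials, so the two terms generally disagree.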

The following is an example to illustrate the problem.

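```
// Assumes a spark-shell session, so spark.implicits._ is in scope for toDF().
import org.apache.spark.ml.feature.LabeledPoint
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.GeneralizedLinearRegression
import org.apache.spark.sql.functions.col

val dataset = Seq(
  LabeledPoint(0.0, Vectors.dense(18, 1.0)),
  LabeledPoint(0.5, Vectors.dense(12, 0.0)),
  LabeledPoint(1.0, Vectors.dense(15, 0.0)),
  LabeledPoint(0.0, Vectors.dense(13, 2.0)),
  LabeledPoint(0.0, Vectors.dense(15, 1.0)),
  LabeledPoint(0.5, Vectors.dense(16, 1.0))
).toDF().withColumn("weight", col("label") + 1.0)
val glr = new GeneralizedLinearRegression()
  .setFamily("binomial")
  .setWeightCol("weight")
  .setRegParam(0)
val model = glr.fit(dataset)
model.summary.aic
```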

This calculation gives an AIC of 14.189026847171382. To verify whether this is correct, I ran the same analysis in R and got AIC = 11.66092 and -2 * LogLik = 5.660918.

```
# Columns: response y, predictors x1 and x2, prior weight w.
da <- scan(, what=list(y = 0, x1 = 0, x2 = 0, w = 0), sep = ",")
0,18,1,1
0.5,12,0,1.5
1,15,0,2
0,13,2,1
0,15,1,1
0.5,16,1,1.5

da <- as.data.frame(da)
f <- glm(y ~ x1 + x2, data = da, family = binomial(), weights = w)
AIC(f)
-2 * logLik(f)
```
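The two R figures quoted above are consistent with the standard definition of AIC, assuming $k = 3$ estimated coefficients (the intercept, `x1`, and `x2`):

$$
\mathrm{AIC} = -2 \log L + 2k = 5.660918 + 2 \times 3 = 11.660918
$$

If Spark applies the same $2k$ penalty (an assumption, not something verified here), the gap between 14.189 and 11.661 comes entirely from the log-likelihood term.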

Next, I check whether the proposed change is correct. The following computes -2 * LogLik manually and gets 5.6609177228379055, which matches the value from R.

```
import breeze.stats.{distributions => dist}
import org.apache.spark.sql.Row

val predictions = model.transform(dataset)
-2.0 * predictions.select("label", "prediction", "weight").rdd.map {
  case Row(y: Double, mu: Double, weight: Double) =>
    val wt = math.round(weight).toInt
    if (wt == 0) {
      0.0
    } else {
      dist.Binomial(wt, mu).logProbabilityOf(math.round(y * weight).toInt)
    }
}.sum()
```

#### People

Wayne Zhang

#### Time Tracking

Estimated:
120h
Remaining:
120h
Logged:
Not Specified