Details
- Type: Bug
- Status: To Do
- Priority: Critical
- Resolution: Unresolved
Description
The current quantization strategy for `calib_mode='entropy'` computes the KL divergence for a range of candidate thresholds and chooses the best one. This implicitly assumes the activations are drawn from a continuous random variable whose density is nonzero over all reals. Because we are discretizing the distribution, we smooth it over the range `[-threshold, threshold]`. What we are not considering is that the entire sampled distribution may lie outside `[-threshold, threshold]`, in which case the sampled candidate `p` distribution inside `_get_optimal_threshold` ends up all zeros.
I have added a check that the distribution (possibly unnormalized) is proper, i.e. has nonzero total mass, before attempting to smooth it; otherwise we run into a divide-by-zero error.
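A minimal sketch of the failure mode and the guard, assuming a NumPy histogram as the candidate `p` distribution (the helper name `safe_smooth_distribution` and the `eps` scheme are illustrative, not the exact MXNet code):

```python
import numpy as np

def safe_smooth_distribution(p, eps=0.0001):
    """Smooth a (possibly unnormalized) histogram so no bin is zero.

    Hypothetical helper mirroring the smoothing step described above:
    mass `eps` is moved into each zero bin, taken proportionally from
    the nonzero bins. If every bin is zero (all samples fell outside
    [-threshold, threshold]), the redistribution would divide by zero,
    so we return None and the caller can skip this threshold.
    """
    p = np.asarray(p, dtype=np.float32)
    is_zeros = (p == 0).astype(np.float32)
    is_nonzeros = (p != 0).astype(np.float32)
    n_zeros = is_zeros.sum()
    n_nonzeros = p.size - n_zeros
    if n_nonzeros == 0:
        # Improper (all-zero) distribution: nothing to smooth.
        return None
    eps1 = eps * float(n_zeros) / float(n_nonzeros)
    # Add eps to zero bins, subtract the compensating eps1 from nonzero bins
    # so the total mass is preserved.
    return p + eps * is_zeros - eps1 * is_nonzeros
```

With this guard, `_get_optimal_threshold` can simply skip candidate thresholds that capture none of the sampled mass instead of crashing.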
In most cases, activation functions and layers in classification-type problems output values roughly symmetric around 0. This is not the case for a regressor's last layer, and there are various other examples where the activation distribution is not centered around 0; this was a major blocker for Airbnb's adoption of MXNet's quantization capabilities.
Attachments
Issue Links
- links to