The current variance kernel converts every data type to `double` before calculating. This is suboptimal for integers: integer arithmetic is much faster than floating-point arithmetic, e.g., summation is 4x faster.
A quick test of int32 variance shows up to a 3x performance gain. Another benefit is that integer arithmetic is exact, so the accumulated sums carry no rounding error.
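
To illustrate the idea, here is a minimal Python sketch, not the kernel's actual code. It accumulates the sum and the sum of squares in integers and performs a single floating-point division at the end. The function name `int_variance` and the `ddof` parameter are assumptions made for this example. Python integers are arbitrary precision, so overflow cannot occur here; a fixed-width implementation would have to guard against accumulator overflow.

```python
def int_variance(values, ddof=0):
    """Variance of a sequence of integers using exact integer accumulation.

    Hypothetical helper for illustration only; it is not the kernel's API.
    """
    n = len(values)
    if n <= ddof:
        raise ValueError("need more than ddof values")

    total = 0      # exact integer sum of the values
    total_sq = 0   # exact integer sum of the squared values
    for v in values:
        total += v
        total_sq += v * v

    # n * sum(x^2) - (sum(x))^2 is an exact, non-negative integer.
    numerator = n * total_sq - total * total

    # The only floating-point operation is this final division.
    return numerator / (n * (n - ddof))


if __name__ == "__main__":
    print(int_variance([1, 2, 3, 4]))
    # A large common offset does not disturb the result, because the
    # intermediate sums are exact.
    print(int_variance([10**9 + i for i in range(1, 5)]))
```

Because both accumulators are integers, shifting every input by a large constant leaves the result unchanged, which is not guaranteed when the same formula is evaluated in `double`.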