Details

Type: Wish

Status: Resolved

Priority: Minor

Resolution: Duplicate

Affects Version/s: None

Fix Version/s: None

Component/s: Query Processor

Labels:None
Description
The last step in the unholy triumvirate of statistical builtins is the variance. We already have the n (count) and the mean (avg). I currently have a job or two that filters all of the data into a single reducer which just computes mean/n/variance and writes it to a table...so my guess is that this would be a pretty big speed increase. Not a huge deal though, as computing the variance myself is trivial.
(Average, variance, and n can be cocomputed in one pass, so if you're doing var() you can basically have avg() and count() for free.)
Rather than add a nonstandard VAR function, we should add these ANSI SQL 2003 functions:
COVAR_POP
COVAR_SAMP
STDDEV_POP
STDDEV_SAMP
VAR_POP
VAR_SAMP
I recommend against adding the nonstandard STDDEV and VARIANCE functions as their behavior differs between MySQL (population) and Oracle/PostgreSQL (sample).
For the time being, you can use this workaround: