Details

Type: Wish

Status: Resolved

Priority: Minor

Resolution: Duplicate

Affects Version/s: None

Fix Version/s: None

Component/s: Query Processor

Labels:None
Description
The last step in the unholy triumvirate of statistical builtins is the variance. We already have the n (count) and the mean (avg). I currently have a job or two that filters all of the data into a single reducer which just computes mean/n/variance and writes it to a table...so my guess is that this would be a pretty big speed increase. Not a huge deal though, as computing the variance myself is trivial.
(Average, variance, and n can be cocomputed in one pass, so if you're doing var() you can basically have avg() and count() for free.)
Issue Links
Activity
 All
 Comments
 Work Log
 History
 Activity
 Transitions
Rather than add a nonstandard VAR function, we should add these ANSI SQL 2003 functions:
COVAR_POP
COVAR_SAMP
STDDEV_POP
STDDEV_SAMP
VAR_POP
VAR_SAMP
I recommend against adding the nonstandard STDDEV and VARIANCE functions as their behavior differs between MySQL (population) and Oracle/PostgreSQL (sample).
For the time being, you can use this workaround: