Details

Type: Wish

Status: Resolved

Priority: Minor

Resolution: Duplicate

Affects Version/s: None

Fix Version/s: None

Component/s: Query Processor

Labels:None
Description
The last step in the unholy triumvirate of statistical builtins is the variance. We already have the n (count) and the mean (avg). I currently have a job or two that filters all of the data into a single reducer which just computes mean/n/variance and writes it to a table...so my guess is that this would be a pretty big speed increase. Not a huge deal though, as computing the variance myself is trivial.
(Average, variance, and n can be cocomputed in one pass, so if you're doing var() you can basically have avg() and count() for free.)
Issue Links
Activity
Adam Kramer
created issue 
Adam Kramer
made changes 
Field  Original Value  New Value 

Description  The last step in the unholy triumvirate of statistical builtins is the variance...we already have the n (count) and the mean (avg). I currently have one reduce step that just computes mean/n/variance and writes it to a table, so my guess is that this would be a pretty big speed increase. Not a huge deal though, as computing the variance myself is trivial. (Average, variance, and n can be cocomputed in one pass) 
The last step in the unholy triumvirate of statistical builtins is the variance. We already have the n (count) and the mean (avg). I currently have a job or two that filters all of the data into a single reducer which just computes mean/n/variance and writes it to a table...so my guess is that this would be a pretty big speed increase. Not a huge deal though, as computing the variance myself is trivial. (Average, variance, and n can be cocomputed in one pass, so if you're doing var() you can basically have avg() and count() for free.) 
David Phillips
made changes 
Assignee  David Phillips [ electrum ] 
Jeff Hammerbacher
made changes 
Component/s  Query Processor [ 12312586 ] 
David Phillips
made changes 
Summary  var(col) builtin to go with avg(col) and count(col)  Add standard statistical functions 
David Phillips
made changes 
Adam Kramer
made changes 
Status  Open [ 1 ]  Resolved [ 5 ] 
Resolution  Fixed [ 1 ] 
Carl Steinbach
made changes 
Carl Steinbach
made changes 
Resolution  Fixed [ 1 ]  
Status  Resolved [ 5 ]  Reopened [ 4 ] 
Carl Steinbach
made changes 
Resolution  Duplicate [ 3 ]  
Status  Reopened [ 4 ]  Resolved [ 5 ] 
Transition  Time In Source Status  Execution Times  Last Executer  Last Execution Date  


604d 22h 6m  1  Adam Kramer  08/Aug/10 08:50  

1d 8h 32m  1  Carl Steinbach  09/Aug/10 17:23  

12s  1  Carl Steinbach  09/Aug/10 17:23 
Rather than add a nonstandard VAR function, we should add these ANSI SQL 2003 functions:
COVAR_POP
COVAR_SAMP
STDDEV_POP
STDDEV_SAMP
VAR_POP
VAR_SAMP
I recommend against adding the nonstandard STDDEV and VARIANCE functions as their behavior differs between MySQL (population) and Oracle/PostgreSQL (sample).
For the time being, you can use this workaround: