Details

Type: Wish

Status: Resolved

Priority: Minor

Resolution: Duplicate

Affects Version/s: None

Fix Version/s: None

Component/s: Query Processor

Labels:None
Description
The last step in the unholy triumvirate of statistical builtins is the variance. We already have the n (count) and the mean (avg). I currently have a job or two that filters all of the data into a single reducer which just computes mean/n/variance and writes it to a table...so my guess is that this would be a pretty big speed increase. Not a huge deal though, as computing the variance myself is trivial.
(Average, variance, and n can be cocomputed in one pass, so if you're doing var() you can basically have avg() and count() for free.)
Issue Links
Activity
Carl Steinbach
made changes 
Resolution  Duplicate [ 3 ]  
Status  Reopened [ 4 ]  Resolved [ 5 ] 
Carl Steinbach
made changes 
Resolution  Fixed [ 1 ]  
Status  Resolved [ 5 ]  Reopened [ 4 ] 
Carl Steinbach
made changes 
Adam Kramer
made changes 
Status  Open [ 1 ]  Resolved [ 5 ] 
Resolution  Fixed [ 1 ] 
David Phillips
made changes 
David Phillips
made changes 
Summary  var(col) builtin to go with avg(col) and count(col)  Add standard statistical functions 
Jeff Hammerbacher
made changes 
Component/s  Query Processor [ 12312586 ] 
David Phillips
made changes 
Assignee  David Phillips [ electrum ] 
Adam Kramer
made changes 
Field  Original Value  New Value 

Description  The last step in the unholy triumvirate of statistical builtins is the variance...we already have the n (count) and the mean (avg). I currently have one reduce step that just computes mean/n/variance and writes it to a table, so my guess is that this would be a pretty big speed increase. Not a huge deal though, as computing the variance myself is trivial. (Average, variance, and n can be cocomputed in one pass) 
The last step in the unholy triumvirate of statistical builtins is the variance. We already have the n (count) and the mean (avg). I currently have a job or two that filters all of the data into a single reducer which just computes mean/n/variance and writes it to a table...so my guess is that this would be a pretty big speed increase. Not a huge deal though, as computing the variance myself is trivial. (Average, variance, and n can be cocomputed in one pass, so if you're doing var() you can basically have avg() and count() for free.) 
Adam Kramer
created issue 
VAR and SD were added months ago.