Details
- Type: Sub-task
- Status: Closed
- Priority: Major
- Resolution: Not A Problem
- 1.6.0
- None
- None
Description
The stddev and variance functions currently default to the 'sample' version, whereas Hive uses the 'population' version for both. See:
- https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Built-inAggregateFunctions(UDAF)
- https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala#L192-L196
Is this on purpose? Or by accident?
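For readers comparing the two behaviours, here is a minimal sketch of how the 'sample' and 'population' versions differ. It is plain Python, not Spark or Hive code. The helper names `var_pop`, `var_samp`, `stddev_pop` and `stddev_samp` are chosen to echo the SQL function names and are defined here only for illustration, and the input values are made up.

```python
import math

def var_pop(xs):
    """Population variance: sum of squared deviations divided by n."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

def var_samp(xs):
    """Sample variance: sum of squared deviations divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def stddev_pop(xs):
    """Population standard deviation: square root of var_pop."""
    return math.sqrt(var_pop(xs))

def stddev_samp(xs):
    """Sample standard deviation: square root of var_samp."""
    return math.sqrt(var_samp(xs))

# Illustrative data only: mean is 5.0, squared deviations sum to 32.0.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(var_pop(values))      # 4.0  (32 / 8)
print(var_samp(values))     # 32 / 7, about 4.571
print(stddev_pop(values))   # 2.0
print(stddev_samp(values))  # about 2.138
```

The only difference is the divisor (n versus n - 1), so the two versions diverge most on small groups and converge as the number of rows grows. An unqualified `variance` or `stddev` therefore returns different numbers depending on which version it aliases.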
Issue Links
- is related to SPARK-11490: variance should alias var_samp instead of var_pop (Resolved)