Details
-
Sub-task
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
Description
Add PERCENTILE function.
Return a percentile value based on a continuous distribution of the input column. If no input row lies exactly at the desired percentile, the result is calculated using linear interpolation of the two nearest input values. NULL values are ignored in the calculation.
Example:
> SELECT PERCENTILE(col, 0.3) FROM VALUES (0), (10), (10) AS tab(col); 6.0 > SELECT PERCENTILE(col, 0.3, freq) FROM VALUES (0, 1), (10, 2) AS tab(col, freq); 6.0 > SELECT PERCENTILE(col, array(0.25, 0.75)) FROM VALUES (0), (10) AS tab(col); [2.5,7.5]
Syntax:
PERCENTILE(expr, percentage[, frequency])
Arguments:
- expr: A NUMERIC expression.
- percentage: A NUMERIC expression between 0 and 1 or an ARRAY of NUMERIC expressions, each between 0 and 1.
- frequency: An optional integral number greater than 0.
Returns:
DOUBLE if percentage is numeric, or an ARRAY of DOUBLE if percentage is an ARRAY.
Frequency describes the number of times expr must be counted. The default value is 1.
See also:
- Hive
- Spark
- Databricks
- PostgreSQL percentile_cont
- Snowflake
- Oracle
- Wiki
Currently our implementation is inspired by PERCENTILE of Spark, which offers an additional parameter frequency compared to SQL standard function PERCENTILE_CONT.
Based on this function, we can easily extend support fo PERCENTILE_CONT and PERCENTILE_DISC in SQL standard with a little modification in the future.
Attachments
Attachments
Issue Links
- links to