Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Version: 2.3.0
Description
percentile_approx never returns the first element when the requested percentile is in (relativeError, 1/N], where relativeError defaults to 1/10000 and N is the total number of elements. Ideally, every percentile in [0, 1/N] should return the first element.
For example, given the input data 1 to 10, a query for the 10% percentile (or anything lower) should return 1, because the first value, 1, already covers 10% of the data. Currently it returns 2.
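The expected behavior in this example can be written down as an exact nearest-rank lookup. The following is only a minimal sketch of the intended semantics, not Spark's implementation; the function name `expected_percentile` and the rounding guard against floating-point noise are assumptions made for illustration.

```python
import math


def expected_percentile(values, percentage):
    """Return the smallest element whose cumulative share reaches `percentage`.

    Hypothetical helper for illustration only; it is exact, not approximate.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must be in [0, 1]")
    ordered = sorted(values)
    n = len(ordered)
    # Nearest-rank: the k-th smallest element covers k/N of the data.
    # round() guards against noise such as 0.3 * 10 == 3.0000000000000004.
    rank = max(1, math.ceil(round(percentage * n, 9)))
    return ordered[rank - 1]


data = list(range(1, 11))  # the input 1 to 10 from the description
print(expected_percentile(data, 0.10))  # first element already covers 10%
print(expected_percentile(data, 0.05))  # anything below 1/N also maps to it
print(expected_percentile(data, 0.11))  # just above 1/N moves to the next one
```

Under this definition every percentile in [0, 1/N] maps to the first element, which is the behavior the issue asks for.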
According to the paper, targetError should not be rounded up, and the search index should start from 0 instead of 1. Following the paper on both points should fix the cases described above.
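To see why these two details matter, the search can be modeled on a deliberately simplified summary. The sketch below is an assumption-laden model, not Spark's code: it assumes every element is kept as its own sample (so each sample advances the rank by exactly one and has no uncertainty), it assumes a shortcut that returns the first element when the percentile is at most relativeError (implied by the interval in the description), and the names `query_sketch`, `start_index` and `round_up_error` are hypothetical.

```python
import math


def query_sketch(values, percentage, relative_error=1e-4,
                 start_index=0, round_up_error=False):
    """Simplified rank search over an exact summary (illustrative only)."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumed shortcut: tiny percentiles return the first element directly.
    if percentage <= relative_error:
        return ordered[0]
    # round() guards against floating-point noise in percentage * n.
    rank = math.ceil(round(percentage * n, 9))
    target_error = relative_error * n
    if round_up_error:
        # Rounding up widens the accepted rank window to at least 1.
        target_error = math.ceil(target_error)
    min_rank = 0
    for i in range(start_index, n - 1):
        min_rank += 1          # each sample covers exactly one element
        max_rank = min_rank    # no rank uncertainty in this exact model
        if max_rank - target_error <= rank <= min_rank + target_error:
            return ordered[i]
    return ordered[-1]


data = list(range(1, 11))
# Start at index 1 with a rounded-up error: the first element is skipped.
print(query_sketch(data, 0.10, start_index=1, round_up_error=True))
# Start at index 0 without rounding: the first element can be returned.
print(query_sketch(data, 0.10))
```

In this model, starting the scan at index 1 makes the first element unreachable by the loop, and rounding targetError up to 1 lets the second element match rank 1, which reproduces the reported result of 2 for the 10% query.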
Issue Links
- is duplicated by SPARK-22179: percentile_approx should choose the first element if it already reaches the percentage (Resolved)
- links to