[SPARK-22208] Improve percentile_approx by not rounding up targetError and starting from index 0 - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.3.0
Fix Version/s: 2.3.0
Component/s: SQL
Labels:
- releasenotes

Description

percentile_approx never returns the first element when the percentile is in (relativeError, 1/N], where relativeError defaults to 1/10000 and N is the total number of elements. Ideally, every percentile in [0, 1/N] should return the first element as the answer.

For example, given the input data 1 to 10, a query for the 10% percentile (or anything lower) should return 1, because the first value, 1, already reaches 10%. Currently it returns 2.
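
To make the expected answer concrete, here is a minimal sketch of the exact (non-approximate) definition the example implies: the answer is the smallest element whose cumulative fraction reaches the requested percentage. The function name `exact_percentile` is hypothetical and is not part of Spark; it only illustrates what percentile_approx should converge to on small inputs.

```python
import math

def exact_percentile(values, percentage):
    """Return the smallest element whose cumulative fraction reaches `percentage`.

    Hypothetical reference helper, not Spark code.
    """
    ordered = sorted(values)
    n = len(ordered)
    # 1-based rank of the answer; every percentage in [0, 1/N] maps to rank 1.
    rank = max(1, math.ceil(percentage * n))
    return ordered[rank - 1]

data = list(range(1, 11))  # the input 1 to 10 from the example
print(exact_percentile(data, 0.10))  # expected answer per the issue: 1
print(exact_percentile(data, 0.05))  # any lower percentage also gives 1
```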

According to the paper, targetError should not be rounded up, and the search index should start from 0 instead of 1. Following the paper on both points should fix the cases mentioned above.
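
The effect of the two changes can be sketched with a simplified quantile-summary query. This is an illustration under assumptions, not the Spark implementation: the `(value, g, delta)` sample layout, the rank bookkeeping, and the `round_up_error` / `start_index` switches are hypothetical and exist only to contrast the behavior before and after the change described here.

```python
import math

def query(samples, count, percentage, relative_error,
          round_up_error, start_index):
    """Scan a summary of (value, g, delta) samples for the requested percentile.

    Simplified sketch; assumes samples are sorted by value and that g is the
    rank gap to the previous sample and delta the rank uncertainty.
    """
    rank = math.ceil(percentage * count)
    target_error = relative_error * count
    if round_up_error:
        # Assumed old behavior: the allowed error is rounded up to a whole rank.
        target_error = math.ceil(target_error)
    min_rank = 0
    # The last sample is the fallback, so the scan stops one short of it.
    for value, g, delta in samples[start_index:-1]:
        min_rank += g
        max_rank = min_rank + delta
        if max_rank - target_error <= rank <= min_rank + target_error:
            return value
    return samples[-1][0]

# Uncompressed summary of the input 1 to 10: every sample has g=1, delta=0.
samples = [(v, 1, 0) for v in range(1, 11)]

old = query(samples, 10, 0.10, 1 / 10000, round_up_error=True, start_index=1)
new = query(samples, 10, 0.10, 1 / 10000, round_up_error=False, start_index=0)
print(old, new)  # 2 1
```

With the error rounded up and the scan starting at index 1, the first sample can never be returned, which reproduces the answer 2 from the example. Starting at index 0 with the unrounded error returns 1.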

Issue Links

is duplicated by

SPARK-22179 percentile_approx should choose the first element if it already reaches the percentage

Resolved

links to

[Github] Pull Request #19438 (wzhfy)

People

Assignee: Zhenhua Wang

Reporter: Zhenhua Wang

Votes: 0

Watchers: 3

Dates

Created: 05/Oct/17 15:42

Updated: 21/Jan/18 21:03

Resolved: 11/Oct/17 07:19