Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
2.3.4, 2.4.7, 3.0.1, 3.1.0
Description
Read input data from the attached CSV file:
val df = spark.read.option("header", "true") .option("inferSchema", "true") .csv("/Users/maximgekk/tmp/percentile_approx-input.csv") .repartition(1) df.createOrReplaceTempView(table)
Calculate the 0.77 percentile with accuracy 1e-05:
spark.sql( s"""SELECT | percentile_approx(tr_rat_resampling_score, 0.77, 100000) |FROM $table """.stripMargin).show
+------------------------------------------------------------------------+ |percentile_approx(tr_rat_resampling_score, CAST(0.77 AS DOUBLE), 100000)| +------------------------------------------------------------------------+ | 1000| +------------------------------------------------------------------------+
The same for smaller accuracy 0.001:
+----------------------------------------------------------------------+ |percentile_approx(tr_rat_resampling_score, CAST(0.77 AS DOUBLE), 1000)| +----------------------------------------------------------------------+ | 18| +----------------------------------------------------------------------+
and better accuracy 1e-06:
+-------------------------------------------------------------------------+ |percentile_approx(tr_rat_resampling_score, CAST(0.77 AS DOUBLE), 1000000)| +-------------------------------------------------------------------------+ | 17| +-------------------------------------------------------------------------+
For the accuracy 1e-05, the result must be around 17-18 but not 1000.
Here is percentile calculation in Google Sheets for the same input:
https://docs.google.com/spreadsheets/d/1Y1i4Td6s9jZQ-bD4IRTESLXP3UxKpqJSXGtmx0Q5TA0/edit?usp=sharing
Attachments
Attachments
Issue Links
- is duplicated by
-
SPARK-30882 Inaccurate results with higher precision ApproximatePercentile results
- Resolved
-
SPARK-31430 Bug in the approximate quantile computation.
- Resolved
- is related to
-
SPARK-30882 Inaccurate results with higher precision ApproximatePercentile results
- Resolved
- links to