Spark / SPARK-23091

Incorrect unit test for approxQuantile

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2.1
    • Fix Version/s: 2.3.0
    • Component/s: ML, SQL, Tests
    • Labels:
      None

      Description

      Currently, the test for `approxQuantile` (the quantile estimation algorithm) checks whether the estimated quantile is within ±2 * `relativeError` of the true quantile. See the test code here:

      https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala#L157
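Because `relativeError` is a fraction of the number of rows N, the tolerance is easiest to read in rank units. As a hypothetical illustration (assume the tested column holds the integers 0, 1, ..., N-1, so each value equals its 0-based rank, and write ε for `relativeError`), a ±2 * `relativeError` check accepts an estimate x̂ for probability p whenever

$$
|\hat{x} - pN| < 2\,\varepsilon N,
\qquad\text{e.g. } N = 1000,\ \varepsilon = 0.01 \;\Rightarrow\; 2\varepsilon N = 20 \text{ ranks, versus } \varepsilon N = 10 .
$$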

      However, according to the original paper by Greenwald and Khanna, the rank of the estimated quantile is guaranteed to be within ±`relativeError` * N of the rank of the true quantile, where N is the number of rows. Doubling the tolerance is therefore misleading and incorrect, and we should fix it.
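A minimal sketch of a check that follows the guarantee directly, written as plain Scala with no Spark dependency. The object `QuantileCheck` and its methods are hypothetical and not part of `DataFrameStatSuite`. The bounds use the form documented for `approxQuantile`, floor((p - err) * N) <= rank(x) <= ceil((p + err) * N), and the sketch assumes rank(x) means the number of rows whose value is <= x.

```scala
// Hypothetical helper (not part of Spark's test suite): checks an approximate
// quantile against the Greenwald-Khanna rank guarantee with tolerance eps, not 2 * eps.
object QuantileCheck {

  /** Inclusive range of acceptable ranks for probability p and relative error eps over n rows. */
  def rankBounds(n: Int, p: Double, eps: Double): (Long, Long) = {
    val lo = math.floor((p - eps) * n).toLong
    val hi = math.ceil((p + eps) * n).toLong
    (lo, hi)
  }

  /** Assumed rank definition: the number of data points that are <= x. */
  def rankOf(data: Seq[Double], x: Double): Long =
    data.count(_ <= x).toLong

  /** True if the estimate's rank lies within the guaranteed bounds. */
  def withinGuarantee(data: Seq[Double], p: Double, eps: Double, estimate: Double): Boolean = {
    val (lo, hi) = rankBounds(data.length, p, eps)
    val r = rankOf(data, estimate)
    lo <= r && r <= hi
  }

  def main(args: Array[String]): Unit = {
    val data = (0 until 1000).map(_.toDouble) // value v has rank v + 1
    val p = 0.5
    val eps = 0.05
    println(rankBounds(data.length, p, eps))           // acceptable ranks: (450,550)
    println(withinGuarantee(data, p, eps, 500.0))      // rank 501: true
    println(withinGuarantee(data, p, eps, 560.0))      // rank 561: false under eps
    println(withinGuarantee(data, p, 2 * eps, 560.0))  // but true under a doubled tolerance
  }
}
```

Checking ranks rather than raw values keeps the assertion independent of how the column's values are spaced, and the last line shows how a doubled tolerance lets through an estimate that violates the ε bound.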

            People

            • Assignee:
              srowen Sean Owen
              Reporter:
              kchenphy Kuang Chen
            • Votes:
              0
              Watchers:
              2