SPARK-44871: Fix PERCENTILE_DISC behaviour

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.3.0, 3.3.1, 3.3.2, 3.3.3, 3.4.0, 3.4.1
    • Fix Version/s: 3.3.4, 3.4.2, 3.5.0, 4.0.0
    • Component/s: SQL

    Description

      Currently, percentile_disc() returns incorrect results in some cases.

      For example, the query:

      SELECT
        percentile_disc(0.0) WITHIN GROUP (ORDER BY a) as p0,
        percentile_disc(0.1) WITHIN GROUP (ORDER BY a) as p1,
        percentile_disc(0.2) WITHIN GROUP (ORDER BY a) as p2,
        percentile_disc(0.3) WITHIN GROUP (ORDER BY a) as p3,
        percentile_disc(0.4) WITHIN GROUP (ORDER BY a) as p4,
        percentile_disc(0.5) WITHIN GROUP (ORDER BY a) as p5,
        percentile_disc(0.6) WITHIN GROUP (ORDER BY a) as p6,
        percentile_disc(0.7) WITHIN GROUP (ORDER BY a) as p7,
        percentile_disc(0.8) WITHIN GROUP (ORDER BY a) as p8,
        percentile_disc(0.9) WITHIN GROUP (ORDER BY a) as p9,
        percentile_disc(1.0) WITHIN GROUP (ORDER BY a) as p10
      FROM VALUES (0), (1), (2), (3), (4) AS v(a)
      

      returns:

      +---+---+---+---+---+---+---+---+---+---+---+
      | p0| p1| p2| p3| p4| p5| p6| p7| p8| p9|p10|
      +---+---+---+---+---+---+---+---+---+---+---+
      |0.0|0.0|0.0|1.0|1.0|2.0|2.0|2.0|3.0|3.0|4.0|
      +---+---+---+---+---+---+---+---+---+---+---+
      

      but it should return (p7 and p9 differ):

      +---+---+---+---+---+---+---+---+---+---+---+
      | p0| p1| p2| p3| p4| p5| p6| p7| p8| p9|p10|
      +---+---+---+---+---+---+---+---+---+---+---+
      |0.0|0.0|0.0|1.0|1.0|2.0|2.0|3.0|3.0|4.0|4.0|
      +---+---+---+---+---+---+---+---+---+---+---+
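
      For reference, the expected row is consistent with the usual definition of a discrete percentile: the first value, in sort order, whose cumulative distribution is at least the requested fraction. The sketch below is a plain Python cross-check of that definition, not Spark's implementation. The helper name percentile_disc_ref is hypothetical, and fractions are passed as strings so that the comparison is done in exact arithmetic.

      ```python
      from fractions import Fraction


      def percentile_disc_ref(values, p):
          """Return the first sorted value whose cumulative distribution >= p.

          Hypothetical reference helper, not part of Spark. `p` is a decimal
          string such as "0.7" so it converts to an exact fraction.
          """
          ordered = sorted(values)
          n = len(ordered)
          target = Fraction(str(p))
          for rank, value in enumerate(ordered, start=1):
              # rank / n is the cumulative distribution of `value` (no ties here).
              if Fraction(rank, n) >= target:
                  return value
          raise ValueError("empty input or p outside [0, 1]")


      fractions_to_check = ["0.0", "0.1", "0.2", "0.3", "0.4", "0.5",
                            "0.6", "0.7", "0.8", "0.9", "1.0"]
      result = [percentile_disc_ref([0, 1, 2, 3, 4], p) for p in fractions_to_check]
      print(result)  # prints [0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
      ```

      The printed integers line up with the expected p0 to p10 columns above (Spark displays them as doubles), including 3 for p7 and 4 for p9.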
      

          People

            Assignee: Peter Toth (petertoth)
            Reporter: Peter Toth (petertoth)