  1. Spark
  2. SPARK-17439

QuantileSummaries returns the wrong result after compression

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 2.0.1, 2.1.0
    • None

    Description

      clockfly found the following corner case that returns the wrong quantile (off by 1):

      test("test QuantileSummaries compression") {
        // LEFT: a single summary that is compressed after every insert.
        var left = new QuantileSummaries(10000, 0.0001)
        System.out.println("LEFT      RIGHT")
        System.out.println("====================")
        (0 to 10).foreach { index =>
          left = left.insert(index)
          left = left.compress()

          // RIGHT: a fresh summary over 0..index, compressed once at the end.
          var right = new QuantileSummaries(10000, 0.0001)
          (0 to index).foreach(right.insert(_))
          right = right.compress()
          System.out.println(s"${left.query(0.5)}   ${right.query(0.5)}")
        }
      }
      

      The result is:

      LEFT      RIGHT
      ====================
      0.0   0.0
      0.0   1.0
      0.0   1.0
      0.0   1.0
      1.0   2.0
      1.0   2.0
      2.0   3.0
      2.0   3.0
      3.0   4.0
      3.0   4.0
      4.0   5.0
      

      The "LEFT" column shows the output when QuantileSummaries is used in a window function; the "RIGHT" column shows the expected result. The difference between the two is that the "LEFT" column compresses the QuantileSummaries storage after every insert, whereas the "RIGHT" column compresses only once, after all values have been inserted.
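
      As a quick sanity check on the reported numbers, the sketch below hard-codes the eleven (LEFT, RIGHT) rows from the output above and lists where the two columns disagree and by how much. It does not call Spark; the object name `ReportedMedians` and its members `rows`, `mismatches` and `gaps` are illustrative names introduced here, not part of any Spark API.

```scala
// Illustrative only: the rows are copied from the reported output,
// one (LEFT, RIGHT) pair per index from 0 to 10.
object ReportedMedians {
  val rows: Seq[(Double, Double)] = Seq(
    (0.0, 0.0), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0),
    (1.0, 2.0), (1.0, 2.0), (2.0, 3.0), (2.0, 3.0),
    (3.0, 4.0), (3.0, 4.0), (4.0, 5.0)
  )

  // Indices where the incrementally compressed summary (LEFT) disagrees
  // with the summary that was compressed once (RIGHT).
  val mismatches: Seq[Int] =
    rows.zipWithIndex.collect { case ((l, r), i) if l != r => i }

  // RIGHT minus LEFT for every row.
  val gaps: Seq[Double] = rows.map { case (l, r) => r - l }

  def main(args: Array[String]): Unit = {
    println(s"mismatching indices: ${mismatches.mkString(", ")}")
    println(s"distinct gaps: ${gaps.distinct.mkString(", ")}")
  }
}
```

      Under these assumptions, every row except the first differs, and the gap is never larger than 1, which matches the "off by 1" description.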

          People

            thunterdb Tim Hunter