Spark / SPARK-40089

Sorting of at least Decimal(20, 2) fails for some values near the max.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.0, 3.3.0, 3.4.0
    • Fix Version/s: 3.1.4, 3.3.1, 3.2.3, 3.4.0
    • Component/s: SQL
    • Labels: None

    Description

      I have been doing some testing with Decimal values for the RAPIDS Accelerator for Apache Spark. While adding new corner cases, I tried to enable the maximum supported value for a sort and started to get failures. On closer inspection it looks like the CPU is sorting the data incorrectly. Specifically, every value that is "999999999999999999.50" or above is placed, as a contiguous chunk, in the wrong location in the output.

       In local mode with 12 tasks:

      import org.apache.spark.sql.functions.col

      spark.read.parquet("input.parquet").orderBy(col("a")).collect.foreach(System.err.println)

      Here you will notice that the last entry printed is [999999999999999999.49], while [999999999999999999.99] appears near the top of the output, next to [-999999999999999999.99].
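      For anyone without the attached file, a self-contained sketch along the lines below may reproduce the issue on an affected version (e.g. 3.2.0 or 3.3.0). The exact contents of input.parquet are not reconstructed here, so the values, the row count, and the column name "a" are assumptions based on the report, and whether this small in-memory data set triggers the mis-ordering may depend on how the rows get partitioned.

      // Paste into a spark-shell started with --master local[12]; the data
      // below is an assumed stand-in for the attached input.parquet.
      import java.math.BigDecimal
      import org.apache.spark.sql.Row
      import org.apache.spark.sql.functions.col
      import org.apache.spark.sql.types.{DecimalType, StructField, StructType}

      val schema = StructType(Seq(StructField("a", DecimalType(20, 2))))

      // Decimal(20, 2) values straddling the reported
      // "999999999999999999.50" boundary, plus both extremes.
      val rows = Seq(
        "-999999999999999999.99",
        "0.00",
        "999999999999999999.49",
        "999999999999999999.50",
        "999999999999999999.99"
      ).map(s => Row(new BigDecimal(s)))

      val df = spark.createDataFrame(spark.sparkContext.parallelize(rows, 12), schema)

      // On a fixed version this prints the rows in ascending order; on an
      // affected version the values at or above ...999.50 come out misplaced.
      df.orderBy(col("a")).collect.foreach(System.err.println)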

      Attachments

        1. input.parquet (23 kB, attached by Robert Joseph Evans)


          People

            Assignee: Robert Joseph Evans (revans2)
            Reporter: Robert Joseph Evans (revans2)
