SPARK-41793

Incorrect result for window frames defined by a range clause on large decimals


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: SQL

    Description

      Context: https://github.com/NVIDIA/spark-rapids/issues/7429#issuecomment-1368040686

      The following windowing query on a simple two-row input should produce two non-empty windows, i.e. a count of at least 1 for each row:

      from pprint import pprint

      # Two rows in the same partition (same a), each with a large DECIMAL(38,2) value of b.
      data = [
        ('9223372036854775807', '11342371013783243717493546650944543.47'),
        ('9223372036854775807', '999999999999999999999999999999999999.99')
      ]
      df1 = spark.createDataFrame(data, 'a STRING, b STRING')
      df2 = df1.select(df1.a.cast('LONG'), df1.b.cast('DECIMAL(38,2)'))
      df2.createOrReplaceTempView('test_table')
      # `spark` is the active SparkSession (predefined in the pyspark shell).
      df = spark.sql('''
        SELECT
          COUNT(1) OVER (
            PARTITION BY a
            ORDER BY b ASC
            RANGE BETWEEN 10.2345 PRECEDING AND 6.7890 FOLLOWING
          ) AS CNT_1
        FROM
          test_table
        ''')
      res = df.collect()
      df.explain(True)
      pprint(res)
      

      Spark 3.4.0-SNAPSHOT output (incorrect, the second window is empty):

      [Row(CNT_1=1), Row(CNT_1=0)]
      

      Spark 3.3.1 output (as expected):

      [Row(CNT_1=1), Row(CNT_1=1)]
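
The expected counts can be cross-checked without Spark. The following is a minimal sketch in plain Python using the standard `decimal` module; the helper name `range_frame_counts` is hypothetical and only mimics the semantics of a `RANGE BETWEEN x PRECEDING AND y FOLLOWING` frame over a single partition, not Spark's implementation. It also prints the upper frame bound of the second row, which needs more than 38 digits to represent exactly. Whether that is what triggers the wrong result is an assumption; the report itself does not state the root cause.

```python
from decimal import Decimal, getcontext

# The inputs have up to 38 significant digits and the offsets add two more
# fractional digits, so the default 28-digit context would round them.
getcontext().prec = 60


def range_frame_counts(values, preceding, following):
    """For each value (in ascending order), count the values v with
    cur - preceding <= v <= cur + following (both bounds inclusive)."""
    ordered = sorted(values)
    return [
        sum(1 for v in ordered if cur - preceding <= v <= cur + following)
        for cur in ordered
    ]


# Column b of the two input rows; both rows share the same partition key a.
b_values = [
    Decimal('11342371013783243717493546650944543.47'),
    Decimal('999999999999999999999999999999999999.99'),
]

counts = range_frame_counts(b_values, Decimal('10.2345'), Decimal('6.7890'))
print(counts)  # [1, 1]

# Upper frame bound for the second row and the number of digits it needs.
upper = b_values[1] + Decimal('6.7890')
print(upper, len(upper.as_tuple().digits))
```

Each row falls inside its own frame and the two values are far more than 10.2345 apart, so each count is exactly 1, matching the Spark 3.3.1 output.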
      

            People

              Assignee: XiDuo You (ulysses)
              Reporter: Gera Shegalov (jira.shegalov)
              Votes: 0
              Watchers: 7
