Spark / SPARK-41793

Incorrect result for window frames defined by a range clause on large decimals


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: SQL

    Description

      Context: https://github.com/NVIDIA/spark-rapids/issues/7429#issuecomment-1368040686

      The following windowing query on a simple two-row input should produce two non-empty windows, because each row always falls inside its own RANGE frame:

      from pprint import pprint
      data = [
        ('9223372036854775807', '11342371013783243717493546650944543.47'),
        ('9223372036854775807', '999999999999999999999999999999999999.99')
      ]
      df1 = spark.createDataFrame(data, 'a STRING, b STRING')
      df2 = df1.select(df1.a.cast('LONG'), df1.b.cast('DECIMAL(38,2)'))
      df2.createOrReplaceTempView('test_table')
      df = spark.sql('''
        SELECT 
          COUNT(1) OVER (
            PARTITION BY a 
            ORDER BY b ASC 
            RANGE BETWEEN 10.2345 PRECEDING AND 6.7890 FOLLOWING
          ) AS CNT_1 
        FROM 
          test_table
        ''')
      res = df.collect()
      df.explain(True)
      pprint(res)
      

      Spark 3.4.0-SNAPSHOT output (incorrect, the second window is empty):

      [Row(CNT_1=1), Row(CNT_1=0)]
      

      Spark 3.3.1 output (as expected):

      [Row(CNT_1=1), Row(CNT_1=1)]
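
The expected counts can be checked without Spark. The following is a minimal sketch in plain Python, assuming the usual RANGE frame semantics (a row `r` is in the frame of the current row when `current.b - 10.2345 <= r.b <= current.b + 6.7890`). The helper name `range_frame_counts` is hypothetical and is not part of Spark; this illustrates the expected result only, not how Spark evaluates the frame.

```python
from decimal import Decimal, localcontext


def range_frame_counts(values, preceding, following):
    """For each value v, count the values lying in [v - preceding, v + following]."""
    with localcontext() as ctx:
        # Python's default context keeps 28 significant digits, which would round
        # the 38-digit operands below. 60 digits keeps the arithmetic exact.
        ctx.prec = 60
        return [
            sum(1 for other in values if v - preceding <= other <= v + following)
            for v in values
        ]


# Same two `b` values as in the reproduction; both rows share one partition key `a`.
b_values = [
    Decimal('11342371013783243717493546650944543.47'),
    Decimal('999999999999999999999999999999999999.99'),
]

counts = range_frame_counts(b_values, Decimal('10.2345'), Decimal('6.7890'))
print(counts)  # [1, 1]
```

Each frame contains only its own row, which matches the Spark 3.3.1 output. Note that the second `b` value is the largest value DECIMAL(38,2) can hold, so its upper frame bound (`b + 6.7890`) is not representable in that type; the issue text does not state whether this is the cause of the regression.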
      


          People

            Assignee: XiDuo You (ulysses)
            Reporter: Gera Shegalov (jira.shegalov)
            Votes: 0
            Watchers: 7
