Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Fixed
-
3.4.0
Description
Context https://github.com/NVIDIA/spark-rapids/issues/7429#issuecomment-1368040686
The following windowing query on a simple two-row input should produce two non-empty windows as a result
from pprint import pprint data = [ ('9223372036854775807', '11342371013783243717493546650944543.47'), ('9223372036854775807', '999999999999999999999999999999999999.99') ] df1 = spark.createDataFrame(data, 'a STRING, b STRING') df2 = df1.select(df1.a.cast('LONG'), df1.b.cast('DECIMAL(38,2)')) df2.createOrReplaceTempView('test_table') df = sql(''' SELECT COUNT(1) OVER ( PARTITION BY a ORDER BY b ASC RANGE BETWEEN 10.2345 PRECEDING AND 6.7890 FOLLOWING ) AS CNT_1 FROM test_table ''') res = df.collect() df.explain(True) pprint(res)
SparkĀ 3.4.0-SNAPSHOT output:
[Row(CNT_1=1), Row(CNT_1=0)]
Spark 3.3.1 output as expected:
Row(CNT_1=1), Row(CNT_1=1)]
Attachments
Issue Links
- duplicates
-
SPARK-42049 Improve AliasAwareOutputExpression
- Resolved
- links to