[SPARK-44448] Wrong results for dense_rank() <= k from InferWindowGroupLimit and DenseRankLimitIterator - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.5.0
Fix Version/s: 3.5.0
Component/s: SQL
Labels: correctness
Target Version/s: 3.5.0

Description

Top-k filters on a dense_rank() window function (for example, dense_rank() <= k) can return wrong results. The cause is a bug in the InferWindowGroupLimit optimization, specifically in the DenseRankLimitIterator code introduced in https://issues.apache.org/jira/browse/SPARK-37099.

Repro:

create or replace temp view t1 (p, o) as values (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2);

select * from (select *, dense_rank() over (partition by p order by o) as rnk from t1) where rnk = 1;

Spark result:

[1,1,1]
[1,1,1]
[2,1,1]

Correct result:

[1,1,1]
[1,1,1]
[2,1,1]
[2,1,1]
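
The correct result can be cross-checked outside Spark. The following is a minimal sketch in plain Python, not Spark code: the helper name `dense_rank_rows` is hypothetical, and the rows are copied from the repro. It applies the usual dense_rank() semantics (the rank increases by one for each new distinct ordering value within a partition) and then keeps only the rows with rnk = 1.

```python
from itertools import groupby

# Rows of t1 as (p, o) tuples, copied from the repro above.
t1 = [(1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2)]


def dense_rank_rows(rows):
    """Return (p, o, rnk), rnk = dense_rank() over (partition by p order by o)."""
    out = []
    ordered = sorted(rows)  # sort by p, then by o
    for p, part in groupby(ordered, key=lambda r: r[0]):
        rnk, prev_o = 0, None
        for _, o in part:
            if o != prev_o:  # a new distinct ordering value starts the next rank
                rnk += 1
                prev_o = o
            out.append((p, o, rnk))
    return out


# Equivalent of the outer "where rnk = 1" filter.
expected = [row for row in dense_rank_rows(t1) if row[2] == 1]
print(expected)  # [(1, 1, 1), (1, 1, 1), (2, 1, 1), (2, 1, 1)]
```

Both partitions contribute two rows with rnk = 1, so four rows are expected in total.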

The bug is in DenseRankLimitIterator: it fails to reset its state properly when transitioning from one window partition to the next. reset only sets rank = 0; it does not also set currentRankRow = null. As a result, when the second and later window partitions are processed, the rank is incorrectly incremented based on a comparison between the ordering of the last row of the previous partition and the first row of the new partition.
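
To make the stale-state effect concrete, here is a toy Python model of a per-partition dense-rank limit iterator. It is a sketch built from the description above and is not Spark's actual Scala implementation: the class name `DenseRankLimitModel`, the `fixed` switch, and the `take` method are all hypothetical. One further assumption of the sketch is that the row that pushes the counter up to the limit is still passed through, on the expectation that the later rnk = 1 filter would drop it.

```python
class DenseRankLimitModel:
    """Toy stand-in for a dense-rank limit iterator (not Spark's code)."""

    def __init__(self, limit, fixed):
        self.limit = limit
        self.fixed = fixed            # hypothetical switch: apply the reset fix
        self.rank = 0
        self.current_rank_row = None  # last row that started a new rank

    def reset(self):
        self.rank = 0
        if self.fixed:
            self.current_rank_row = None  # the reset the description says is missing

    def take(self, partition):
        """Yield rows of one partition until the rank counter reaches the limit."""
        self.reset()
        for row in partition:
            if self.rank >= self.limit:
                break
            if self.current_rank_row is None:
                self.current_rank_row = row
            elif self.current_rank_row[1] != row[1]:  # compare ORDER BY column o
                self.rank += 1
                self.current_rank_row = row
            yield row


def run(fixed):
    # The two window partitions of t1 from the repro, already sorted by o.
    partitions = [[(1, 1), (1, 1), (1, 2)], [(2, 1), (2, 1), (2, 2)]]
    model = DenseRankLimitModel(limit=1, fixed=fixed)
    return [list(model.take(part)) for part in partitions]


buggy = run(fixed=False)
patched = run(fixed=True)
print(buggy[1])    # [(2, 1)]
print(patched[1])  # [(2, 1), (2, 1), (2, 2)]
```

Without the extra reset, the first row of partition 2 is compared against the leftover row (1, 2) from partition 1, the counter reaches the limit immediately, and the second (2, 1) row is never emitted. That matches the single [2,1,1] row in the Spark result.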

This means that a dense_rank window function over more than one window partition, with more than one row having dense_rank = 1 in the second or a later partition, can give wrong results when optimized.

(RankLimitIterator narrowly avoids this bug by accident: the first row in the new partition also tries to increment rank, but the increment is the value of count, which is 0 at that point, so the result happens to be correct.)
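
The same kind of toy model illustrates why a stale row can be harmless in the rank() case. This sketch is reconstructed only from the wording above; the class `RankLimitModel` and its fields are hypothetical, and the real RankLimitIterator may be structured differently. The assumption is that rank grows by count (the number of rows seen in the current rank group) and that reset clears rank and count but leaves the remembered row stale.

```python
class RankLimitModel:
    """Toy model of a rank() limit iterator, based only on the description."""

    def __init__(self, limit):
        self.limit = limit
        self.rank = 0
        self.count = 0                # rows seen in the current rank group
        self.current_rank_row = None

    def reset(self):
        # current_rank_row is deliberately left stale, as in the described bug.
        self.rank = 0
        self.count = 0

    def take(self, partition):
        self.reset()
        for row in partition:
            if self.rank >= self.limit:
                break
            if self.current_rank_row is None:
                self.current_rank_row = row
            elif self.current_rank_row[1] != row[1]:
                self.rank += self.count  # adds 0 on the first row of a new partition
                self.count = 0
                self.current_rank_row = row
            self.count += 1
            yield row


partitions = [[(1, 1), (1, 1), (1, 2)], [(2, 1), (2, 1), (2, 2)]]
model = RankLimitModel(limit=1)
rank_result = [list(model.take(part)) for part in partitions]
print(rank_result[1])  # [(2, 1), (2, 1), (2, 2)]
```

In this model the spurious comparison at the partition boundary adds 0 to rank, so both (2, 1) rows are still emitted.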

Unfortunately, the tests for the optimization only had a single row per rank, so they did not catch the bug, which requires multiple rows per rank.

Issue Links

links to

[Github] Pull Request #42026 (jchen5)

[Github] Pull Request #42042 (jchen5)

People

Assignee: Jack Chen

Reporter: Jack Chen

Votes: 0

Watchers: 5

Dates

Created: 16/Jul/23 22:46

Updated: 24/Nov/23 22:54

Resolved: 19/Jul/23 01:13