Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
3.3.0, 3.2.2
Description
Limit is pushed down through a window using the ntile function, which causes results that differ from Hive 2.3.9, and Prestodb 0.268, and older versions of Spark (e.g., 3.1.3).
Assume this data:
create table t1 stored as parquet as select * from range(101);
Also assume this query:
select id, ntile(10) over (order by id) as nt from t1 limit 10;
Spark 3.2.2, Spark 3.3.0, and master produce the following:
+---+---+ |id |nt | +---+---+ |0 |1 | |1 |2 | |2 |3 | |3 |4 | |4 |5 | |5 |6 | |6 |7 | |7 |8 | |8 |9 | |9 |10 | +---+---+
However, Spark 3.1.3, Hive 2.3.9, and Prestodb 0.268 produce the following:
+---+---+ |id |nt | +---+---+ |0 |1 | |1 |1 | |2 |1 | |3 |1 | |4 |1 | |5 |1 | |6 |1 | |7 |1 | |8 |1 | |9 |1 | +---+---+
Attachments
Issue Links
- is related to
-
SPARK-38614 Don't push down limit through window that's using percent_rank
- Resolved
- links to