Description
The expected result is obtained with Spark 3.1.2, but not with 3.2.0, 3.2.1, or 3.3.0.
Minimal reproducible example
from pyspark.sql import SparkSession, functions as F, Window as W

spark = SparkSession.builder.getOrCreate()
df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
df.show(3)
df.show(5)
Expected result
+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
+---+----+
only showing top 3 rows

+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
|  3|0.03|
|  4|0.04|
+---+----+
only showing top 5 rows
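For reference, the expected 0.01 steps follow directly from the SQL definition percent_rank = (rank - 1) / (rows_in_partition - 1) applied to all 101 rows. A pure-Python sketch of that definition (illustrative only, not Spark code; assumes distinct, ascending ids so rank equals the row index):

```python
def percent_rank(values):
    """percent_rank per the SQL definition: (rank - 1) / (n - 1).

    Assumes `values` are distinct and already sorted ascending,
    so the 0-based index is (rank - 1).
    """
    n = len(values)
    return [i / (n - 1) for i in range(n)]

pr = percent_rank(list(range(101)))
print(pr[:5])  # -> [0.0, 0.01, 0.02, 0.03, 0.04]
```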
Actual result
+---+------------------+
| id|                pr|
+---+------------------+
|  0|               0.0|
|  1|0.3333333333333333|
|  2|0.6666666666666666|
+---+------------------+
only showing top 3 rows

+---+---+
| id| pr|
+---+---+
|  0|0.0|
|  1|0.2|
|  2|0.4|
|  3|0.6|
|  4|0.8|
+---+---+
only showing top 5 rows
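The actual values appear consistent with percent_rank being evaluated over only the rows fetched by show(n) (show retrieves n + 1 rows to decide whether to print "only showing top n rows"), which matches the limit-pushdown behavior described in the linked SPARK-40002. A quick pure-Python check of that hypothesis (an illustration, not the Spark implementation):

```python
def local_percent_rank(num_fetched):
    # percent_rank computed over a truncated window of `num_fetched` rows,
    # as if the limit had been pushed below the Window operator
    return [i / (num_fetched - 1) for i in range(num_fetched)]

# show(3) fetches 4 rows: ranks over 4 rows give the observed 1/3 steps
print(local_percent_rank(4)[:3])  # -> [0.0, 0.3333333333333333, 0.6666666666666666]
# show(5) fetches 6 rows: ranks over 6 rows give the observed 0.2 steps
print(local_percent_rank(6)[:5])  # -> [0.0, 0.2, 0.4, 0.6, 0.8]
```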
Issue Links
- relates to SPARK-40002: Limit improperly pushed down through window using ntile function (Resolved)