Details
-
Improvement
-
Status: Resolved
-
Minor
-
Resolution: Fixed
-
2.0.0, 2.1.0, 2.2.0
-
None
Description
In RangePartitioner(partitions: Int, rdd: RDD[]), RangePartitioner.numPartitions is wrong if the number of elements in RDD (rdd.count()) is less than number of partitions (partitions in constructor).
Code1 to reproduce:
import spark.implicits._ val ds = spark.createDataset(Seq((1, 1))) println(ds.sort("_1").rdd.getNumPartitions) // The output of println is 2
Code2 to reproduce:
test("Number of elements in RDD is less than number of partitions") { val rdd = sc.parallelize(1 to 3).map(x => (x, x)) val partitioner = new RangePartitioner(22, rdd) assert(partitioner.numPartitions === 3) }
This test will be failed because partitioner.numPartitions is 4.
Attachments
Issue Links
- links to