Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
3.2.1
-
None
Description
Add a new api to expose total partition count in the stage belonging to the task in TaskContext, so that the task knows what fraction of the computation is doing.
With this extra information, users can also generate 32bit unique int ids as below rather than using `monotonically_increasing_id` which generates 64bit long ids.
rdd.mapPartitions { rowsIter =>
val partitionId = TaskContext.get().partitionId()
val numPartitions = TaskContext.get().numPartitions()
var i = 0
rowsIter.map { row =>
val rowId = partitionId + i * numPartitions
i += 1
(rowId, row)
}
}