Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Won't Fix
3.0.0
None
None
Description
This is a proposal to add the following function to NewHadoopRDD:
def mapPartitionsWithInputSplitAndIndex[U: ClassTag](
    f: (InputSplit, Int, Iterator[(K, V)]) => Iterator[U],
    preservesPartitioning: Boolean = false): RDD[U]
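For each partition, this function would pass f both the InputSplit the partition was read from and the partition's index.
The partition index is useful for mapping output part-XXXXX files back to input splits: when an RDD is saved, the number in each part-XXXXX file name is the index of the partition that wrote it, so the mapping holds as long as no shuffle changes the partitioning in between.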
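The proposed semantics can be modeled without Spark. The sketch below is plain Scala and only illustrative: SplitInfo, PartitionData, and SplitIndexSketch are hypothetical names standing in for Hadoop's InputSplit and for an RDD's partitions. It assumes output files are named after the partition index in the part-XXXXX style (part-00000 for partition 0).

```scala
// Hypothetical stand-in for a Hadoop file split: path, byte offset, length.
final case class SplitInfo(path: String, start: Long, length: Long)

// Hypothetical model of one RDD partition: its input split and its records.
final case class PartitionData[K, V](split: SplitInfo, records: Seq[(K, V)])

object SplitIndexSketch {
  // Models the proposed method: f receives the split, the partition index,
  // and an iterator over that partition's (K, V) records.
  def mapPartitionsWithInputSplitAndIndex[K, V, U](
      partitions: Seq[PartitionData[K, V]]
  )(f: (SplitInfo, Int, Iterator[(K, V)]) => Iterator[U]): Seq[U] =
    partitions.zipWithIndex.flatMap { case (p, index) =>
      f(p.split, index, p.records.iterator)
    }

  // Assumed output naming: five-digit, zero-padded partition index.
  def partFileName(index: Int): String = f"part-$index%05d"

  // Pairs each output part file with the split its partition was read from.
  def partFileToSplit[K, V](partitions: Seq[PartitionData[K, V]]): Map[String, SplitInfo] =
    mapPartitionsWithInputSplitAndIndex(partitions) { (split, index, _) =>
      Iterator(partFileName(index) -> split)
    }.toMap

  def main(args: Array[String]): Unit = {
    val parts = Seq(
      PartitionData(SplitInfo("hdfs:///data/a.txt", 0L, 128L), Seq(0L -> "x", 1L -> "y")),
      PartitionData(SplitInfo("hdfs:///data/b.txt", 0L, 64L), Seq(0L -> "z"))
    )
    // Print one "part file <- split" line per partition, in index order.
    partFileToSplit(parts).toSeq.sortBy(_._1).foreach { case (file, s) =>
      println(s"$file <- ${s.path} [${s.start}, ${s.start + s.length})")
    }
  }
}
```

In an actual Spark job, a similar pairing may be obtainable without a new method: inside the existing NewHadoopRDD.mapPartitionsWithInputSplit, TaskContext.getPartitionId() returns the index of the partition currently being computed.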