Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.1.0
Fix Version/s: None
Description
Although NewHadoopRDD and HadoopRDD take the HDFS cache into account when calculating preferredLocations, FileScanRDD does not.
This enhancement can be implemented easily for large files, where a FilePartition contains only a single HDFS file.
It would also yield a significant performance improvement when reading cached HDFS partitions.
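A minimal sketch of the single-file case, assuming a simplified, hypothetical block model rather than Spark's actual FileScanRDD/PartitionedFile internals. `BlockInfo`, `CachedPreferredLocations` and `preferredLocations` are illustrative names only; in practice the per-block cached hosts could come from Hadoop's `BlockLocation.getCachedHosts()`. Hosts holding cached bytes of the partition's range are ranked first and tagged with an `hdfs_cache_` prefix, which we assume matches the tag Spark's scheduler uses for HDFS-cache locations. The remaining hosts are ranked by on-disk bytes, keeping at most three locations, similar to FileScanRDD's existing top-hosts heuristic.

```scala
// Hypothetical, simplified model of one HDFS block of a file.
final case class BlockInfo(offset: Long, length: Long, hosts: Seq[String], cachedHosts: Seq[String])

object CachedPreferredLocations {
  // Assumed prefix for HDFS-cache locations (mirrors the tag HadoopRDD reports).
  val InMemoryTag = "hdfs_cache_"

  /** Preferred locations for a partition reading bytes [start, start + length) of a single file. */
  def preferredLocations(blocks: Seq[BlockInfo], start: Long, length: Long, maxHosts: Int = 3): Seq[String] = {
    val end = start + length
    // Keep only the blocks that overlap the partition's byte range.
    val overlapping = blocks.filter(b => b.offset < end && b.offset + b.length > start)
    // Number of the partition's bytes that a given block covers.
    def overlap(b: BlockInfo): Long = math.min(end, b.offset + b.length) - math.max(start, b.offset)

    // Sum covered bytes per host, given (host, bytes) pairs.
    def bytesByHost(pairs: Seq[(String, Long)]): Map[String, Long] =
      pairs.groupBy(_._1).map { case (host, xs) => host -> xs.map(_._2).sum }

    // Rank hosts by bytes (descending), breaking ties by host name.
    def topHosts(byHost: Map[String, Long]): Seq[String] =
      byHost.toSeq.sortBy { case (host, n) => (-n, host) }.take(maxHosts).map(_._1)

    val cachedHosts = topHosts(bytesByHost(overlapping.flatMap(b => b.cachedHosts.map(_ -> overlap(b)))))
    val diskHosts = topHosts(bytesByHost(overlapping.flatMap(b => b.hosts.map(_ -> overlap(b)))))

    // Cached hosts first (tagged), then disk-only hosts not already listed.
    (cachedHosts.map(InMemoryTag + _) ++ diskHosts.filterNot(cachedHosts.contains)).take(maxHosts)
  }

  def main(args: Array[String]): Unit = {
    val blocks = Seq(
      BlockInfo(0L, 128L, Seq("h1", "h2", "h3"), Seq("h2")),
      BlockInfo(128L, 128L, Seq("h2", "h3", "h4"), Seq.empty)
    )
    println(preferredLocations(blocks, 0L, 256L))
  }
}
```

Listing cached hosts ahead of disk replicas lets the scheduler try the in-memory copy first while still falling back to ordinary data-local hosts.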