Details
- Improvement
- Status: Resolved
- Minor
- Resolution: Fixed
- 2.1.0
- None
Description
Trying to get a grip on the FileIndex hierarchy, I was confused by the following inconsistency:
On the one hand, PartitioningAwareFileIndex declares leafFiles and leafDirToChildrenFiles as abstract; on the other, it fully implements listLeafFiles, which does all the file listing. However, listLeafFiles is only used by InMemoryFileIndex.
I propose moving this method (and all its dependencies) to the implementation class that actually uses it, thereby uncluttering the PartitioningAwareFileIndex interface.
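To illustrate the shape of the proposal, here is a minimal sketch of the layout after the move. It does not reproduce Spark's actual code: the two classes are simplified stand-ins that only borrow the real names, plain String paths and java.io.File replace Hadoop's Path and FileStatus, and allFiles is just a convenient accessor for the example.

```scala
import java.io.File

// Simplified stand-in for the abstract base: it only declares what it
// needs from a listing and no longer carries any listing logic itself.
abstract class PartitioningAwareFileIndex {
  // Leaf file path -> file length in bytes (assumed simplification).
  protected def leafFiles: Map[String, Long]
  // Leaf directory path -> paths of the files directly inside it.
  protected def leafDirToChildrenFiles: Map[String, Seq[String]]

  // Shared behaviour that depends only on the abstract members.
  def allFiles: Seq[String] = leafFiles.keys.toSeq.sorted
}

// Simplified stand-in for the only class that uses listLeafFiles.
class InMemoryFileIndex(rootPaths: Seq[String]) extends PartitioningAwareFileIndex {

  // The listing is computed once, when the index is constructed.
  private val listed: Seq[File] = listLeafFiles(rootPaths)

  override protected val leafFiles: Map[String, Long] =
    listed.map(f => f.getPath -> f.length).toMap

  override protected val leafDirToChildrenFiles: Map[String, Seq[String]] =
    listed
      .groupBy(f => Option(f.getParent).getOrElse("."))
      .map { case (dir, files) => dir -> files.map(_.getPath) }

  // The listing logic now lives next to its only caller.
  private def listLeafFiles(paths: Seq[String]): Seq[File] = {
    def walk(f: File): Seq[File] =
      if (f.isDirectory) {
        // listFiles returns null on I/O errors, hence the Option wrapper.
        Option(f.listFiles).map(_.toSeq).getOrElse(Seq.empty).flatMap(walk)
      } else if (f.isFile) Seq(f)
      else Seq.empty
    paths.flatMap(p => walk(new File(p)))
  }
}
```

With this layout, another subclass can supply leafFiles and leafDirToChildrenFiles from any source (for example, a precomputed catalog) without inheriting listing code it never calls.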
Attachments
Issue Links
- is related to: SPARK-24280 Speed up indexing of files in object stores by using listFiles(path, recursive=true) (Resolved)
- links to