Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Incomplete
2.3.0
None
Description
Object stores that mock directory structures can perform very poorly when a treewalk is executed against them, because each simulated directory costs at least one list request. The FileSystem.listFiles(path, recursive=true) call can reduce this overhead to ~O(1) by doing a bulk paged listing of all files under a path. For filesystems without this operation, the treewalk is implemented behind the scenes, so there is no extra cost compared to today's code, only a simplification of the calling code.
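To make the cost difference concrete, here is a minimal sketch that models an object store as a flat, sorted key space and counts simulated list requests for both strategies. It does not use the Hadoop API: the class `ListingSketch`, its methods, the key layout, and the page size are all hypothetical and exist only for illustration. In this model the treewalk costs one request per directory, while the flat listing costs one request per page of results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model of an object store: a flat, sorted key space with no real directories.
// All names here are hypothetical; this is NOT the Hadoop FileSystem API.
final class ListingSketch {
    private final TreeSet<String> keys = new TreeSet<>();
    private final int pageSize;
    int requests = 0; // simulated LIST round trips

    ListingSketch(int pageSize) { this.pageSize = pageSize; }

    void put(String key) { keys.add(key); }

    // Treewalk: one simulated LIST request per "directory", recursing into each child.
    List<String> treeWalk(String dir) {
        requests++;
        List<String> files = new ArrayList<>();
        TreeSet<String> subdirs = new TreeSet<>();
        for (String k : keys.tailSet(dir, true)) {
            if (!k.startsWith(dir)) break; // sorted keys: the prefix range has ended
            String rest = k.substring(dir.length());
            int slash = rest.indexOf('/');
            if (slash < 0) files.add(k);                           // direct child "file"
            else subdirs.add(dir + rest.substring(0, slash + 1)); // mocked subdirectory
        }
        for (String sub : subdirs) files.addAll(treeWalk(sub));
        return files;
    }

    // Flat listing: page through every key under the prefix, pageSize keys per request.
    List<String> listFilesRecursive(String prefix) {
        List<String> files = new ArrayList<>();
        String last = null;
        while (true) {
            requests++;
            int got = 0;
            Iterable<String> range =
                (last == null) ? keys.tailSet(prefix, true) : keys.tailSet(last, false);
            for (String k : range) {
                if (got == pageSize || !k.startsWith(prefix)) break;
                files.add(k);
                last = k;
                got++;
            }
            if (got < pageSize) return files; // short page: nothing left to fetch
        }
    }

    // Hypothetical sample data: four files under data/, one elsewhere.
    static ListingSketch demoStore(int pageSize) {
        ListingSketch s = new ListingSketch(pageSize);
        s.put("data/a/1");
        s.put("data/a/2");
        s.put("data/b/1");
        s.put("data/b/c/1");
        s.put("other/x");
        return s;
    }

    public static void main(String[] args) {
        ListingSketch walk = demoStore(1000);
        walk.treeWalk("data/");
        ListingSketch flat = demoStore(1000);
        flat.listFilesRecursive("data/");
        System.out.println("treewalk requests: " + walk.requests); // 4 (one per directory)
        System.out.println("flat requests:     " + flat.requests); // 1 (one page)
    }
}
```

In Hadoop itself the corresponding call is `FileSystem.listFiles(path, true)`, which returns a `RemoteIterator<LocatedFileStatus>`. Whether a given store implements it as a flat paged listing depends on that store's connector.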
Issue Links
relates to
- SPARK-17593 list files on s3 very slow (Resolved)
- SPARK-20255 FileIndex hierarchy inconsistency (Resolved)