- Type: Sub-task
- Status: Resolved
- Priority: Minor
- Resolution: Duplicate
- Affects Version/s: 2.8.0
- Fix Version/s: None
- Component/s: fs/s3
- Labels: None
FS shell -count uses getContentSummary() to summarise the contents; this slows significantly as directory tree depth increases. On wide directories, the FileStatus[] array is built up before recursing down, so with many millions of files memory use becomes an issue.
Moving to a flat listFiles listing with iterator-based scanning would make directory depth a near-non-issue and avoid the memory problems. We'd need to reverse-construct the directory tree for the count summary; a hash map of parent paths could build that up while iterating through the files and adding up their sizes.
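A minimal sketch of the hash-map idea above, in plain Java so it runs without Hadoop on the classpath. It is an illustration under stated assumptions, not the proposed patch: the class FlatCount and its methods are hypothetical names, the flat listing is modelled as an Iterator of (path, length) entries standing in for whatever listFiles(path, true) would return, every path is assumed to lie under the root, and the root itself is counted as a directory.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Hadoop.
class FlatCount {

    /** Parent of a '/'-separated path, or null once the top is reached. */
    static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        if (slash < 0 || path.equals("/")) {
            return null;
        }
        return slash == 0 ? "/" : path.substring(0, slash);
    }

    /**
     * Consumes a flat listing of (file path, length) entries under root and
     * returns {directoryCount, fileCount, totalBytes}.
     * Assumes every path lies under root; the root counts as a directory.
     */
    static long[] count(String root, Iterator<Map.Entry<String, Long>> files) {
        // directory -> {files directly inside it, bytes directly inside it}
        Map<String, long[]> perDir = new HashMap<>();
        perDir.put(root, new long[2]);

        while (files.hasNext()) {
            Map.Entry<String, Long> file = files.next();
            String dir = parentOf(file.getKey());
            // Register the parent and any ancestors not seen yet; stop at the
            // first known directory, since its ancestors are already present.
            for (String d = dir; d != null && !perDir.containsKey(d); d = parentOf(d)) {
                perDir.put(d, new long[2]);
            }
            long[] direct = perDir.get(dir);
            direct[0]++;
            direct[1] += file.getValue();
        }

        // Fold the per-directory totals into one summary for the root.
        long fileCount = 0;
        long bytes = 0;
        for (long[] direct : perDir.values()) {
            fileCount += direct[0];
            bytes += direct[1];
        }
        return new long[] {perDir.size(), fileCount, bytes};
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> listing = List.of(
            Map.entry("/data/a/b/f1", 10L),
            Map.entry("/data/a/f2", 5L),
            Map.entry("/data/c/f3", 7L));
        long[] r = count("/data", listing.iterator());
        // Column order: directories, files, bytes, path.
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " /data");
    }
}
```

Memory here grows with the number of directories rather than the number of files, since each file is dropped as soon as it has been added to its parent's totals. One limitation of this sketch: a files-only listing never mentions empty directories, so they would be missing from the directory count unless they are discovered some other way.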
- duplicates: HADOOP-13704 S3A getContentSummary() to move to listFiles(recursive) to count children; instrument use (Open)