FsShell.du has two inefficiencies:
- calling getContentSummary twice for each top-level item rather than calling it once and saving the result
- calling getContentSummary for files rather than using the size it already has in FileStatus
getContentSummary has one:
- calling itself for files rather than using the length it already has in FileStatus
Every call to getContentSummary results in a call to getFileStatus, which can be expensive: on NativeS3FileSystem, for example, each call incurs both network latency and actual monetary cost.
The simple solution:
- FsShell.du calls getContentSummary once per item and saves the resulting ContentSummary
- FsShell.du uses FileStatus.getLen for files
- getContentSummary only calls itself for directories
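
To make the cost difference concrete, here is a minimal sketch of the simple solution on a toy in-memory model. It is not the Hadoop code: DuSketch, Status, ToyFs, and the summaryBefore/summaryAfter/duBefore/duAfter methods are hypothetical stand-ins, the summary is reduced to a single length, and du returns a total instead of printing per-item lines. The statusCalls counter plays the role of the expensive getFileStatus round trips.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (not Hadoop code) that counts status lookups per du strategy. */
final class DuSketch {
    /** Hypothetical stand-in for FileStatus: path, type and length only. */
    static final class Status {
        final String path;
        final boolean isDir;
        final long len;
        Status(String path, boolean isDir, long len) {
            this.path = path; this.isDir = isDir; this.len = len;
        }
    }

    /** Hypothetical stand-in for a FileSystem with a costly getFileStatus. */
    static final class ToyFs {
        private final Map<String, Status> byPath = new HashMap<>();
        private final Map<String, List<Status>> listing = new HashMap<>();
        int statusCalls = 0; // number of simulated getFileStatus round trips

        void add(String parent, Status s) {
            byPath.put(s.path, s);
            listing.computeIfAbsent(parent, k -> new ArrayList<>()).add(s);
        }
        Status getFileStatus(String path) { statusCalls++; return byPath.get(path); }
        List<Status> listStatus(String dir) {
            return listing.getOrDefault(dir, Collections.emptyList());
        }

        /** Behaviour described in the report: recurses into every child, files included. */
        long summaryBefore(String path) {
            Status s = getFileStatus(path);
            if (!s.isDir) return s.len;
            long total = 0;
            for (Status c : listStatus(path)) total += summaryBefore(c.path);
            return total;
        }

        /** Proposed: recurse only into directories; files use the listed length. */
        long summaryAfter(String path) {
            Status s = getFileStatus(path);
            if (!s.isDir) return s.len;
            long total = 0;
            for (Status c : listStatus(path)) total += c.isDir ? summaryAfter(c.path) : c.len;
            return total;
        }
    }

    /** du as described in the report: two summary calls per top-level item. */
    static long duBefore(ToyFs fs, String dir) {
        long total = 0;
        for (Status item : fs.listStatus(dir)) {
            fs.summaryBefore(item.path);          // first call, result discarded
            total += fs.summaryBefore(item.path); // second call for the same item
        }
        return total;
    }

    /** du with the simple solution: one saved result, no summary call for files. */
    static long duAfter(ToyFs fs, String dir) {
        long total = 0;
        for (Status item : fs.listStatus(dir)) {
            long size = item.isDir ? fs.summaryAfter(item.path) : item.len; // computed once
            total += size;
        }
        return total;
    }

    static ToyFs sampleTree() {
        ToyFs fs = new ToyFs();
        fs.add("/", new Status("/data", true, 0));
        fs.add("/data", new Status("/data/a.txt", false, 10));
        fs.add("/data", new Status("/data/b.txt", false, 20));
        fs.add("/data", new Status("/data/sub", true, 0));
        fs.add("/data/sub", new Status("/data/sub/c.txt", false, 5));
        return fs;
    }

    public static void main(String[] args) {
        ToyFs fs = sampleTree();
        long before = duBefore(fs, "/data");
        int callsBefore = fs.statusCalls;
        fs.statusCalls = 0;
        long after = duAfter(fs, "/data");
        System.out.println(before + " bytes, " + callsBefore + " lookups before");
        System.out.println(after + " bytes, " + fs.statusCalls + " lookups after");
    }
}
```

On this sample tree (two files and one subdirectory holding one file), both variants report 35 bytes, but the first needs 8 status lookups and the second needs 1.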
Another solution, rather than adding special casing to callers, is to add a getContentSummary overload that takes a FileStatus, so that callers already holding a FileStatus avoid the extra getFileStatus call.
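
A rough sketch of that second approach, again on a toy model and not the actual FileSystem API: OverloadSketch, Status, and ToyFs are hypothetical names, and the summary is reduced to a single length. The path-based method does one lookup and delegates; the FileStatus-based overload never looks anything up, so the file-versus-directory decision lives in one place instead of in every caller.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (not Hadoop code) of a getContentSummary overload taking a status. */
final class OverloadSketch {
    /** Hypothetical stand-in for FileStatus. */
    static final class Status {
        final String path;
        final boolean isDir;
        final long len;
        Status(String path, boolean isDir, long len) {
            this.path = path; this.isDir = isDir; this.len = len;
        }
    }

    /** Hypothetical stand-in for a FileSystem with a costly getFileStatus. */
    static final class ToyFs {
        private final Map<String, Status> byPath = new HashMap<>();
        private final Map<String, List<Status>> listing = new HashMap<>();
        int statusCalls = 0; // number of simulated getFileStatus round trips

        void add(String parent, Status s) {
            byPath.put(s.path, s);
            listing.computeIfAbsent(parent, k -> new ArrayList<>()).add(s);
        }
        Status getFileStatus(String path) { statusCalls++; return byPath.get(path); }
        List<Status> listStatus(String dir) {
            return listing.getOrDefault(dir, Collections.emptyList());
        }

        /** Path-based entry point: one lookup, then delegate to the overload. */
        long getContentSummary(String path) {
            return getContentSummary(getFileStatus(path));
        }

        /** Overload for callers that already hold a status: no lookups at all. */
        long getContentSummary(Status s) {
            if (!s.isDir) return s.len; // files: length is already known
            long total = 0;
            for (Status c : listStatus(s.path)) total += getContentSummary(c);
            return total;
        }
    }

    static ToyFs sampleTree() {
        ToyFs fs = new ToyFs();
        fs.add("/", new Status("/data", true, 0));
        fs.add("/data", new Status("/data/a.txt", false, 10));
        fs.add("/data", new Status("/data/b.txt", false, 20));
        fs.add("/data", new Status("/data/sub", true, 0));
        fs.add("/data/sub", new Status("/data/sub/c.txt", false, 5));
        return fs;
    }
}
```

A du-style caller would then pass each Status returned by listStatus straight to the overload, with no special casing for files and no additional status lookups in this model.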