Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
Currently, to find out a directory's size (the number of its children), we have to list the directory by calling the getListing RPC and count the entries returned. This is expensive, especially when a directory has many children, because the listing may require multiple RPCs.
On the other hand, when fetching the status of a path (i.e. calling the getFileInfo RPC), the length field of FileStatus is set to 0 if the path is a directory.
I propose changing this field (FileStatus#length) to hold the directory size when the path is a directory, so that we can call getFileInfo to get the directory size. This call is much cheaper and simpler than getListing.
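To make the cost difference concrete, here is a rough, self-contained sketch. It is a toy model, not the HDFS client or NameNode code: the class ToyNameNode, the PAGE_LIMIT of 1000 entries per listing call, and both helper methods are assumptions made up for illustration. It only shows that counting children via a paged listing takes a number of round trips that grows with the directory, while a status call that carries the child count takes one.

```java
import java.util.ArrayList;
import java.util.List;

public class DirSizeSketch {
    // Hypothetical cap on entries returned per listing RPC (assumption for illustration).
    static final int PAGE_LIMIT = 1000;

    /** Toy stand-in for a NameNode holding one directory; counts the RPCs it serves. */
    public static class ToyNameNode {
        private final List<String> children = new ArrayList<>();
        public int rpcCount = 0;

        public ToyNameNode(int numChildren) {
            for (int i = 0; i < numChildren; i++) {
                children.add("child-" + i);
            }
        }

        /** Models a paged listing RPC: at most PAGE_LIMIT entries starting at startIndex. */
        public List<String> getListing(int startIndex) {
            rpcCount++;
            int end = Math.min(startIndex + PAGE_LIMIT, children.size());
            return new ArrayList<>(children.subList(startIndex, end));
        }

        /** Models the proposed behaviour: the status length is the child count. */
        public long getFileInfoLength() {
            rpcCount++;
            return children.size();
        }
    }

    /** Current approach: page through the whole listing and count the entries. */
    public static long sizeViaListing(ToyNameNode nn) {
        long count = 0;
        int start = 0;
        while (true) {
            List<String> page = nn.getListing(start);
            count += page.size();
            start += page.size();
            if (page.size() < PAGE_LIMIT) {
                break; // a short page means the listing is exhausted
            }
        }
        return count;
    }

    /** Proposed approach: a single status call returns the size directly. */
    public static long sizeViaStatus(ToyNameNode nn) {
        return nn.getFileInfoLength();
    }

    public static void main(String[] args) {
        ToyNameNode a = new ToyNameNode(2500);
        System.out.println(sizeViaListing(a) + " children via " + a.rpcCount + " listing RPC(s)");

        ToyNameNode b = new ToyNameNode(2500);
        System.out.println(sizeViaStatus(b) + " children via " + b.rpcCount + " status RPC(s)");
    }
}
```

With 2500 children and the assumed page limit, the listing path makes three calls and transfers every entry, whereas the status path makes one call and transfers a single number.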
Issue Links
- relates to HDFS-4995 Make getContentSummary() less expensive (Closed)