Details
Description
There are two existing ways of getting used space, DU and DF, and both are insufficient.
- Running DU across lots of disks is very expensive, and running all of those processes at the same time creates a noticeable IO spike.
- Running DF is inaccurate when the disk is shared by multiple DataNodes or other services.
Computing the HDFS used space from the ReplicaInfo objects already held in memory in FsDatasetImpl#volumeMap is both cheap and accurate, as sketched below.
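A minimal sketch of the idea, assuming a simplified replica record and map (the names ReplicaSnapshot, addReplica, and getUsed are illustrative, not the actual HDFS API): instead of shelling out to du or statting the filesystem with df, the used space is derived by summing the on-disk block and meta sizes recorded for each replica in an in-memory map analogous to FsDatasetImpl#volumeMap.

```java
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch: compute a volume's used space from in-memory replica records
 * instead of running "du" (expensive, causes IO spikes) or "df" (inaccurate
 * when the disk is shared). Names below are hypothetical, not Hadoop's API.
 */
public class ReplicaMapSpaceUsed {

  /** Minimal stand-in for a replica record kept in the volume map. */
  static final class ReplicaSnapshot {
    final long blockBytes;  // bytes of the block file on disk
    final long metaBytes;   // bytes of the checksum/meta file on disk

    ReplicaSnapshot(long blockBytes, long metaBytes) {
      this.blockBytes = blockBytes;
      this.metaBytes = metaBytes;
    }
  }

  /** blockId -> replica record, analogous to FsDatasetImpl#volumeMap. */
  private final ConcurrentHashMap<Long, ReplicaSnapshot> volumeMap =
      new ConcurrentHashMap<>();

  void addReplica(long blockId, long blockBytes, long metaBytes) {
    volumeMap.put(blockId, new ReplicaSnapshot(blockBytes, metaBytes));
  }

  /**
   * Used space = sum of block + meta bytes over all cached replicas.
   * A pure in-memory walk: no du process, no filesystem stat calls.
   */
  long getUsed() {
    long used = 0L;
    for (ReplicaSnapshot r : volumeMap.values()) {
      used += r.blockBytes + r.metaBytes;
    }
    return used;
  }

  public static void main(String[] args) {
    ReplicaMapSpaceUsed volume = new ReplicaMapSpaceUsed();
    volume.addReplica(1001L, 128L * 1024 * 1024, 1_048_583L);
    volume.addReplica(1002L, 64L * 1024 * 1024, 524_295L);
    System.out.println("used bytes = " + volume.getUsed());
  }
}
```

Iterating a live replica map while the DataNode is adding and removing blocks needs care; HDFS-14986 (linked below) tracks a ConcurrentModificationException hit by exactly this kind of cached computation.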
Attachments
Issue Links
- causes
  - HDFS-14986 ReplicaCachingGetSpaceUsed throws ConcurrentModificationException (Resolved)
- is related to
  - HDFS-15174 Optimize ReplicaCachingGetSpaceUsed by reducing unnecessary io operations (Resolved)
  - HDFS-15039 Cache meta file length of FinalizedReplica to reduce call File.length() (Resolved)
- relates to
  - HADOOP-12973 make DU pluggable (Resolved)
  - HADOOP-12974 Create a CachingGetSpaceUsed implementation that uses df (Resolved)
  - HADOOP-9884 Hadoop calling du -sk is expensive (Resolved)