Status: Patch Available
Resolution: Unresolved
I am trying to get the disk space consumed by an HDFS directory using the ContentSummary.getSpaceConsumed method, but the value it returns does not account for the replication factor. The replication factor is 2, so I expected the method to return twice the actual file size.
ubuntu@ubuntu:~/ht$ sudo -u hdfs hdfs dfs -ls /var/lib/ubuntu
Found 2 items
-rw-r--r-- 2 ubuntu ubuntu 3145728 2020-09-08 09:55 /var/lib/ubuntu/size-test
drwxrwxr-x - ubuntu ubuntu 0 2020-09-07 06:37 /var/lib/ubuntu/test
However, when I run the following code,
String path = "/etc/hadoop/conf/";
conf.addResource(new Path(path + "core-site.xml"));
conf.addResource(new Path(path + "hdfs-site.xml"));
long size = FileContext.getFileContext(conf).util().getContentSummary(fileStatus).getSpaceConsumed();
System.out.println("Replication : " + fileStatus.getReplication());
System.out.println("File size : " + size);
The output is
Replication : 0
File size : 3145728
Both the file size and the replication factor seem to be incorrect.
/etc/hadoop/conf/hdfs-site.xml contains the following config:
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
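To make the expectation concrete, here is a minimal sketch of the arithmetic I have in mind. It assumes that the space consumed by a file is simply its logical length multiplied by its replication factor. That is my assumption about what getSpaceConsumed should report, not a description of how HDFS computes it internally. The class and method names are hypothetical and exist only for this illustration.

```java
// Sketch only: models the expected value, not the HDFS implementation.
final class SpaceConsumedSketch {

    // Assumption: raw space consumed = logical file length * replication factor.
    static long expectedSpaceConsumed(long fileLength, short replication) {
        if (fileLength < 0 || replication < 0) {
            throw new IllegalArgumentException("length and replication must be non-negative");
        }
        // multiplyExact throws on overflow instead of silently wrapping.
        return Math.multiplyExact(fileLength, (long) replication);
    }

    public static void main(String[] args) {
        long fileLength = 3145728L; // length of size-test from the -ls output
        short replication = 2;      // dfs.replication from hdfs-site.xml
        System.out.println("Expected space consumed : "
                + expectedSpaceConsumed(fileLength, replication));
    }
}
```

With the values from this report, the sketch prints 6291456, which is the figure I expected from getSpaceConsumed instead of 3145728.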