Nigel Daley made changes - 22/Jan/08 07:32 PM
Tsz Wo (Nicholas), SZE made changes - 22/Feb/08 11:09 PM
Instead of adding an option, how about changing the output to the following?
bash-3.2$ ./bin/hadoop fs -du /
Found 3 items
byte file directory
159 1 0 hdfs://host:9000/a.txt
44198 1 0 hdfs://host:9000/build.xml
318 2 2 hdfs://host:9000/user
bash-3.2$ ./bin/hadoop fs -dus /
44675 4 3 /
Why not keep du like du and make a df command instead?
I am fine with making a new command. However, df in unix is for disk space usage, not directory space usage. So, it may be confusing. It seems to me that there is no specific unix command for counting files and directories.
The UNIX way of doing that is to use find with a pipe to wc (amongst other ways). But df -i is probably the closest to a single command.
Since the new feature is provided by neither du nor df in unix, let's create a new command. How about making a new command "count"? The usage would be
bash-3.2$ ./bin/hadoop fs -count /
44675 4 3 /
2219_20080226.patch: adding a new command "fs -count"
Tsz Wo (Nicholas), SZE made changes - 27/Feb/08 12:16 AM
1. It might be a good idea to deprecate getContentLen in ClientProtocol and Namenode.
2. INode.computeContentSummary can take a ContentSummary object as a parameter rather than an array of three longs.
3. This patch removes the optimization in DistributedFileSystem.getContentLength(). In the original code, if the path object is a DfsPath object, then no additional RPC is required. In the patch, an RPC is always required.
2219_20080227.patch:
1. Deprecated getContentLength() in ClientProtocol, NameNode, FileSystem, DistributedFileSystem and DFSClient. The ones in FSNamesystem and INode are removed directly since they are not public APIs.
2. The reason for using an array of longs is efficiency. computeContentSummary(...) recursively goes through the INode tree. If a ContentSummary object is used, the values have to be updated by two method calls (get, set) for each recursive call. If we use longs, we only have to do a +=.
3. Reverted DistributedFileSystem.getContentLength() to keep the optimization.
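To illustrate point 2, here is a minimal sketch of a recursive traversal that accumulates into an array of three longs. SketchINode is a hypothetical stand-in for the NameNode's INode tree, not the class from the patch, and the layout under /user in the usage example below is assumed purely so that the totals match the "fs -count /" output quoted earlier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for an INode tree; not Hadoop's actual class.
class SketchINode {
    final boolean isDirectory;
    final long length;                                  // bytes; only meaningful for files
    final List<SketchINode> children = new ArrayList<>();

    private SketchINode(boolean isDirectory, long length) {
        this.isDirectory = isDirectory;
        this.length = length;
    }

    static SketchINode file(long length) {
        return new SketchINode(false, length);
    }

    static SketchINode dir(SketchINode... kids) {
        SketchINode d = new SketchINode(true, 0L);
        for (SketchINode k : kids) {
            d.children.add(k);
        }
        return d;
    }

    // summary[0] = bytes, summary[1] = file count, summary[2] = directory count.
    // The same array is passed down the recursion, so each node only does a += or ++.
    long[] computeContentSummary(long[] summary) {
        if (isDirectory) {
            summary[2]++;                               // a directory counts itself
            for (SketchINode child : children) {
                child.computeContentSummary(summary);
            }
        } else {
            summary[0] += length;
            summary[1]++;
        }
        return summary;
    }
}
```

For example, a root holding a 159-byte file, a 44198-byte file, and a directory with 318 bytes spread over two files and one nested directory sums to 44675 bytes, 4 files and 3 directories:

```java
long[] s = SketchINode.dir(
        SketchINode.file(159), SketchINode.file(44198),
        SketchINode.dir(SketchINode.file(100), SketchINode.dir(SketchINode.file(218))))
    .computeContentSummary(new long[3]);
```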
Tsz Wo (Nicholas), SZE made changes - 29/Feb/08 07:18 PM
+1. Code looks good. A minor typo: Count.DISCRIPTION should be Count.DESCRIPTION.
2219_20080229.patch: fixed the typos.
Tsz Wo (Nicholas), SZE made changes - 29/Feb/08 09:21 PM
Tsz Wo (Nicholas), SZE made changes - 29/Feb/08 09:22 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12376867/2219_20080229.patch against trunk revision 619744.
@author +1. The patch does not contain any @author tags.
tests included +1. The patch appears to include 7 new or modified tests.
javadoc +1. The javadoc tool did not generate any warning messages.
javac -1. The applied patch generated 628 javac compiler warnings (more than the trunk's current 614 warnings).
release audit +1. The applied patch does not generate any new release audit warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1880/testReport/
This message is automatically generated.
The additional javac warnings are due to the newly deprecated APIs.
I just committed this. Thanks Nicholas!
dhruba borthakur made changes - 03/Mar/08 07:10 AM
Integrated in Hadoop-trunk #418 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/418/)
Tsz Wo (Nicholas), SZE made changes - 17/Apr/08 12:30 AM
Nigel Daley made changes - 21/May/08 08:05 PM
Owen O'Malley made changes - 08/Jul/09 04:42 PM
For the implementation, I am thinking about adding a method getContentSummary(Path) in the FileSystem. It returns a ContentSummary (a new class) object which contains length, number of files and number of directories.
Similar to FileSystem.getContentLength(Path), an implementation of getContentSummary(Path) that uses only the FileSystem API will be provided in FileSystem. DistributedFileSystem will then override getContentSummary(Path) to provide a NameNode-side implementation.
Since content length can be obtained by getContentSummary(Path), I will deprecate getContentLength(Path).
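A rough sketch of the generic, client-side version of this proposal follows. To keep it self-contained it walks a local directory through java.nio.file rather than the Hadoop FileSystem API, and SketchContentSummary and SketchFileSystem are hypothetical names used only for illustration. They are not the classes from the patch.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical value class holding the three numbers described above.
final class SketchContentSummary {
    final long length;          // total bytes
    final long fileCount;
    final long directoryCount;

    SketchContentSummary(long length, long fileCount, long directoryCount) {
        this.length = length;
        this.fileCount = fileCount;
        this.directoryCount = directoryCount;
    }
}

// Hypothetical stand-in for a file system class; uses the local disk via java.nio.file.
class SketchFileSystem {

    // Generic implementation: recurses through the public listing API only.
    // Symbolic links get no special handling in this sketch.
    SketchContentSummary getContentSummary(Path p) throws IOException {
        if (!Files.isDirectory(p)) {
            return new SketchContentSummary(Files.size(p), 1L, 0L);
        }
        long length = 0L;
        long files = 0L;
        long dirs = 1L;                                 // the directory counts itself
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(p)) {
            for (Path child : entries) {
                SketchContentSummary c = getContentSummary(child);
                length += c.length;
                files += c.fileCount;
                dirs += c.directoryCount;
            }
        }
        return new SketchContentSummary(length, files, dirs);
    }

    // The older call can simply delegate to the summary, which is why it can be deprecated.
    @Deprecated
    long getContentLength(Path p) throws IOException {
        return getContentSummary(p).length;
    }
}
```

A subclass backed by a server that already holds the whole namespace, as the NameNode does, could override getContentSummary to answer in one call instead of listing every directory from the client.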