HDFS-985

HDFS should issue multiple RPCs for listing a large directory

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Incompatible change, Reviewed

      Description

      Currently HDFS issues one RPC from the client to the NameNode for listing a directory. However, some directories are large, containing thousands or millions of items. Listing such a large directory in one RPC has a few shortcomings:
      1. The list operation holds the global fsnamesystem lock for a long time, thus blocking other requests. If a large number (like thousands) of such list requests hit the NameNode in a short period of time, the NameNode will be significantly slowed down. Users end up noticing longer response times or lost connections to the NameNode.
      2. The response message is uncontrollably big. We observed a response as big as 50M bytes when listing a directory of 300 thousand items. Even with the optimization introduced in HDFS-946, which may be able to cut the response by 20-50%, the response size will still be on the order of tens of megabytes.

      I propose to implement a directory listing using multiple RPCs. Here is the plan:
      1. Each getListing RPC has an upper limit on the number of items returned. This limit could be configurable, but I am thinking of setting it to a fixed number like 500.
      2. Each RPC additionally specifies a start position for this listing request. I am thinking of using the last item of the previous listing RPC as the indicator. Since the NameNode stores all items in a directory as a sorted array, it can use the last item to locate the start of this listing even if that item is deleted between the two consecutive calls. This has the advantage of avoiding duplicate entries at the client side.
      3. The return value additionally specifies whether the whole directory has been fully listed. If the client sees a false flag, it will continue to issue another RPC.

      This proposal changes the semantics of listing a large directory in the sense that the listing is no longer an atomic operation if the directory's content is changing while the listing is in progress. A sketch of the client-side loop this implies follows.
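
      A minimal sketch, in Java, of the client-side loop this proposal implies. The names (getListing, DirectoryListing, getPartialListing, hasMore, getLocalNameInBytes, HdfsFileStatus.EMPTY_NAME) are taken from the discussion later in this issue or assumed for illustration; this is not the committed API.

          import java.io.FileNotFoundException;
          import java.io.IOException;
          import java.util.ArrayList;
          import java.util.List;

          // Accumulate a full directory listing with a series of bounded
          // getListing RPCs. 'namenode' stands for the client's RPC proxy.
          List<HdfsFileStatus> listAll(String src) throws IOException {
            byte[] startAfter = HdfsFileStatus.EMPTY_NAME; // empty name = first entry
            List<HdfsFileStatus> results = new ArrayList<HdfsFileStatus>();
            DirectoryListing batch;
            do {
              batch = namenode.getListing(src, startAfter); // one bounded RPC
              if (batch == null) {
                // the directory was deleted while we were iterating
                throw new FileNotFoundException(src + " does not exist");
              }
              HdfsFileStatus[] partial = batch.getPartialListing();
              for (HdfsFileStatus s : partial) {
                results.add(s);
              }
              if (partial.length > 0) {
                // Resume after the last returned name. Because the NameNode keeps
                // a directory's children in a sorted array, it can find the
                // successor even if this entry is deleted between calls, which
                // avoids duplicates at the client side.
                startAfter = partial[partial.length - 1].getLocalNameInBytes();
              }
            } while (batch.hasMore()); // more entries remain on the NameNode
            return results;
          }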

      Attachments

      1. directoryBrowse_0.20yahoo_1.patch
        5 kB
        Hairong Kuang
      2. directoryBrowse_0.20yahoo_2.patch
        5 kB
        Hairong Kuang
      3. directoryBrowse_0.20yahoo.patch
        4 kB
        Hairong Kuang
      4. iterativeLS_trunk.patch
        40 kB
        Hairong Kuang
      5. iterativeLS_trunk1.patch
        42 kB
        Hairong Kuang
      6. iterativeLS_trunk2.patch
        43 kB
        Hairong Kuang
      7. iterativeLS_trunk3.patch
        48 kB
        Hairong Kuang
      8. iterativeLS_trunk3.patch
        48 kB
        Hairong Kuang
      9. iterativeLS_trunk4.patch
        48 kB
        Hairong Kuang
      10. iterativeLS_yahoo.patch
        36 kB
        Hairong Kuang
      11. iterativeLS_yahoo1.patch
        40 kB
        Hairong Kuang
      12. testFileStatus.patch
        1 kB
        Hairong Kuang


          Activity

          dhruba borthakur added a comment -

          +1. sounds like an awesome idea.

          Hong Tang added a comment -

          +1. In general, we should bound the work (and thus the waiting on the client side) of every RPC call.

          Hairong Kuang added a comment -

          A patch for review. I am sorry that this patch is generated against the 0.20 Yahoo! branch because I did not have time to work on the trunk first, but I will work on a patch against the trunk for sure before resolving this issue. This patch has a few minor changes to the proposal in the JIRA description.

          1. The upper limit of each getListing RPC is made configurable with a default value of 1000. This configuration property is undocumented for now. One reason to have it is to make writing unit tests easy. Also, I still need to conduct more experiments at a large scale to decide what the right default number is.
          2. A getListing RPC returns the number of remaining entries to be listed instead of a flag indicating whether there are more to be listed. The number of remaining entries provides a heuristic for deciding the initial size of the status array at the client side, thus reducing unnecessary allocation/deallocation in most cases (see the sketch after this comment).

          Unit tests are added to TestFileStatus to check whether multiple RPCs work correctly.
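
          As a rough illustration of point 2 above (assumed code under the names used in this thread, not the patch itself), the remaining-entries count lets the client size its result array once:

              // First bounded RPC; EMPTY_NAME means "start at the first entry".
              DirectoryListing first = namenode.getListing(src, HdfsFileStatus.EMPTY_NAME);
              HdfsFileStatus[] partial = first.getPartialListing();
              // Total is roughly what we already have plus what the NameNode
              // reports as remaining, so allocate the result array once.
              HdfsFileStatus[] listing =
                  new HdfsFileStatus[partial.length + first.getRemainingEntries()];
              System.arraycopy(partial, 0, listing, 0, partial.length);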

          Suresh Srinivas added a comment -

          Initial review from going through half the patch:

          1. DFSClient.java - instead of lastReturnedName we could use a generic name startFrom and update the param doc appropriately.
          2. Is the name PartialFileStatus better than PathPartialListing?
          3. DFSClient.listStatus() - result should be null in case the directory is deleted midway, instead of returning what has been accumulated until then. The number of lines in the code can be reduced by folding all the code into a do-while loop.
          4. DFSClient.listStatus() - document calling with name=EMPTY_NAME the first time.
          5. FsDirectory.getListing - avoid startChild+1 in the loop.
          6. INodeDirectory.nextChild() - instead of checking for name.length == 0, we should compare it with EMPTY_NAME.
          7. Why is the older variant of getListing in FsNameSystem and NameNode (I did not check if there are others) not removed? It seems to be removed in ClientProtocol.java.

          I will post the comments for the rest of the code soon.

          Hairong Kuang added a comment -

          > DFSClient.listStatus() - result should be null in case the directory is deleted midway, isntead of returning what is accumulated until then.
          These semantics are debatable. I think either way is fine, but my implementation is consistent with what we do in the WebUI.

          > Number of lines in the code can be reduced by folding all the code into do-while.
          The reason that I did not fold everything into one do-while loop is to optimize the case when one RPC is enough for listing a directory, which I think is the common case.
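
          Roughly, the shape of that fast path (a sketch under assumed names, not the actual patch):

              // Fast path: most directories fit in one RPC, so check before looping.
              DirectoryListing first = namenode.getListing(src, HdfsFileStatus.EMPTY_NAME);
              if (first == null) {
                return null; // directory no longer exists (semantics discussed above)
              }
              if (!first.hasMore()) {
                return first.getPartialListing(); // common case: exactly one RPC
              }
              // ...only very large directories reach the multi-RPC loop...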

          Hairong Kuang added a comment -

          This patch addresses Suresh's comments except for comments 3, 5, and 6:
          1. renamed lastReturnedName to startAfter;
          2. renamed PathPartialListing to DirectoryListing.

          In addition, I moved the config property dfs.ls.limit and its default value into constants, and added a comment to DistributedFileSystem#listStatus explaining that the operation is no longer atomic.

          Suresh Srinivas added a comment -

          Patch looks good. A few comments:

          1. DFSClient.listPaths() - the comment could be more precise: "Use HdfsFileStatus.EMPTY_NAME as startAfter to get a list starting from the first file in the directory."
          2. FSDirectory - lslimit should be set to the default value if the configuredLimit <= 0 (see the sketch below).
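
          A sketch of what comment 2 asks for, assuming Hadoop's Configuration.getInt accessor; the constant name is illustrative:

              // Fall back to the default (1000, per the earlier comment in this
              // thread) when the configured limit is non-positive.
              int configuredLimit = conf.getInt("dfs.ls.limit", DFS_LS_LIMIT_DEFAULT);
              this.lsLimit = configuredLimit > 0 ? configuredLimit : DFS_LS_LIMIT_DEFAULT;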
          Suresh Srinivas added a comment -

          I see that you are already doing item 2 from my previous comment. +1 for the patch. This is a very good change for the stability of the system. Thanks Hairong.

          Hairong Kuang added a comment -

          This patch fixed a bug in TestFileStatus.java.

          gary murry added a comment -

          One minor issue: in browseDirectory.jsp it looks like you are deleting a line without deleting an associated comment?

          diff --git src/webapps/datanode/browseDirectory.jsp src/webapps/datanode/browseDirectory.jsp
          index ee1defd..4585b0e 100644
          --- src/webapps/datanode/browseDirectory.jsp
          +++ src/webapps/datanode/browseDirectory.jsp
          @@ -76,7 +76,6 @@
                   return;
                 }
                 // directory
          -      HdfsFileStatus[] files = dfs.listPaths(target);
                 //generate a table and dump the info
                 String [] headings = { "Name", "Type", "Size", "Replication", 
                                         "Block Size", "Modification Time",
          
          Hairong Kuang added a comment -

          I assume that you were talking about the comment line "// directory". I think this comment means "now we are handling the case where target is a directory." It has nothing to do with the statement "dfs.listPaths(target)" that I removed.

          Hairong Kuang added a comment -

          Here is the patch for the trunk.

          Hairong Kuang added a comment -

          This patch is for trunk and fixes a web UI bug when displaying a directory structure.

          Hairong Kuang added a comment -

          This patch fixes the UI bug when browsing a directory from the web in the Yahoo! 0.20 branch.

          Hairong Kuang added a comment -

          iterativeLS_trunk2.patch fixed a bug in TestFileStatus.java.

          Suresh Srinivas added a comment -

          Comments for trunk version of the patch:

          1. I feel throwing an exception instead of returning the accumulated list is better behavior. This will avoid applications using the partial list to query further and then having to handle a file-not-found exception. If we continue to return a partial list, add comments to the relevant methods in FileSystem stating that if a directory is deleted, the accumulated list will be returned.
          2. Add test cases to test deletion of a directory while list status is still iterating (a sketch follows this list).
          3. There are some mapred changes in the 20 version of the patch that need to be made in the mapred branch?
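
          A JUnit-style sketch of the kind of test comment 2 asks for; the names are illustrative ('fs' is a DistributedFileSystem on a test cluster, 'namenode' its RPC proxy), not the test that was eventually committed:

              // Delete the directory between two getListing calls and expect
              // the follow-up call to fail.
              DirectoryListing first = namenode.getListing("/big", HdfsFileStatus.EMPTY_NAME);
              HdfsFileStatus[] partial = first.getPartialListing();
              byte[] last = partial[partial.length - 1].getLocalNameInBytes();
              fs.delete(new Path("/big"), true);
              try {
                namenode.getListing("/big", last);
                fail("expected FileNotFoundException");
              } catch (FileNotFoundException expected) {
                // deletion mid-iteration surfaces as FileNotFoundException
              }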
          Suresh Srinivas added a comment -

          The 20 version of the patch looks good except for a minor comment: there are some tabs that cause improper indentation.

          Hairong Kuang added a comment -

          This patch fixes the indentation problem and returns null if the target directory is deleted before the full listing has been fetched.

          Hairong Kuang added a comment -

          This patch incorporates Suresh's review comments. It throws FileNotFoundException when the directory being listed is deleted, and it adds an AspectJ test for this case. Comment 3 needs to be fixed in MapReduce, which I will do later.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12438987/iterativeLS_trunk3.patch
          against trunk revision 923467.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 19 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/129/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/129/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/129/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/129/console

          This message is automatically generated.

          Suresh Srinivas added a comment -

          +1 for the 20 version of the patch.

          For the trunk patch, do you need to change the log4j log level of DFSClient and DataNode to DEBUG, or was this done for testing alone?

          Hairong Kuang added a comment -

          The log4j change was not intended. This patch removes it.

          Hairong Kuang added a comment -

          This patch is synced with the Yahoo! 0.20 security branch.

          Hairong Kuang added a comment -

          This fixes a bug in TestFileStatus unit test.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12439098/iterativeLS_trunk4.patch
          against trunk revision 923467.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 19 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/272/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/272/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/272/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/272/console

          This message is automatically generated.

          Hairong Kuang added a comment -

          The failed contrib tests seem unrelated to my patch:
          /grid/0/hudson/hudson-slave/workspace/Hdfs-Patch-h5.grid.sp2.yahoo.net/trunk/build.xml:569: The following error occurred while executing this line:
          [exec] /grid/0/hudson/hudson-slave/workspace/Hdfs-Patch-h5.grid.sp2.yahoo.net/trunk/src/contrib/build.xml:48: The following error occurred while executing this line:
          [exec] /grid/0/hudson/hudson-slave/workspace/Hdfs-Patch-h5.grid.sp2.yahoo.net/trunk/src/contrib/hdfsproxy/build.xml:292: org.codehaus.cargo.container.ContainerException: Failed to download http://apache.osuosl.org/tomcat/tomcat-6/v6.0.18/bin/apache-tomcat-6.0.18.zip

          Suresh Srinivas added a comment -

          +1 for the trunk version of the patch as well.

          Hairong Kuang added a comment -

          I've just committed this.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #218 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/218/)

          Hairong Kuang added a comment -

          > There are some mapred changes in 20 version of the file that needs to be made in mapred branch?
          It turns out there is no need to change mapred in the trunk.

          Chris Douglas added a comment -

          It turns out there is no need to change mapred in the trunk.

          This is already done. (MAPREDUCE-1615)

          Hairong Kuang added a comment -

          Thanks Chris!

          Hairong Kuang added a comment -

          I performed some experiments to measure the overhead of iterative listing. The experiments were performed on a NameNode with no other traffic and with security disabled. The client listed the directory 200 times sequentially, and the table below shows the average time for listing all entries of a directory. When the max # of returned entries per call is 1,000, each directory listing requires multiple RPC calls to the NameNode; when the max # of returned entries is 10,000, each directory listing requires only one RPC call.

          Max # of returned entries per getListing RPC | 2,000-entry dir | 4,000-entry dir | 10,000-entry dir
          1,000                                        | 71.86 ms        | 145.88 ms       | 343.04 ms
          10,000                                       | 70.22 ms        | 165.66 ms       | 332.1 ms

          dhruba borthakur added a comment -

          Wow, these numbers are cool. Does this mean that directory listings (especially for large directories) are bottlenecked by the memory allocation and processing at the NN and not by the number of round-trip calls made to the NN?

          Hairong Kuang added a comment -

          Hi Dhruba! Yes, the data are good! I was very concerned that the feature would cause a lot of performance degradation. Directory listing is indeed very CPU intensive at the NN.

          Hudson added a comment -

          Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #146 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/146/)

          Hudson added a comment -

          Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #302 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/302/)

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #275 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/275/)


            People

            • Assignee: Hairong Kuang
            • Reporter: Hairong Kuang
            • Votes: 0
            • Watchers: 15
