Hadoop HDFS / HDFS-786

Implement getContentSummary(..) in HftpFileSystem

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      HftpFileSystem does not override getContentSummary(..). As a result, it uses FileSystem's default implementation, which computes the content summary on the client side by calling listStatus(..) recursively. In contrast, DistributedFileSystem overrides getContentSummary(..) and performs the computation on the NameNode.

      Consequently, running "fs -dus" over hftp is much slower than running it over hdfs.
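      The cost difference can be sketched with a small, self-contained Java model. This is not Hadoop code: Node, listStatus, and the requests counter are hypothetical stand-ins for the namespace, a remote directory listing, and HTTP round trips. Client-side aggregation pays one round trip per directory; a server-side summary pays one in total.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Hadoop code: contrasts client-side recursive
// summarization (one listing round trip per directory) with a single
// server-side summary call. Node, listStatus and requests are stand-ins.
public class ContentSummarySketch {

  // In-memory namespace entry: either a file with a length or a directory.
  static final class Node {
    final boolean isDir;
    final long length;
    final List<Node> children = new ArrayList<>();

    Node(boolean isDir, long length) { this.isDir = isDir; this.length = length; }

    static Node file(long length) { return new Node(false, length); }

    static Node dir(Node... kids) {
      Node d = new Node(true, 0);
      for (Node k : kids) d.children.add(k);
      return d;
    }
  }

  // Aggregate totals: bytes, files and directories (including the root).
  static final class Summary {
    long length, fileCount, dirCount;
  }

  // Number of simulated client-to-server round trips.
  static int requests = 0;

  // Simulated remote directory listing: every call costs one round trip.
  static List<Node> listStatus(Node dir) {
    requests++;
    return dir.children;
  }

  // Client-side strategy: recurse, issuing one remote listing per directory.
  static void summarizeOnClient(Node dir, Summary acc) {
    acc.dirCount++;
    for (Node child : listStatus(dir)) {
      if (child.isDir) {
        summarizeOnClient(child, acc);
      } else {
        acc.fileCount++;
        acc.length += child.length;
      }
    }
  }

  // Server-side strategy: one request; the walk happens next to the metadata.
  static Summary summarizeOnServer(Node root) {
    requests++;
    Summary acc = new Summary();
    walkLocally(root, acc);
    return acc;
  }

  private static void walkLocally(Node dir, Summary acc) {
    acc.dirCount++;
    for (Node child : dir.children) {
      if (child.isDir) {
        walkLocally(child, acc);
      } else {
        acc.fileCount++;
        acc.length += child.length;
      }
    }
  }

  // Hypothetical tree: 4 directories, 4 files, 135 bytes in total.
  static Node sampleTree() {
    return Node.dir(
        Node.dir(Node.file(10), Node.file(20)),
        Node.dir(Node.dir(Node.file(5))),
        Node.file(100));
  }

  public static void main(String[] args) {
    requests = 0;
    Summary client = new Summary();
    summarizeOnClient(sampleTree(), client);
    System.out.println("client-side requests: " + requests); // 4

    requests = 0;
    Summary server = summarizeOnServer(sampleTree());
    System.out.println("server-side requests: " + requests); // 1
  }
}
```

      In this model the client-side cost grows with the number of directories, while the server-side call stays at one request regardless of tree size.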

      1. h786_20091223.patch
        11 kB
        Tsz Wo Nicholas Sze
      2. h786_20091224.patch
        12 kB
        Tsz Wo Nicholas Sze
      3. h786_20100104.patch
        12 kB
        Tsz Wo Nicholas Sze
      4. h786_20100106_0.20.patch
        12 kB
        Tsz Wo Nicholas Sze
      5. h786_20100106.patch
        12 kB
        Tsz Wo Nicholas Sze
      6. h786_20100223_0.20.patch
        12 kB
        Suresh Srinivas

        Activity

        Yoram Arnon added a comment -

        Adding some numbers:

        Running the command locally on a particular webmap cluster's NameNode takes 3 seconds:
        time hadoop/bin/hadoop fs -dus /.../atoms

        real 0m2.916s
        user 0m1.215s
        sys 0m0.171s

        Running the same command, still locally but over hftp, takes 18 minutes:
        time hadoop/bin/hadoop fs -dus hftp://.../atoms

        real 18m11.154s
        user 10m37.726s
        sys 0m16.516s

        Running the command remotely, from a client in a different datacenter and again over hftp, took a little over 3 hours (sorry, no 'time' output).
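        For scale, the two local wall-clock ("real") times above differ by a factor of about 374. The arithmetic, as a small Java snippet that uses only the numbers reported in this comment:

```java
// Worked arithmetic for the local timings reported above ("real" wall-clock values).
public class HftpSlowdown {

  // Ratio of two wall-clock durations given in seconds.
  static double slowdown(double slowSeconds, double fastSeconds) {
    return slowSeconds / fastSeconds;
  }

  public static void main(String[] args) {
    double hdfsSeconds = 2.916;            // "real 0m2.916s" over hdfs
    double hftpSeconds = 18 * 60 + 11.154; // "real 18m11.154s" over hftp = 1091.154 s
    System.out.printf("hftp is about %.0fx slower%n", slowdown(hftpSeconds, hdfsSeconds));
    // prints "hftp is about 374x slower"
  }
}
```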

        Tsz Wo Nicholas Sze added a comment -

        h786_20091223.patch: first patch implementing HftpFileSystem.getContentSummary(..).

        Tsz Wo Nicholas Sze added a comment -

        h786_20091224.patch: changed log level and added a trace message in the tests.

        It passed all the tests on my machine.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12428929/h786_20091224.patch
        against trunk revision 893650.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/160/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/160/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/160/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/160/console

        This message is automatically generated.

        Tsz Wo Nicholas Sze added a comment -

        h786_20100104.patch: updated with trunk.

        Chris Douglas added a comment -

        Only minor nits:

        • WARN may be overly aggressive for a client calling getContentSummary in a loop. Both it and the TRACE logger are probably unnecessary.
        • TestListPathServlet can be updated to use MiniDFSCluster::getHftpFileSystem in its setup method

        Other than that, +1

        Tsz Wo Nicholas Sze added a comment -

        Thanks Chris for the review comments.

        h786_20100106.patch: removed both log messages and updated TestListPathServlet.

        Chris Douglas added a comment -

        This change:

             final String str = "hftp://"
                 + CONF.get(DFSConfigKeys.DFS_NAMENODE_HTTP_ADDRESS_KEY);
             hftpURI = new URI(str);
        -    hftpFs = (HftpFileSystem) FileSystem.newInstance(hftpURI, CONF);
        +    hftpFs = cluster.getHftpFileSystem();
        

        makes the preceding lines dead code.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12429613/h786_20100106.patch
        against trunk revision 896735.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 9 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/172/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/172/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/172/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/172/console

        This message is automatically generated.

        Chris Douglas added a comment -

        Nicholas points out that hftpURI is used elsewhere, so the code isn't actually dead and restructuring TestListPathServlet is of questionable value.

        I committed this. Thanks, Nicholas!

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #195 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/195/)

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #163 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/163/)

        Hudson added a comment -

        Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #94 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/94/)

        Hudson added a comment -

        Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #183 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/183/)

        Tsz Wo Nicholas Sze added a comment -

        h786_20100106_0.20.patch: for 0.20 (won't be committed).

        Suresh Srinivas added a comment -

        Minor update to the 0.20 version of the patch: MiniDFSCluster was using the config parameter "dfs.namenode.http-address" instead of "dfs.http.address".
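        The effect of the wrong key name can be sketched in a few lines. This is a hypothetical illustration, not the MiniDFSCluster code: java.util.Properties stands in for Hadoop's Configuration, and the host:port value is a placeholder. With a 0.20-style configuration, the trunk key is simply unset, so a concatenation like the "hftp://" + CONF.get(...) expression in TestListPathServlet would silently yield "hftp://null".

```java
import java.util.Properties;

// Hypothetical sketch: java.util.Properties stands in for Hadoop's Configuration.
public class HttpAddressKeySketch {
  static final String KEY_0_20 = "dfs.http.address";           // key name on 0.20
  static final String KEY_TRUNK = "dfs.namenode.http-address"; // key name on trunk

  // A 0.20-style configuration; the host:port value is a placeholder.
  static Properties conf020() {
    Properties conf = new Properties();
    conf.setProperty(KEY_0_20, "localhost:50070");
    return conf;
  }

  // Plain string concatenation: a missing key (null) becomes the text "null".
  static String hftpUri(Properties conf, String key) {
    return "hftp://" + conf.getProperty(key);
  }

  public static void main(String[] args) {
    Properties conf = conf020();
    System.out.println(hftpUri(conf, KEY_TRUNK)); // hftp://null
    System.out.println(hftpUri(conf, KEY_0_20));  // hftp://localhost:50070
  }
}
```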

        Tsz Wo Nicholas Sze added a comment -

        Oops, I used trunk's new constant value in the 0.20 patch. Thanks, Suresh, for catching this.


          People

          • Assignee: Tsz Wo Nicholas Sze
          • Reporter: Tsz Wo Nicholas Sze
          • Votes: 0
          • Watchers: 4
