Hadoop HDFS
  HDFS-1873

Federation Cluster Management Web Console

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.0
    • Component/s: None
    • Labels:
      None

      Description

      The Federation cluster management console provides:

      1. Cluster summary information that shows overall cluster utilization, along with a list of the name nodes reporting the used space, files and directories, blocks, and live and dead datanodes of each namespace.
      2. Decommissioning status of all data nodes that are in the process of decommissioning or already decommissioned.

      Attachments
      1. ClusterSummary.png
        55 kB
        Tanping Wang
      2. Decommission.png
        39 kB
        Tanping Wang
      3. HDFS-1873.2.patch
        50 kB
        Tanping Wang
      4. HDFS-1873.3.patch
        52 kB
        Tanping Wang
      5. HDFS-1873.4.patch
        54 kB
        Tanping Wang
      6. HDFS-1873.patch
        50 kB
        Tanping Wang

        Issue Links

          Activity

          Tanping Wang added a comment -

          FederationClusterSummary page screen shot.

          Tanping Wang added a comment -

          Federation cluster decommission page screen shot.

          Tanping Wang added a comment -

          The web console talks to each name node through JMX and collects statistics. It can sit on any third-party box to monitor cluster status; connecting to any of the name nodes returns the same content. An MXBean sits on every name node to expose statistics through JMX, so no communication between name nodes is required.
          If one or more name nodes cannot be reached, they are listed under unreported name nodes along with the exceptions encountered. If none of the name nodes are available, the exceptions encountered are likewise listed on the web page.

          The cluster decommission page monitors the decommission status of data nodes in the cluster. A data node's decommission status is either decommission in progress or decommissioned. An overall decommission status is provided per namespace by consolidating the status of all data nodes in that namespace. Data nodes that are in service are not included in the decommission report page.
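The per-name-node MXBean described above can be sketched roughly as follows; the interface, attribute names, and stub values here are hypothetical illustrations, not the actual code from this patch:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxSketch {
    // Hypothetical MXBean interface: the "MXBean" suffix makes JMX
    // register it as an MXBean. Attribute names are illustrative only.
    public interface NameNodeStatusMXBean {
        long getUsedSpace();
        int getLiveDataNodes();
        int getDeadDataNodes();
    }

    // Stub implementation standing in for a real name node's statistics.
    public static class NameNodeStatus implements NameNodeStatusMXBean {
        public long getUsedSpace()     { return 1L << 20; }
        public int  getLiveDataNodes() { return 3; }
        public int  getDeadDataNodes() { return 1; }
    }

    // Register the bean and read one attribute back, the same way a
    // remote management console would over a JMX connector.
    public static int readLiveNodes() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name =
            new ObjectName("Hadoop:service=NameNode,name=NameNodeStatus");
        if (!server.isRegistered(name)) {
            server.registerMBean(new NameNodeStatus(), name);
        }
        return (Integer) server.getAttribute(name, "LiveDataNodes");
    }

    public static void main(String[] args) throws Exception {
        System.out.println("LiveDataNodes = " + readLiveNodes());
    }
}
```

Because every name node registers its own bean, the console only needs a list of JMX endpoints to poll; no name-node-to-name-node traffic is involved.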

          Philip Zeyliger added a comment -

          Hi,

          Absolutely excellent to have a web UI here; thanks for making it!

          A few comments on the JMX approach:

          (1) The namenode already has a port open for the Namenode RPC protocol and another one open for HTTP. Have you considered adding a getWhatever() API to the namenode RPC protocol or adding a Jetty servlet endpoint for the same? Either approach would require less configuration. We already have two perfectly good protocols; why bother with a third?

          (2) You provide a way to set the agent port, but, ultimately, there are a lot of properties involved in setting up JMX (http://download.oracle.com/javase/6/docs/technotes/guides/management/agent.html has ones for security, SSL, passwords and credentials, etc.) As you know, once you connect via JMX, you can do pretty much anything, including injecting code into the namenode. It seems that setting '-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false' (as you do in one of the patches in the subissues) by default is unwise from a security perspective.
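For reference, the kind of hardened configuration Philip is suggesting would look roughly like this in hadoop-env.sh (illustrative port number and hypothetical file paths; see the Oracle management-agent guide linked above for the full property list):

```shell
# Illustrative hadoop-env.sh fragment: enable remote JMX on the namenode
# with authentication and SSL turned on rather than disabled.
export HADOOP_NAMENODE_OPTS="$HADOOP_NAMENODE_OPTS \
  -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=8004 \
  -Dcom.sun.management.jmxremote.authenticate=true \
  -Dcom.sun.management.jmxremote.ssl=true \
  -Dcom.sun.management.jmxremote.password.file=/path/to/jmxremote.password \
  -Dcom.sun.management.jmxremote.access.file=/path/to/jmxremote.access"
```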

          (3) The method you use to get the config from hdfs-site to the shell script ends up creating an extra JVM. That's not terrible: it's not that Hadoop startup time is particularly speedy to begin with, but we already have options in hadoop-env.sh (like heapsize), so the extra mechanism feels like it may be gratuitous. I looked around to see if there was a way to start (or open) the JMX agent port programmatically, but I did not find one.

          Cheers,

          – Philip

          Tanping Wang added a comment -

          Philip,
          Thank you very much for your comments!
          Regarding (1): we chose JMX over the alternatives for reasons that go back to HDFS-1318. Starting in HDFS-1318, we exposed the name node and data node statistics through JMX (the same information exposed on the JSP pages over HTTP), so that we could provide structured information to help with building and scripting around the exposed statistics. The new Federation web management UI is built on JMX, and its content is formatted as XML. Beyond viewing the cluster management console web page, one can write a parser for the XML, or query the name nodes or data nodes directly through JMX to get structured statistics back for monitoring the cluster.
          Regarding (2): your concern is valid! We are working out the security-related details now. The initial idea is to read information through JMX with no writes.
          Regarding (3): I wanted to make the JMX port number configurable in hdfs-site.xml along with the other properties, and I was not able to find a better way to read the property and set it in hadoop-env.sh. The other option I considered was setting the JMX port system property in a separate configuration file, but I felt that adding an extra configuration file is a bad idea.
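As a rough illustration of the "write a parser" option mentioned above, here is a minimal sketch that consolidates one statistic across name nodes from XML output; the element names and sample payload are hypothetical, not the console's actual schema:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class ClusterXmlSketch {
    // Hypothetical sample of per-name-node XML; real element names in
    // the HDFS-1873 console may differ.
    public static final String SAMPLE =
        "<cluster>"
        + "<namenode host=\"nn1\"><liveDataNodes>3</liveDataNodes></namenode>"
        + "<namenode host=\"nn2\"><liveDataNodes>4</liveDataNodes></namenode>"
        + "</cluster>";

    // Sum a per-namespace statistic into a cluster-wide total.
    public static int totalLiveDataNodes(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        NodeList nodes = doc.getElementsByTagName("liveDataNodes");
        int total = 0;
        for (int i = 0; i < nodes.getLength(); i++) {
            total += Integer.parseInt(nodes.item(i).getTextContent().trim());
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("total live datanodes = "
            + totalLiveDataNodes(SAMPLE));
    }
}
```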

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12478103/HDFS-1873.patch
          against trunk revision 1099285.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/448//console

          This message is automatically generated.

          Tanping Wang added a comment -

          Uploaded a new patch to accommodate the latest trunk changes.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12478108/HDFS-1873.2.patch
          against trunk revision 1099285.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The applied patch generated 32 javac compiler warnings (more than the trunk's current 25 warnings).

          -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.

          -1 release audit. The applied patch generated 6 release audit warnings (more than the trunk's current 0 warnings).

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.hdfs.server.namenode.TestBackupNode
          org.apache.hadoop.hdfs.TestDFSStorageStateRecovery
          org.apache.hadoop.hdfs.TestFileConcurrentReader

          +1 contrib tests. The patch passed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/449//testReport/
          Release audit warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/449//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/449//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/449//console

          This message is automatically generated.

          Tanping Wang added a comment -

          Uploaded a patch that fixes the warnings reported by test-patch.

          Ran test-patch manually against the new patch:

          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
          [exec] Please justify why no new tests are needed for this patch.
          [exec] Also please list what manual steps were performed to verify this patch.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
          [exec]
          [exec] +1 system test framework. The patch passed system test framework compile.
          [exec]
          [exec]
          [exec]
          [exec]
          [exec] ======================================================================
          [exec] ======================================================================
          [exec] Finished build.
          [exec] ======================================================================
          [exec] ======================================================================

          Tanping Wang added a comment -

          No unit test cases are included in the patch because this is a web UI; it was tested manually. Basic tests included (but were not limited to):

          1. cluster management console:
            case 1)
            start up the Federation cluster with JMX enabled on the name nodes;
            copy some files to the cluster;
            check the cluster statistics;
            case 2)
            start up the Federation cluster with one or two name nodes down;
            copy some files to the cluster;
            the name nodes that are down appear in the unreported name nodes section;
            case 3)
            start up the Federation cluster with JMX disabled on the server side;
            the exception should be captured on the web UI.
          2. decommission progress page:
            case 1)
            start up the cluster;
            add a couple of data nodes to the excludes list;
            refresh the cluster on all name nodes;
            check that the decommission progress page reflects the decommission progress;
            case 2)
            start up the cluster;
            add a couple of data nodes to the excludes list;
            do a dfsadmin refresh on one of the name nodes (but not all);
            check the decommission progress page; only the data nodes in that one namespace should be under decommission.
            case 3)
            start up the cluster;
            kill the data node process on some of the data nodes;
            decommission some of the data nodes;
            check the decommission progress page.
          Tanping Wang added a comment -

          Removed tabs from some of the xsl files.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12478655/HDFS-1873.4.patch
          against trunk revision 1101137.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.hdfs.TestDFSShell
          org.apache.hadoop.hdfs.TestDFSStorageStateRecovery
          org.apache.hadoop.hdfs.TestFileConcurrentReader

          +1 contrib tests. The patch passed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/470//testReport/
          Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/470//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/470//console

          This message is automatically generated.

          Tanping Wang added a comment -

          The three failed test cases are known failures and are unrelated to this patch.

          Suresh Srinivas added a comment -

          I committed the patch. Thank you Tanping.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #637 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/637/)

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #673 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/673/)

          Tanping Wang added a comment -

          JMX attributes are now exposed to the client over an HTTP connection via JMXJsonServlet on the namenode side.
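A rough sketch of what a client-side reader of that HTTP endpoint might do; the sample payload below is abbreviated and hypothetical, and a real client would use a proper JSON parser rather than string searching:

```java
public class JmxJsonSketch {
    // Abbreviated, hypothetical sample of the kind of JSON a JMX-over-HTTP
    // servlet returns; the real payload carries many more attributes.
    public static final String SAMPLE =
        "{\"beans\":[{\"name\":\"Hadoop:service=NameNode,name=NameNodeInfo\","
        + "\"LiveNodes\":\"3\",\"DeadNodes\":\"1\"}]}";

    // Naive extraction of one string-valued attribute from the payload.
    public static String attribute(String json, String key) {
        String needle = "\"" + key + "\":\"";
        int start = json.indexOf(needle);
        if (start < 0) {
            return null;
        }
        start += needle.length();
        int end = json.indexOf('"', start);
        return json.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println("LiveNodes = " + attribute(SAMPLE, "LiveNodes"));
    }
}
```

Reading the attributes over HTTP this way sidesteps the JMX agent-port configuration and security concerns raised earlier in the thread.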


            People

            • Assignee:
              Tanping Wang
              Reporter:
              Tanping Wang
            • Votes:
              0
              Watchers:
              6
