Hadoop Common: HADOOP-5094

Show dead nodes information in dfsadmin -report

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.18.2
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Changed dfsadmin -report to list live and dead nodes, and to attempt to resolve the hostnames of datanode IP addresses.

      Description

      As part of the operations responsibility of bringing dead nodes back into service, it would be good to have a quick way to obtain a list of dead datanodes.
      The current way is to scrape the namenode web UI page and parse that information, but this creates load on the namenode.
      While looking for a less costly way, I noticed that dfsadmin -report only lists datanodes whose State is "In Service" or "Decommission in progress".
      I am asking for a cheap way to obtain a list of dead nodes.

      In addition, could the following requests be reviewed as further enhancements and changes to dfsadmin -report?

      • Consistent formatting: the "Remaining raw bytes:" output for the data nodes should have a space between the exact value and the parenthesized value, as the other lines do.
        Sample:
        Total raw bytes: 3842232975360 (3.49 TB)
        Remaining raw bytes: 146090593065(136.06 GB)
        Used raw bytes: 3240864964620 (2.95 TB)
      • Include the running version of Hadoop.
      • What is the meaning of "Total effective bytes"?
      • Display the hostname instead of the IP address for the data node (toggle option?)
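      The requested "exact value (human-readable value)" layout can be sketched in Java as below. This is a minimal sketch, not the dfsadmin code: it assumes a 1024-based unit scale and at most two decimal places (DecimalFormat pattern "#.##"), which reproduces the figures in the sample above, and the class and method names (ByteFormat, humanReadable, withUnits) are hypothetical.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

/** Hypothetical helper; not taken from the HADOOP-5094 patch. */
public class ByteFormat {
  private static final String[] UNITS = {"B", "KB", "MB", "GB", "TB", "PB"};

  /** Scales a byte count by powers of 1024 and keeps at most two decimals. */
  static String humanReadable(long bytes) {
    double value = bytes;
    int unit = 0;
    while (value >= 1024 && unit < UNITS.length - 1) {
      value /= 1024;
      unit++;
    }
    // Locale.US keeps '.' as the decimal separator regardless of the JVM default.
    DecimalFormat df = new DecimalFormat("#.##", DecimalFormatSymbols.getInstance(Locale.US));
    return df.format(value) + " " + UNITS[unit];
  }

  /** Always puts a single space before the parenthesized value. */
  static String withUnits(long bytes) {
    return bytes + " (" + humanReadable(bytes) + ")";
  }

  public static void main(String[] args) {
    // Prints "Remaining raw bytes: 146090593065 (136.06 GB)"
    System.out.println("Remaining raw bytes: " + withUnits(146090593065L));
  }
}
```

      Building every line through one helper such as withUnits is what keeps the spacing consistent; the missing space in the sample comes from one line being formatted differently from the others.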
      Attachments

      1. DfsAdminDeadNode_testCases.html (3 kB, gary murry)
      2. DfsAdminDeadNode_testCases.html (2 kB, gary murry)
      3. HADOOP-5094.patch (6 kB, Jakob Homan)
      4. HADOOP-5094.patch (6 kB, Jakob Homan)
      5. HADOOP-5094.patch (4 kB, Jakob Homan)

          Activity

          Suresh Srinivas added a comment -

          The output from the command has changed, though the new output is still missing the space before the parenthesis. Here is the output with the change from 4281:

          Configured Capacity: 6339239936 (5.9 GB)
          Present Capacity: 3782686528 (3.52 GB)
          DFS Remaining: 2781669184(2.59 GB)
          DFS Used: 1001017344 (954.64 MB)
          DFS Used%: 26.46%

          Jakob Homan added a comment -

          This patch adds headers to the list of datanodes, separating the living from the dead:

          -------------------------------------------------
          Datanodes available: 9 (10 total, 1 dead)
          
          Live datanodes:
          Name: ipaddr:58301
          Decommission Status : Normal
          Configured Capacity: 974886735872 (907.93 GB)
          DFS Used: 98304 (96 KB)
          Non DFS Used: 163215228928 (152.01 GB)
          DFS Remaining: 811671408640(755.93 GB)
          DFS Used%: 0%
          DFS Remaining%: 83.26%
          Last contact: Wed Jan 28 23:29:32 UTC 2009
          <<snip>>
          
          Dead datanodes:
          Name: ipaddr2:53655
          Decommission Status : Normal
          Configured Capacity: 974886735872 (907.93 GB)
          DFS Used: 98304 (96 KB)
          Non DFS Used: 209286926336 (194.91 GB)
          DFS Remaining: 765599711232(713.02 GB)
          DFS Used%: 0%
          DFS Remaining%: 78.53%
          Last contact: Wed Jan 28 23:17:43 UTC 2009
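          With these headers in place, an operator could pull just the dead datanode names out of the report instead of scraping the web UI. A minimal Java sketch, assuming the "Live datanodes:" / "Dead datanodes:" headers and "Name:" lines appear exactly as in the sample above (later releases may word them differently); the class name DeadNodes is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical report filter; relies on the header wording shown above. */
public class DeadNodes {
  /** Returns the "Name:" values that appear under the "Dead datanodes:" header. */
  static List<String> deadNodeNames(Iterable<String> reportLines) {
    List<String> dead = new ArrayList<>();
    boolean inDeadSection = false;
    for (String line : reportLines) {
      String t = line.trim();
      if (t.startsWith("Live datanodes")) {
        inDeadSection = false;
      } else if (t.startsWith("Dead datanodes")) {
        inDeadSection = true;
      } else if (inDeadSection && t.startsWith("Name:")) {
        dead.add(t.substring("Name:".length()).trim());
      }
    }
    return dead;
  }

  public static void main(String[] args) throws IOException {
    // Reads the report from stdin, e.g.: hadoop dfsadmin -report | java DeadNodes
    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
    List<String> lines = new ArrayList<>();
    String line;
    while ((line = in.readLine()) != null) {
      lines.add(line);
    }
    for (String name : deadNodeNames(lines)) {
      System.out.println(name);
    }
  }
}
```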
          

          Also,

          > Consistent formatting output in "Remaining raw bytes:"

          Fixed.

          > Include the running version of Hadoop

          Client version and revision added to output

          > What is the meaning of "Total effective bytes"?

          As Suresh noted, it is no longer included in the report output.

          > Display the hostname instead of the IP address for the data node (toggle option?)

          Would it be worth it to have both, if the datanode is specified as an ip addr initially?

          The patch passes all unit tests except the known-bad HADOOP-4907. test-patch output:

               [exec] -1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
               [exec]                         Please justify why no tests are needed for this patch.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          There are no new unit tests because this is only a change to the report output and is not easily tested.

          Raghu Angadi added a comment -

          > Would it be worth it to have both, if the datanode is specified as an ip addr initially?

          I think it should be both.

          Tsz Wo Nicholas Sze added a comment -

          >> Include the running version of Hadoop
          >
          > Client version and revision added to output

          Should we print the server version instead of client version? The client version can be obtained by "./bin/hadoop version".

          Jim Huang added a comment -

          Please do provide the server version, so there is a quick and non-taxing way of determining which version is currently running on the namenode.

          Jakob Homan added a comment -

          At the moment there is no way to query the server version via the client protocol, and that ability seems beyond the scope of this JIRA, so that feature should probably wait. HADOOP-4368 is currently dealing with the information included in this report and it may be worthwhile to include the server version in that work. I'll open another JIRA for this.

          Jakob Homan added a comment -

          Updated the patch to include the hostname in parentheses after the IP address, if it can be determined. If not, nothing extra is printed.
          Unit tests pass except for the usual suspect. Again, this is just a change in output, so there is no new unit test:

               [exec] -1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
               [exec]                         Please justify why no tests are needed for this patch.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
               [exec] 
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
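          The rule described in this update (append the hostname in parentheses only when it can be resolved, otherwise print the name unchanged) can be sketched in Java as follows. This is a sketch under stated assumptions, not the code from the patch: it assumes IPv4 "ip:port" names, uses java.net.InetAddress for the reverse lookup, and the class and method names (DatanodeName, reverseLookup, withHostname) are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.function.Function;

/** Hypothetical sketch; assumes IPv4 "ip:port" datanode names. */
public class DatanodeName {
  /** Reverse-looks-up an IP literal; returns null when no hostname is found. */
  static String reverseLookup(String ip) {
    try {
      String host = InetAddress.getByName(ip).getCanonicalHostName();
      // getCanonicalHostName() falls back to the textual IP when the lookup fails.
      return host.equals(ip) ? null : host;
    } catch (UnknownHostException e) {
      return null;
    }
  }

  /** Returns "ip:port (hostname)" if the resolver finds a name, else the name unchanged. */
  static String withHostname(String name, Function<String, String> resolver) {
    int colon = name.lastIndexOf(':');
    // Guard against names without a port rather than calling substring(0, -1).
    String ip = colon >= 0 ? name.substring(0, colon) : name;
    String host = resolver.apply(ip);
    return host == null ? name : name + " (" + host + ")";
  }

  public static void main(String[] args) {
    for (String arg : args) {
      System.out.println("Name: " + withHostname(arg, DatanodeName::reverseLookup));
    }
  }
}
```

          Passing the resolver as a function keeps the formatting rule testable without DNS, and it also leaves room for the "show both" variant suggested in the thread.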
          
          Jakob Homan added a comment -

          Submitting updated patch.

          Jakob Homan added a comment -

          Jim noticed that this issue addresses HADOOP-2937, if you consider the last contact time to be the time the node went dead. It's not a GUI fix, but it should suffice.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12399197/HADOOP-5094.patch
          against trunk revision 739416.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3778/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3778/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3778/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3778/console

          This message is automatically generated.

          Jakob Homan added a comment -

          The failing contrib tests are the known-bad Chukwa tests.

          Jakob Homan added a comment -

          I created HADOOP-5159 to deal with getting the server version displayed in the report.

          Jakob Homan added a comment -

          The patch went stale with the committing of HADOOP-4368. Nothing of substance, just some jostling over imports. Uploading a new version that applies against trunk.

          Tsz Wo Nicholas Sze added a comment -

          +1 patch looks good.

          Tsz Wo Nicholas Sze added a comment -

          I just committed this. Thanks, Jakob!

          Tsz Wo Nicholas Sze added a comment -

          Jakob, please add a release note since this is an incompatible change.

          Hudson added a comment -

          Integrated in Hadoop-trunk #756 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/756/)
          gary murry added a comment -

          Just listing some test cases that should be covered to test this improvement.

          Robert Chansler added a comment -

          Editorial pass over all release notes prior to publication of 0.21.

          gary murry added a comment -

          An updated list of test cases after getting some feedback.

          Ravi Prakash added a comment -

          Hi Gary,

          I've only just joined the Hadoop team and am going to be writing automated tests for the test cases you've listed. Please excuse my naivete if I'm way off. Can you please clarify what you mean by "stop a node"? Is that using the hadoop-daemon.sh script? Won't the namenode mark the datanode as dead only after 10 minutes? Should my test be that long-lived?

          Cheers
          Ravi.
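          For reference, the roughly ten-minute figure can be reproduced with a small calculation. This Java sketch assumes the namenode declares a datanode dead after 2 times the heartbeat recheck interval plus 10 times the heartbeat interval, with defaults of 5 minutes for heartbeat.recheck.interval and 3 seconds for dfs.heartbeat.interval; those names and defaults should be checked against the release under test, and the class name HeartbeatExpiry is hypothetical.

```java
/** Hypothetical helper mirroring the assumed dead-node expiry formula. */
public class HeartbeatExpiry {
  // Assumed defaults; verify against the configuration of the cluster under test.
  static final long RECHECK_INTERVAL_MS = 5 * 60 * 1000L; // heartbeat.recheck.interval
  static final long HEARTBEAT_INTERVAL_SEC = 3L;          // dfs.heartbeat.interval

  /** Time without a heartbeat after which a datanode would be reported dead. */
  static long expireIntervalMs(long recheckMs, long heartbeatSec) {
    return 2 * recheckMs + 10 * heartbeatSec * 1000;
  }

  public static void main(String[] args) {
    long ms = expireIntervalMs(RECHECK_INTERVAL_MS, HEARTBEAT_INTERVAL_SEC);
    // With the defaults above this prints "630000 ms (10.5 minutes)".
    System.out.println(ms + " ms (" + (ms / 60000.0) + " minutes)");
  }
}
```

          Under the same assumption, an automated test could presumably shorten the wait by lowering these two settings in its own configuration rather than waiting the full default interval.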

          Ravi Prakash added a comment -

          I'm not sure what the expected behavior is when a node is specified in dfs.exclude and the cluster is then started. Maybe this should never be done, but if it is, dfsadmin -report shows this:

          Live datanodes:
          ....
          ....
          ....

          Dead datanodes:
          report: String index out of range: -1

          Is this fine?

          The output is as expected when the cluster is started without any dfs.exclude entries and one is then added: it shows the node as a dead node with Decommission Status: Decommissioned. So that is good.


            People

            • Assignee: Jakob Homan
            • Reporter: Jim Huang
            • Votes: 0
            • Watchers: 0
