Hadoop Common / HADOOP-5094

Show dead nodes information in dfsadmin -report

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.18.2
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Changed dfsadmin -report to list live and dead nodes, and attempt to resolve the hostname of datanode ip addresses.

      Description

      As part of operations' responsibility to bring back dead nodes, it would be good to have a quick way to obtain a list of dead data nodes.
      The current approach is to scrape the namenode web UI page and parse that information, but this creates load on the namenode.
      In search of a less costly way, I noticed that dfsadmin -report lists only data nodes with State "In Service" and "Decommission in progress".
      Asking for a cheap way to obtain a list of dead nodes.

      In addition, can the following requests be reviewed as additional enhancements and changes to dfsadmin -report:

      • Consistent output formatting: "Remaining raw bytes:" for the data nodes should have a space between the exact value and the parenthesized value.
        Sample:
        Total raw bytes: 3842232975360 (3.49 TB)
        Remaining raw bytes: 146090593065(136.06 GB)
        Used raw bytes: 3240864964620 (2.95 TB)
      • Include the running version of Hadoop.
      • What is the meaning of "Total effective bytes"?
      • Display the hostname instead of the IP address for the data node (toggle option?)
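
      A minimal sketch, assuming the client API of later Hadoop releases, of how a dead-node list could be pulled over the client RPC instead of scraping the web UI. DistributedFileSystem#getDataNodeStats(DatanodeReportType) and the HdfsConstants import are later-API names and are assumptions relative to the 0.18 code base this issue was filed against; this is illustrative, not the patch.

      // Hedged sketch: ask the namenode for dead datanodes over one client RPC.
      // getDataNodeStats(DatanodeReportType) / HdfsConstants are later-API names
      // and are assumptions for the 0.18-era code this issue targets.
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.hdfs.DistributedFileSystem;
      import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
      import org.apache.hadoop.hdfs.protocol.HdfsConstants.DatanodeReportType;

      public class ListDeadNodes {
        public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml
          FileSystem fs = FileSystem.get(conf);
          if (fs instanceof DistributedFileSystem) {
            DistributedFileSystem dfs = (DistributedFileSystem) fs;
            // Request only the dead nodes; no web UI scraping involved.
            for (DatanodeInfo node : dfs.getDataNodeStats(DatanodeReportType.DEAD)) {
              System.out.println(node.getName() + " (" + node.getHostName() + ")");
            }
          }
          fs.close();
        }
      }

      The patched dfsadmin -report prints the same information under its "Dead datanodes:" heading.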
      Attachments

      1. HADOOP-5094.patch
        4 kB
        Jakob Homan
      2. HADOOP-5094.patch
        6 kB
        Jakob Homan
      3. HADOOP-5094.patch
        6 kB
        Jakob Homan
      4. DfsAdminDeadNode_testCases.html
        2 kB
        gary murry
      5. DfsAdminDeadNode_testCases.html
        3 kB
        gary murry

        Issue Links

          Activity

          Allen Wittenauer made changes -
          Link This issue duplicates HDFS-363 [ HDFS-363 ]
          Ravi Prakash added a comment -

          I'm not sure what the expected behavior is when a node is specified in dfs.exclude and the cluster is started. Maybe this should never be done. But if it IS done, dfsadmin -report shows this:

          Live datanodes:
          ....
          ....
          ....

          Dead datanodes:
          report: String index out of range: -1

          Is this fine?

          The output is as expected when the cluster is started without any dfs.exclude entries and one is then added (it shows the node as dead with Decommission Status: Decommissioned), so that is good.

          Ravi Prakash added a comment -

          Hi Gary,

          I've only just joined the Hadoop team and am going to be writing automated tests for the test cases you've listed. Please excuse my naivete if I'm way off. Can you please clarify what you mean by "stop a node"? Is that using the hadoop-daemon.sh script? Won't the namenode mark the datanode as dead only after 10 mins? Should my test be that long-lived?

          Cheers
          Ravi.
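
          (On the ten-minute figure: the namenode declares a datanode dead once no heartbeat has arrived for 2 * heartbeat.recheck.interval + 10 * dfs.heartbeat.interval, about 10.5 minutes with default values. A minimal sketch of that arithmetic, assuming the 0.20-era configuration key names, which later releases renamed; automated tests typically shrink these intervals on a MiniDFSCluster instead of waiting out the default.)

          // Sketch of the namenode's heartbeat-expiry arithmetic. The key names
          // ("heartbeat.recheck.interval", "dfs.heartbeat.interval") are the
          // 0.20-era ones and are assumptions; later releases renamed them.
          import org.apache.hadoop.conf.Configuration;

          public class DeadNodeTimeout {
            public static void main(String[] args) {
              Configuration conf = new Configuration();
              long recheckMs   = conf.getLong("heartbeat.recheck.interval", 5 * 60 * 1000); // 5 min default
              long heartbeatMs = conf.getLong("dfs.heartbeat.interval", 3) * 1000;          // 3 s default
              long expireMs = 2 * recheckMs + 10 * heartbeatMs;  // 630,000 ms = 10.5 min with defaults
              System.out.println("Datanode considered dead after ~" + (expireMs / 1000) + " s without a heartbeat");
            }
          }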

          Tom White made changes -
          Status Resolved [ 5 ] → Closed [ 6 ]
          gary murry made changes -
          Attachment DfsAdminDeadNode_testCases.html [ 12424082 ]
          gary murry added a comment -

          An updated list of test cases after getting some feedback.

          Robert Chansler made changes -
          Release Note "Update the output of dfsadmin -report to delineate the live and dead nodes, as well as attempt to resolve the hostname of datanode ip addresses. Minor formatting changes." → "Changed dfsadmin -report to list live and dead nodes, and attempt to resolve the hostname of datanode ip addresses."
          Robert Chansler added a comment -

          Editorial pass over all release notes prior to publication of 0.21.

          gary murry made changes -
          Attachment DfsAdminDeadNode_testCases.html [ 12420710 ]
          gary murry added a comment -

          Just listing some test cases that should be covered to test this improvement.

          Owen O'Malley made changes -
          Component/s dfs [ 12310710 ]
          Hudson added a comment -

          Integrated in Hadoop-trunk #756 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/756/ )
          Jakob Homan made changes -
          Release Note Update the output of dfsadmin -report to delineate the live and dead nodes, as well as attempt to resolve the hostname of datanode ip addresses. Minor formatting changes.
          Hadoop Flags [Reviewed, Incompatible change] → [Incompatible change, Reviewed]
          Tsz Wo Nicholas Sze added a comment -

          Jakob, please add release note since this is an incompatible change.

          Tsz Wo Nicholas Sze made changes -
          Resolution Fixed [ 1 ]
          Hadoop Flags [Reviewed, Incompatible change] → [Incompatible change, Reviewed]
          Status Patch Available [ 10002 ] → Resolved [ 5 ]
          Tsz Wo Nicholas Sze added a comment -

          I just committed this. Thanks, Jakob!

          Tsz Wo Nicholas Sze made changes -
          Issue Type New Feature [ 2 ] → Improvement [ 4 ]
          Hadoop Flags [Incompatible change] → [Incompatible change, Reviewed]
          Tsz Wo Nicholas Sze added a comment -

          +1 patch looks good.

          Jakob Homan made changes -
          Attachment HADOOP-5094.patch [ 12399407 ]
          Jakob Homan added a comment -

          The patch went stale with the committing of HADOOP-4368. Nothing of substance, just some jostling over imports. Uploading a new version that applies against trunk.

          Jakob Homan added a comment -

          I created HADOOP-5159 to deal with getting the server version displayed in the report.

          Jakob Homan added a comment -

          The failing contrib tests are the known-bad Chukwa tests.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12399197/HADOOP-5094.patch
          against trunk revision 739416.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3778/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3778/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3778/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3778/console

          This message is automatically generated.

          Jakob Homan added a comment -

          Jim noticed that this issue addresses HADOOP-2937, if you consider the last contact time to be the time the node went dead. It's not a GUI fix, but it should suffice.
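
          (In code terms, the "Last contact" line reflects the datanode's last heartbeat timestamp, which for a dead node is effectively the time it went dead. A minimal sketch, assuming DatanodeInfo#getLastUpdate() is the accessor:)

          // Sketch only: turn the last-heartbeat timestamp into the "Last contact"
          // line shown in the report. getLastUpdate() is assumed to be the accessor.
          import java.util.Date;
          import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

          public class LastContact {
            static String lastContact(DatanodeInfo node) {
              return "Last contact: " + new Date(node.getLastUpdate());
            }
          }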

          Jakob Homan made changes -
          Link This issue incorporates HADOOP-2937 [ HADOOP-2937 ]
          Jakob Homan made changes -
          Status Open [ 1 ] → Patch Available [ 10002 ]
          Jakob Homan added a comment -

          Submitting updated patch.

          Jakob Homan made changes -
          Attachment HADOOP-5094.patch [ 12399197 ]
          Jakob Homan added a comment -

          Updated patch to add code to include the hostname in parens after ip addr, if it can be determined. If not, nothing is printed.
          Good on unit tests except the usual suspect. Again, just a change in output, so no new unit test:

               [exec] -1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
               [exec]                         Please justify why no tests are needed for this patch.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
               [exec] 
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
          
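          (Not the patch itself: a minimal JDK-only sketch of the best-effort reverse lookup described above, printing "ip (hostname)" when the name resolves and just the ip otherwise.)

          // Illustrative only. Best-effort reverse DNS: append the hostname in
          // parentheses when it can be determined, otherwise print the ip alone.
          import java.net.InetAddress;
          import java.net.UnknownHostException;

          public class HostnameLabel {
            static String label(String ip) {
              try {
                String host = InetAddress.getByName(ip).getCanonicalHostName();
                // getCanonicalHostName() falls back to the literal address on failure,
                // so only append it when it actually resolved to something else.
                return host.equals(ip) ? ip : ip + " (" + host + ")";
              } catch (UnknownHostException e) {
                return ip;
              }
            }

            public static void main(String[] args) {
              System.out.println(label("127.0.0.1"));
            }
          }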
          Jakob Homan added a comment -

          At the moment there is no way to query the server version via the client protocol, and that ability seems beyond the scope of this JIRA, so that feature should probably wait. HADOOP-4368 is currently dealing with the information included in this report and it may be worthwhile to include the server version in that work. I'll open another JIRA for this.

          Jakob Homan made changes -
          Status Patch Available [ 10002 ] → Open [ 1 ]
          Jim Huang added a comment -

          Please do provide the server version, so there is a quick and non-taxing way of determining the current running version on the namenode.

          Tsz Wo Nicholas Sze added a comment -

          >> Include the running version of Hadoop
          >
          > Client version and revision added to output

          Should we print the server version instead of client version? The client version can be obtained by "./bin/hadoop version".
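
          (For reference, the client version and revision come from org.apache.hadoop.util.VersionInfo, the class behind "./bin/hadoop version"; querying the server's version would need an RPC the client protocol did not expose at the time. A minimal sketch:)

          // Client-side version info, the same data `hadoop version` prints.
          import org.apache.hadoop.util.VersionInfo;

          public class ClientVersion {
            public static void main(String[] args) {
              System.out.println("Hadoop " + VersionInfo.getVersion()
                  + ", r" + VersionInfo.getRevision());
            }
          }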

          Raghu Angadi added a comment -

          > Would it be worth it to have both, if the datanode is specified as an ip addr initially?

          I think it should be both.

          Jakob Homan made changes -
          Status Open [ 1 ] → Patch Available [ 10002 ]
          Jakob Homan made changes -
          Hadoop Flags [Incompatible change]
          Description (old and new values are the same as the issue description above)
          Jakob Homan made changes -
          Attachment HADOOP-5094.patch [ 12399035 ]
          Jakob Homan added a comment -

          This patch adds headers to the list of datanodes, separating the living from the dead:

          -------------------------------------------------
          Datanodes available: 9 (10 total, 1 dead)
          
          Live datanodes:
          Name: ipaddr:58301
          Decommission Status : Normal
          Configured Capacity: 974886735872 (907.93 GB)
          DFS Used: 98304 (96 KB)
          Non DFS Used: 163215228928 (152.01 GB)
          DFS Remaining: 811671408640(755.93 GB)
          DFS Used%: 0%
          DFS Remaining%: 83.26%
          Last contact: Wed Jan 28 23:29:32 UTC 2009
          <<snip>>
          
          Dead datanodes:
          Name: ipaddr2:53655
          Decommission Status : Normal
          Configured Capacity: 974886735872 (907.93 GB)
          DFS Used: 98304 (96 KB)
          Non DFS Used: 209286926336 (194.91 GB)
          DFS Remaining: 765599711232(713.02 GB)
          DFS Used%: 0%
          DFS Remaining%: 78.53%
          Last contact: Wed Jan 28 23:17:43 UTC 2009
          

          Also,

          > Consistent formatting output in "Remaining raw bytes:"

          Fixed.

          > Include the running version of Hadoop

          Client version and revision added to output.

          > What is the meaning of "Total effective bytes"?

          As Suresh noted, no longer included in report output.

          > Display the hostname instead of the IP address for the data node (toggle option?)

          Would it be worth it to have both, if the datanode is specified as an ip addr initially?

          Patch passes all unit tests except known-bad HADOOP-4907. test-patch:

               [exec] -1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
               [exec]                         Please justify why no tests are needed for this patch.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          No new unit tests because it's just a change to the output of the report and not easily tested.

          Jakob Homan made changes -
          Link This issue is related to HADOOP-4281 [ HADOOP-4281 ]
          Suresh Srinivas added a comment -

          Output from the command has changed (though the new output is still missing the space before the parenthesis). Here is the output with the change from HADOOP-4281:

          Configured Capacity: 6339239936 (5.9 GB)
          Present Capacity: 3782686528 (3.52 GB)
          DFS Remaining: 2781669184(2.59 GB)
          DFS Used: 1001017344 (954.64 MB)
          DFS Used%: 26.46%
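
          (The remaining nit is only in how the raw byte count and the human-readable value are joined; a minimal sketch of the intended spacing, assuming Hadoop's StringUtils.byteDesc helper produces the "2.59 GB" style string.)

          // Illustrative only: a space between the exact value and the
          // parenthesized human-readable value, e.g. "2781669184 (2.59 GB)".
          import org.apache.hadoop.util.StringUtils;

          public class ByteFormat {
            public static void main(String[] args) {
              long remaining = 2781669184L;
              System.out.println(String.format("DFS Remaining: %d (%s)",
                  remaining, StringUtils.byteDesc(remaining)));
            }
          }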

          Jakob Homan made changes -
          Assignee Jakob Homan [ jghoman ]
          Jim Huang created issue -

            People

            • Assignee:
              Jakob Homan
              Reporter:
              Jim Huang
            • Votes:
              0
              Watchers:
              0
