Hadoop HDFS / HDFS-556

Provide info on failed volumes in the web ui

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: namenode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      HDFS-457 provided better handling of failed volumes but did not provide a corresponding view of this functionality on the web ui, such as a view of which datanodes have failed volumes. This would be a good feature to have.

      1. hdfs-556-1.patch
        2 kB
        Eli Collins
      2. hdfs-556-2.patch
        2 kB
        Eli Collins

          Activity

          Eli Collins added a comment -

          I've committed this.

          Eli Collins added a comment -

          Thanks Cos. Here are the test-patch results. I don't think we have a test that covers the JSPs. I manually checked this using the dfsnodelist.jsp page.

               [exec] 
               [exec] -1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
               [exec]                         Please justify why no new tests are needed for this patch.
               [exec]                         Also please list what manual steps were performed to verify this patch.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     -1 release audit.  The applied patch generated 103 release audit warnings (more than the trunk's current 1 warnings).
               [exec] 
               [exec]     +1 system test framework.  The patch passed system test framework compile.
               [exec] 
          
          Konstantin Boudnik added a comment -

          +1 patch looks good. I guess there's no way to validate it via test-patch?

          Eli Collins added a comment -

          Patch attached, same as last one but doesn't overflow 80 cols. Just adds a failed volumes column to the NN jsp page that lists DNs.

          Eli Collins added a comment -

          @Andrew: I added a "volumes failed" metric in HDFS-811 for this reason. This way you can get the data programmatically or monitor it via, e.g., Ganglia.

          dhruba borthakur added a comment -

          @andrew: how about if we put this info in the web UI as well as in dfsadmin -report? Will that satisfy you?

          Andrew Ryan added a comment -

          For sysadmins of clusters of any size, providing this in a web page is interesting but not useful for other scripts to consume. I've already had to write BeautifulSoup parsers to grab other data of interest and it's suboptimal.

          So I would really like to see a way to get this data as some sort of structured text. Perhaps if there was a command to fetch the names of the nodes with failed drives or something.

          Also, does this data appear in dfsadmin -report? That's pretty unstructured, but if it shows up on the UI, it should be there too.
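
The kind of structured output requested above can be illustrated with a minimal sketch. It assumes the per-datanode failed-volume counts are already available as a plain mapping; how they are obtained (for example from the "volumes failed" metric mentioned elsewhere in this thread) is not shown, and the function names, host names, and JSON layout are hypothetical, not part of any Hadoop interface.

```python
import json


def nodes_with_failed_volumes(node_reports):
    """Return the sorted names of datanodes with at least one failed volume.

    node_reports is a hypothetical mapping of datanode name to its
    failed-volume count; it is not a real Hadoop data structure.
    """
    return sorted(name for name, failed in node_reports.items() if failed > 0)


def to_structured_text(node_reports):
    """Serialize only the affected datanodes as JSON so scripts can parse it."""
    affected = {
        name: node_reports[name]
        for name in nodes_with_failed_volumes(node_reports)
    }
    return json.dumps({"failed_volume_nodes": affected}, sort_keys=True)


# Made-up sample data, for illustration only.
sample = {"dn1.example.com": 0, "dn2.example.com": 2, "dn3.example.com": 1}
print(to_structured_text(sample))
```

A monitoring script could then read the JSON directly instead of scraping the HTML node list.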

          dhruba borthakur added a comment -

          This will be very helpful for system administrators!

          Eli Collins added a comment -

          Patch attached. Adds a column in the node list jsp that reports the number of failed volumes per datanode. Uses the interface added in HDFS-811.

          Boris Shkolnik added a comment -

          Please ignore the previous comment. Wrong Jira

          Boris Shkolnik added a comment -

          Well, there are a few ways of solving this.
          1. print out the "cause" exception together with the error message. That will tell the person that the problem was with Quota. We cannot make suggestions about "-skipTrash" in this case.
          2. catch and rethrow exception in delete function when trying to move to Trash. This way we know that the problem is with trash and we can issue a message suggesting using "-skipTrash". The disadvantage is that this way we issue two separate messages.


            People

            • Assignee: Eli Collins
            • Reporter: Jakob Homan
            • Votes: 0
            • Watchers: 6
