Hadoop HDFS
  HDFS-1954

Improve corrupt files warning message on NameNode web UI

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.22.0
    • Component/s: namenode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      On NameNode web interface, you may get this warning:

      WARNING : There are about 32 missing blocks. Please check the log or run fsck.

      If the cluster was started less than 14 days ago, it would be great to add: "Is dfs.data.dir defined?"

      If that parameter could be checked at the point of that error message, and the error changed to "OMG dfs.data.dir isn't defined!", that'd be even better. As is, troubleshooting undefined parameters is a difficult proposition.

      I suspect this is an easy fix.
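      For reference, a datanode storage directory is normally defined in hdfs-site.xml along these lines (the property name is the one mentioned above; the paths are placeholders, not values taken from this issue):

      <property>
        <name>dfs.data.dir</name>
        <value>/data/1/dfs/dn,/data/2/dfs/dn</value>
      </property>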

      1. branch-0.22-hdfs-1954.patch
        3 kB
        Konstantin Shvachko
      2. HDFS-1954.patch
        2 kB
        Patrick Hunt
      3. HDFS-1954.patch
        2 kB
        Patrick Hunt
      4. HDFS-1954.patch
        2 kB
        Patrick Hunt

        Activity

        Patrick Hunt added a comment -

        This patch adds some detail to the error message; we can't check the setting, as it's local to the datanode and not available on this screen/node.

        It now says something like (with the hint class=small):

        WARNING : There are 5 missing blocks. Please check the log or run fsck.
        Hint: A common mis-configuration is not overriding "dfs.datanode.data.dir" on all datanodes (the default is typically /tmp which is not persistent)

        Suresh Srinivas added a comment -

        I am not sure adding all the reasons why the warning is printed is a good idea.

        Blocks might be missing for any number of reasons - datanodes might be down, one might be using a replication factor of 1, etc. Should we add all this information to the web page as well?

        I think this is a good candidate for a FAQ.

        philo vivero added a comment -

        Don't let the perfect destroy the good. The warning already suggests checking the log and running fsck, yet we know anyone who sets up HDFS with defaults will get this corruption eventually. It's reasonable for an error message to suggest the most common cause of the error, if the error cannot be eliminated in the first place.

        If you must eliminate suggestions, then make the error message Very Google Searchable, and let people know the common causes of the error on the resulting page (it appears the FAQ is the page you're suggesting here).

        Perhaps: "WARNING: There are missing blocks in your filesystem. Please locate the HDFS FAQ to help determine causes of this problem. Number of missing blocks: #####."

        Then put that exact verbiage on the FAQ, too, so people will find it when searching. Note that I've put the variable part of the error at the very end.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12479678/HDFS-1954.patch
        against trunk revision 1124364.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.hdfs.TestDFSStorageStateRecovery
        org.apache.hadoop.hdfs.TestFileConcurrentReader
        org.apache.hadoop.tools.TestJMXGet

        +1 contrib tests. The patch passed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/568//testReport/
        Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/568//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/568//console

        This message is automatically generated.

        Todd Lipcon added a comment -

        I agree it is better to link to a FAQ, or even better, to a new page in the documentation. (The wiki is not good for users whose deployments are isolated from the internet.)

        Patrick Hunt added a comment -

        Thanks for the feedback guys. I agree what you are suggesting is better than what I came up with. However, I did think about such solutions and decided not to make such changes for this specific jira; a summary of why:

        1) Originally I nearly implemented what Suresh suggests, which is a great idea, but decided against it because of the reason Todd mentioned (offline deployments), the fact that no such FAQ already existed (I checked both the FAQ and the troubleshooting guide on the wiki), and finally because afaict none of the existing screens link out to such docs. I didn't want to introduce new functionality here, just address the jira.

        2) Newbie users are the most likely to make this mistake. If you're more experienced with hadoop, you're also more likely to know what other issues could have caused this, or at least where to look. So having a big long list didn't seem that important.

        3) I could have introduced a new "faq" page, but I decided against that because I didn't want to duplicate information that rightly should be in the wiki (faq).

        4) I had limited time, I'm a hadoop "newbie", and I wanted to address the issue at hand. Philo's comment: "perfect enemy of good".

        I'd be happy if someone else were to create another jira that addresses this at a higher level (adds a faq, external links, whatever) and as a result wipes out my change, but short of that happening I'll re-submit this as a short term fix. If it doesn't meet your approval, at least I can say I tried. Thanks!

        Todd Lipcon added a comment -

        OK, you make a convincing argument. Here are a few small comments:

        • please remove "@param fsn" since it doesn't add any docs
        • please use the full word "otherwise" rather than "otw" in the javadoc
        • please use the constant in DFSConfigKeys instead of the text dfs.datanode.data.dir
        • not a big deal here but we usually use StringBuilder instead of the older (synchronized) StringBuffer in single-threaded situations
        • the correct html for a line break is <br/> - we may as well fix that while we're editing this code
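        For illustration only, a minimal sketch (not the actual attached patch) of what the hint-building code in NamenodeJspHelper might look like with these comments applied; the variable names and the DFSConfigKeys constant follow the diff quoted later in this thread, and the helper name is hypothetical:

        // Sketch: StringBuilder rather than StringBuffer, the DFSConfigKeys constant
        // rather than the literal key, and <br/> for the line break.
        static String corruptFilesHint(long missingBlocks) { // hypothetical helper name
          StringBuilder result = new StringBuilder();
          result.append("<b>WARNING : There are " + missingBlocks
              + " missing blocks. Please check the log or run fsck.</b>");
          result.append("<br/><div class=\"small\">Hint: A common mis-configuration is not ");
          result.append("overriding \"" + DFSConfigKeys.DFS_DATANODE_DATA_DIR_KEY
              + "\" on all datanodes ");
          result.append("(the default is typically /tmp which is not persistent)</div>");
          return result.toString();
        }
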
        Patrick Hunt added a comment -

        This updated patch addresses all of the concerns Todd raised. Thanks!

        philo vivero added a comment -

        Patrick, thanks for the advocacy and persistence. Todd, Suresh, et al, thanks for trying to keep the quality high. And most of all, thanks everyone for compromising on the "best we can do for now" instead of leaving it as was: I think this will save many handfuls of hair from being pulled in the coming year or two!

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12480688/HDFS-1954.patch
        against trunk revision 1128393.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:

        +1 contrib tests. The patch passed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/652//testReport/
        Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/652//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/652//console

        This message is automatically generated.

        Todd Lipcon added a comment -

        +1. Committed to 22 and trunk. Thanks, Patrick!

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #694 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/694/)
        HDFS-1954. Improve corrupt files warning message on NameNode web UI. Contributed by Patrick Hunt.

        todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1128542
        Files :

        • /hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/NamenodeJspHelper.java
        • /hadoop/hdfs/trunk/src/webapps/hdfs/dfshealth.jsp
        • /hadoop/hdfs/trunk/CHANGES.txt
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #680 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/680/)
        HDFS-1954. Improve corrupt files warning message on NameNode web UI. Contributed by Patrick Hunt.

        todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1128542
        Files :

        • /hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/NamenodeJspHelper.java
        • /hadoop/hdfs/trunk/src/webapps/hdfs/dfshealth.jsp
        • /hadoop/hdfs/trunk/CHANGES.txt
        Suresh Srinivas added a comment -

        My preference would be not to have this change. What are we basing "this is the most common cause of the error" on, other than the opinion expressed by Philo?

        From what I have seen in our clusters, the main reason this happens is when a bunch of datanodes do not come up or when replication factor 1 is used on files and some datanodes do not come up.

        Todd Lipcon added a comment -

        Hey Suresh. I agree that this is not the most common case for large existing clusters. But, people running large existing clusters already know the above, and shouldn't be confused by the message. The thinking is that "hint" type messages ought to be directed towards new users, since they're the ones who don't have the operational experience to know better.

        Do you have an alternative patch that would satisfy both the new users and the big cluster operators?

        Suresh Srinivas added a comment -

        > Do you have an alternative patch that would satisfy both the new users and the big cluster operators?
        Yes, and that is the FAQ and not the namenode web UI.

        The problem with this is:

        • The hint that is currently added is woefully insufficient and most likely not the problem that results in missing blocks.
        • Others will continue to add information like this on to namenode web UI, making it a mess.

        Also, exposing internal details such as "the default is typically /tmp which is not persistent" means that if you change this behavior in the code, then this information will be wrong. Instead, addressing Philo's other bug, so that /tmp is not used as the default storage directory, makes sense!

        Patrick Hunt added a comment -

        Hey Suresh, if everyone is ok with adding external links, I'll be happy to create a second patch. I was unable to find a FAQ entry for this; if you provide a link to one, I'll give it a shot. (Also, if you have any preferences for the page/link text, lmk and I'll incorporate it.)

        Suresh Srinivas added a comment -

        Patrick, currently the FAQ does not exist. We may need to create one. Also, why do we need to link the namenode web UI to the FAQ?

        Patrick Hunt added a comment -

        Hi Suresh, the jira creator's original request was that when this problem happens, the error message shown to them be more actionable. Esp. for newbies, given that ppl using hadoop for a while probably know what to do, or at least know where to look.

        My reason for linking to the FAQ would be that we can not only tell them there's a problem, but also provide a way for them to quickly and easily get directed to possible causes/solutions.

        Suresh Srinivas added a comment -

        Patrick, just because the jira creator requested this does not mean it is the correct solution. Certainly we could debate and see what the correct solution is. I have expressed my opinion earlier.

        This issue, however, does point to poor documentation and the need for a troubleshooting guide.

        Konstantin Shvachko added a comment -

        I agree with Suresh on this. It should not be a message on the web UI, which most ops will find annoying. It should just be addressed in the FAQ, so that newbies could learn about the mis-configuration from there.
        I would change the message this way, though:

         - Please check the log or run fsck.
         + Please check the logs or run fsck in order to identify the missing blocks.
        
        Patrick Hunt added a comment -

        @suresh I agree that it would be good to find a better solution. That's why I responded to your comments. My feeling (granted I could be wrong) was that the user had a point - that it would be nice to give some insight into what might be wrong and where to look for more detail. If a FAQ existed I would likely have addressed this differently.

        @konstantin (hi!) - this only shows up when a problem occurs, not all the time. So it would only be annoying when you have a real problem. Giving more insight at that time seems like it would be helpful/useful.

        As I said previously, if you know the "right way" feel free to overwrite my changes to make it better. I was only trying to make it so.

        Patrick Hunt added a comment -

        Sorry, make that "I am only trying to make it better.", as I said I'd be happy to do more work on this if we identify the "right way".

        Patrick Hunt added a comment -

        Hey, to make my suggestion more concrete I put together a patch that shows what I'm getting at:

        1) limited annoyance - a link to some help (i.e. the FAQ entry, which doesn't currently exist, but say it did). This gives new users some concrete help, advanced users can easily ignore it, and we're not duplicating any detail that rightly belongs on the FAQ page.
        2) note I also included Konstantin's suggestion.

        --- src/java/org/apache/hadoop/hdfs/server/namenode/NamenodeJspHelper.java
        +++ src/java/org/apache/hadoop/hdfs/server/namenode/NamenodeJspHelper.java
        @@ -155,13 +155,10 @@ class NamenodeJspHelper {
               // Warning class is typically displayed in RED
               result.append("<br/><a class=\"warning\" href=\"/corrupt_files.jsp\" title=\"List corrupt files\">\n");
               result.append("<b>WARNING : There are " + missingBlocks
        -          + " missing blocks. Please check the log or run fsck.</b>");
        +          + " missing blocks. Please check the logs or run fsck in order to identify the missing blocks.</b>");
               result.append("</a>");
         
        -      result.append("<br/><div class=\"small\">Hint: A common mis-configuration is not ");
        -      result.append("overriding \"" + DFSConfigKeys.DFS_DATANODE_DATA_DIR_KEY
        -          + "\" on all datanodes");
        -      result.append("(the default is typically /tmp which is not persistent)</div>");
        +      result.append("<br/><div class=\"small\">See this <a href=\"http:hadoop.apache.org/some_FAQ_page.html#someQandAEntry\">FAQ</a> entry for common causes and potential solutions.");
               result.append("<br/><br/>\n");
         
               return result.toString();
        

        LMK if that makes more sense, or if I'm way off.

        Konstantin Shvachko added a comment -

        On my cluster the site "hadoop.apache.org" may not be accessible. So this is going to be reported as a broken link.

        > This gives new users some concrete help, advanced users can easily ignore it.
        So it will help a new user once and then he will be annoyed for the rest of his life.

        The hint you provided in the patch describes only one of many problems that can result in missing blocks in a specific system setup.

        I think clarifying what a user should do to identify missing blocks is a good starting point, which will lead him to replicas, and particularly to dfs.datanode.data.dir.
        Can we just stick to this change and not provide any other links or hints?
        This was intended as a simple fix.
        I can contribute a FAQ.

        Patrick Hunt added a comment -

        > Can we just stick to this change and not provide any other links or hints?
        > I can contribute a FAQ.

        I think a FAQ entry would be a great help to users. As I mentioned if that existed this change would probably have gone a lot smoother.

        Makes sense to drop the hint given we'd have a proper writeup. I would like to just reference the existence of a FAQ instead, no link but just something like "<small>See the Hadoop FAQ for common causes and potential solutions</small>" or somesuch. This would only be displayed in the error case, not all the time. That sound reasonable?
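        For illustration, a minimal sketch (an assumption, not the committed patch) of how such a plain FAQ reference could be appended in NamenodeJspHelper, reusing the append style from the diff above; it would only run on the missing-blocks path:

        // Sketch only: reference the FAQ by name, with no external link.
        result.append("<br/><div class=\"small\">See the Hadoop FAQ for common causes "
            + "and potential solutions.</div>");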

        Konstantin Shvachko added a comment -

        Yes, that sounds good.

        Suresh Srinivas added a comment -

        Should we be reverting this change until a reworked solution is in place? BTW, could this change be causing HDFS-2013?

        Todd Lipcon added a comment -

        I guess you're right, we should revert this for now while we discuss. I will do so momentarily.

        Tsz Wo Nicholas Sze added a comment -

        Revised the title.

        Patrick Hunt added a comment -

        Yea that's on me, sorry. It's weird – the patch test indicated success. I'll update the patch, incl the test, once Konstantin updates the FAQ. Thanks Suresh/Todd.

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #705 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/705/)
        Revert HDFS-1954 since there is some discussion on the JIRA.

        todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1130693
        Files :

        • /hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/NamenodeJspHelper.java
        • /hadoop/hdfs/trunk/src/webapps/hdfs/dfshealth.jsp
        • /hadoop/hdfs/trunk/CHANGES.txt
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #686 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/686/)
        Revert HDFS-1954 since there is some discussion on the JIRA.

        todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1130693
        Files :

        • /hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/NamenodeJspHelper.java
        • /hadoop/hdfs/trunk/src/webapps/hdfs/dfshealth.jsp
        • /hadoop/hdfs/trunk/CHANGES.txt
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-22-branch #61 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-22-branch/61/)

        Konstantin Shvachko added a comment -

        FAQ is here: HDFS 3.15.

        Patrick Hunt added a comment -

        This updated patch removes the hint, with other updates as we discussed in the comment stream. (Thanks for the FAQ update Konstantin!)

        Note that this can only be applied to trunk. The reason why the 22 build failed (HDFS-2013) is that the test for this issue "TestMissingBlocksAlert" is different in 22 vs trunk. In 22 it checks for "There are about #" while trunk checks for "There are #" missing blocks. This test passes for me on trunk with this updated patch. I also verified by running the test by hand through the UI interface.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12481740/HDFS-1954.patch
        against trunk revision 1133114.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.cli.TestHDFSCLI

        +1 contrib tests. The patch passed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/735//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/735//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/735//console

        This message is automatically generated.

        Patrick Hunt added a comment -

        TestHDFSCLI is currently broken in trunk, not caused by this patch:
        https://builds.apache.org/view/G-L/view/Hadoop/job/Hadoop-Hdfs-trunk/690/testReport/junit/org.apache.hadoop.cli/TestHDFSCLI/
        Also, TestMissingBlocksAlert covers this change.

        Konstantin Shvachko added a comment -

        +1

        Konstantin Shvachko added a comment -

        Updated Patrick's patch for 0.22 branch.

        Patrick Hunt added a comment -

        Thanks Konstantin! Also, I really appreciate the feedback you all provided and your patience. I originally picked this up thinking it would be a simple slam dunk change for me to cut my teeth on. Regards.

        Konstantin Shvachko added a comment -

        I just committed this. Thank you Patrick.

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-22-branch #66 (See https://builds.apache.org/job/Hadoop-Hdfs-22-branch/66/)

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #746 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/)

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #699 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/699/)


          People

          • Assignee:
            Patrick Hunt
            Reporter:
            philo vivero
          • Votes:
            0
            Watchers:
            5

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 24h
              24h
              Remaining:
              Remaining Estimate - 24h
              24h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development