Hadoop Common
  1. Hadoop Common
  2. HADOOP-3452

fsck exit code would be better if non-zero when FS corrupt

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.18.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Changed exit status of fsck to report whether the files system is healthy or corrupt.

      Description

      fsck exit code would be better if behaved like other sys admin tools and exit code indicated healthy/corrupt or other errors.

      1. HADOOP-3452-1.patch
        6 kB
        Lohit Vijayarenu
      2. HADOOP-3452-2.patch
        7 kB
        Lohit Vijayarenu
      3. HADOOP-3452-3.patch
        7 kB
        Lohit Vijayarenu

        Activity

        Hide
        Chris Douglas added a comment -

        I just committed this. Thanks, Lohit

        Show
        Chris Douglas added a comment - I just committed this. Thanks, Lohit
        Hide
        Robert Chansler added a comment -

        Assigning to whomever submitted a patch so as to better manage committing things that are ready for prime time.

        Show
        Robert Chansler added a comment - Assigning to whomever submitted a patch so as to better manage committing things that are ready for prime time.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12383356/HADOOP-3452-3.patch
        against trunk revision 663079.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2570/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2570/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2570/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2570/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12383356/HADOOP-3452-3.patch against trunk revision 663079. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2570/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2570/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2570/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2570/console This message is automatically generated.
        Hide
        dhruba borthakur added a comment -

        +1. Code looks good.

        Show
        dhruba borthakur added a comment - +1. Code looks good.
        Hide
        Lohit Vijayarenu added a comment -

        Thanks for reviewing this Dhruba. The reason I scan for "is CORRUPT" is because, fsck now also reports CORRUPT blocks. And also, if a filename contains CORRUPT, we do not want to report filesystem to be corrupt. As I mentioned, this is a temporary fix. I have updated the patch comments as you suggested.

        Show
        Lohit Vijayarenu added a comment - Thanks for reviewing this Dhruba. The reason I scan for "is CORRUPT" is because, fsck now also reports CORRUPT blocks. And also, if a filename contains CORRUPT, we do not want to report filesystem to be corrupt. As I mentioned, this is a temporary fix. I have updated the patch comments as you suggested.
        Hide
        dhruba borthakur added a comment -

        +1 code looks good. Is there a reason why we match the string "is CORRUPT" and not just "CORRUPT"? If we are doing this on purpose, then the comment in the DFSck.java should mention that "is CORRUPT" is the precise string that is being used.

        Show
        dhruba borthakur added a comment - +1 code looks good. Is there a reason why we match the string "is CORRUPT" and not just "CORRUPT"? If we are doing this on purpose, then the comment in the DFSck.java should mention that "is CORRUPT" is the precise string that is being used.
        Hide
        Lohit Vijayarenu added a comment -

        looks like hudson had problem, the failure is not related to the patch. Re-submitting.

        Show
        Lohit Vijayarenu added a comment - looks like hudson had problem, the failure is not related to the patch. Re-submitting.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12383268/HADOOP-3452-2.patch
        against trunk revision 662913.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2552/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12383268/HADOOP-3452-2.patch against trunk revision 662913. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2552/console This message is automatically generated.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12383268/HADOOP-3452-2.patch
        against trunk revision 662634.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2544/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2544/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2544/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2544/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12383268/HADOOP-3452-2.patch against trunk revision 662634. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2544/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2544/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2544/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2544/console This message is automatically generated.
        Hide
        Lohit Vijayarenu added a comment -

        I am attaching a patch with comments about this for now. Have another small line change for flushing output. This is a temporary solution, until we plan a different approach to do this.

        Show
        Lohit Vijayarenu added a comment - I am attaching a patch with comments about this for now. Have another small line change for flushing output. This is a temporary solution, until we plan a different approach to do this.
        Hide
        Lohit Vijayarenu added a comment -

        Thanks Steve, Dhruba for looking into this. If we consider the current implmentation, fsck was implemented as servlet so that we do not use RPC. The client opens up URL connection and reads the output from FSCK servlet. We might not be able to set new headers/response code at the end of fsck executiong since the fsck in itself take a while to complete. And the status is not know prior to executing fsck. Could you brief me as to how we could get the response other than scanning this stream?

        Show
        Lohit Vijayarenu added a comment - Thanks Steve, Dhruba for looking into this. If we consider the current implmentation, fsck was implemented as servlet so that we do not use RPC. The client opens up URL connection and reads the output from FSCK servlet. We might not be able to set new headers/response code at the end of fsck executiong since the fsck in itself take a while to complete. And the status is not know prior to executing fsck. Could you brief me as to how we could get the response other than scanning this stream?
        Hide
        steve_l added a comment -

        -Error codes could be something added as an option. They are reliable, except that proxy servers like to interfere.

        -moving to constants should be easy.

        It would be extra impressive if the result was hidden in a <span> with a specific name, so you could XPATH/regexp on the response and look for <span class="status" >SUCCESS</span> through some Xpath like

        /span[@class="status"]

        >Do you have another mechanism in mind (other than using a servlet) to run fsck?

        Possibly. In SVN we have the code to bring up any of the name/data/tracker/jobtracker nodes and execute operations on the filesystem, though many more tests are needed. adding an FSCK check component would round things off. Right now, I'd consider generating an httpclient request with all the parameters and do a regexp search on the result.

        Show
        steve_l added a comment - -Error codes could be something added as an option. They are reliable, except that proxy servers like to interfere. -moving to constants should be easy. It would be extra impressive if the result was hidden in a <span> with a specific name, so you could XPATH/regexp on the response and look for <span class="status" >SUCCESS</span> through some Xpath like /span [@class="status"] >Do you have another mechanism in mind (other than using a servlet) to run fsck? Possibly. In SVN we have the code to bring up any of the name/data/tracker/jobtracker nodes and execute operations on the filesystem, though many more tests are needed. adding an FSCK check component would round things off. Right now, I'd consider generating an httpclient request with all the parameters and do a regexp search on the result.
        Hide
        dhruba borthakur added a comment -

        I like the idea of using as compile time constant NameNodeFsck.STATUS_HEALTHY variable to determine that return status of fsck. Also, it would be nice to not depend on Http error error codes.

        Do you have another mechanism in mind (other than using a servlet) to run fsck?

        Show
        dhruba borthakur added a comment - I like the idea of using as compile time constant NameNodeFsck.STATUS_HEALTHY variable to determine that return status of fsck. Also, it would be nice to not depend on Http error error codes. Do you have another mechanism in mind (other than using a servlet) to run fsck?
        Hide
        steve_l added a comment -

        Scanning for strings is very, very brittle, and not a good way to do system management. The fact that it is the only way to verify that rpm and yum are working are design errors on the part of those tools, not an example of good programming to follow.

        • At the very least NameNodeFsck should offer public constants for the status strings, rather than this:

        replace "Status: " + (isHealthy() ? "HEALTHY" : "CORRUPT"))

        The client could then look for NameNodeFsck.STATUS_HEALTHY, a string that would (currently) be "Status: HEALTHY" when all was well.

        -it would be better for the servlet response to include an HTTP error code (500?) on corruption, though that leads to two problems
        -you need to buffer the output if you want to change the error code
        -java.net is atrocious at accessing the contents of error pages

        Looking at NameNodeFsck, I think it would be good to remove all knowledge of the fact it is running under a servlet from its code -that is "view", not "model. This would give us the option of using alternate ways to invoke filesystem checks than http requests.

        Show
        steve_l added a comment - Scanning for strings is very, very brittle, and not a good way to do system management. The fact that it is the only way to verify that rpm and yum are working are design errors on the part of those tools, not an example of good programming to follow. At the very least NameNodeFsck should offer public constants for the status strings, rather than this: replace "Status: " + (isHealthy() ? "HEALTHY" : "CORRUPT")) The client could then look for NameNodeFsck.STATUS_HEALTHY, a string that would (currently) be "Status: HEALTHY" when all was well. -it would be better for the servlet response to include an HTTP error code (500?) on corruption, though that leads to two problems -you need to buffer the output if you want to change the error code -java.net is atrocious at accessing the contents of error pages Looking at NameNodeFsck, I think it would be good to remove all knowledge of the fact it is running under a servlet from its code -that is "view", not "model. This would give us the option of using alternate ways to invoke filesystem checks than http requests.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12382904/HADOOP-3452-1.patch
        against trunk revision 659405.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2518/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2518/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2518/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2518/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12382904/HADOOP-3452-1.patch against trunk revision 659405. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2518/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2518/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2518/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2518/console This message is automatically generated.
        Hide
        Lohit Vijayarenu added a comment -

        Attached patch scans for the string "is CORRUPT". Have updated the test cases, so that tomorrow in someone changes the output, tests reflect them.

        Show
        Lohit Vijayarenu added a comment - Attached patch scans for the string "is CORRUPT". Have updated the test cases, so that tomorrow in someone changes the output, tests reflect them.
        Hide
        Pete Wyckoff added a comment -

        Yes, that's my current work around, but was hoping to have fsck itself include that.

        Show
        Pete Wyckoff added a comment - Yes, that's my current work around, but was hoping to have fsck itself include that.
        Hide
        dhruba borthakur added a comment -

        One simple approach is for the client to look for the string "CORRUPT" and then exit with either 0 or 1.

        Show
        dhruba borthakur added a comment - One simple approach is for the client to look for the string "CORRUPT" and then exit with either 0 or 1.

          People

          • Assignee:
            Lohit Vijayarenu
            Reporter:
            Pete Wyckoff
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development