Hadoop Common / HADOOP-2796

For the script option, hod should exit with distinguishable exit codes for the script's exit code and hod's own exit code.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.16.0
    • Fix Version/s: 0.17.0
    • Component/s: contrib/hod
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      A provision to reliably detect a failing script's exit code was added. In case the hod script option returned a non-zero exit code, users can now look for a 'script.exitcode' file written to the HOD cluster directory. If this file is present, it means the script failed with the returned exit code.

      Description

      For the hod script option, the exit code should make it possible to distinguish between hod's exit code and the script's exit code.
      e.g.
      If the script ends with a streaming command and that command fails because the input path is not found, its exit code will be 5, which overlaps with hod exit code 5, which means "job execution failure".
      It would be better if hod returned distinguishable exit codes.
      e.g.
      For the above example, hod would return 64 + 5 = 69; to get the exact exit code of the hod script command, the user should subtract 64 from the exit code.

      1. 2796.1.patch
        6 kB
        Hemanth Yamijala
      2. 2796.patch
        4 kB
        Hemanth Yamijala

        Activity

        Hemanth Yamijala added a comment -

        The proposed solution in the bug of adding a constant number to the script's exit code, in retrospect, seems like a bad idea.

        • It is not very intuitive.
        • Exit codes are reduced modulo 256 (for example, by shells like bash), so the addition could make the result wrap around to 0, which would look like a successful execution.
        • It causes an unreasonable dependency between HOD and user scripts, whose authors need to remember this magic number.
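The wrap-around concern in the list above can be illustrated with a small sketch. The offset of 64 comes from the issue description; the function name `offset_exit_code` is hypothetical and is not part of HOD.

```python
OFFSET = 64  # constant proposed in the issue description


def offset_exit_code(script_exit_code: int) -> int:
    """Status a caller would observe if hod exited with OFFSET + script code.

    Exit statuses keep only the low 8 bits, i.e. the value modulo 256.
    """
    return (OFFSET + script_exit_code) % 256


print(offset_exit_code(5))    # 69, the example from the description
print(offset_exit_code(192))  # 0, a failing script now looks like success
```

Any script exit code of 192 would therefore be reported as 0 under the offset scheme.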

        The requirements for this problem, to my understanding, are as follows:

        • Return a zero exit code for a completely successful operation (both hod and the script have worked fine)
        • Return a non-zero exit code for a failed operation (either hod or the script has failed). Users may not care for more than this: did it work or not?
        • In the event of a non-zero exit code where the user wants to know if his script failed, provide an easy, clear way to determine if it failed.

        On these lines, the attached patch does the following:

        • Returns a zero exit code on success.
        • Returns a non-zero exit code on failure of script or hod itself.
        • If the script returned a non-zero exit code, it writes the exit code from the script to a file 'script.exitcode' in the cluster directory. Users can simply check for this file's existence to determine whether it is a script failure.
        • If it's a hod failure, no such file will exist.
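A caller could apply the behaviour described above roughly as follows. This is a minimal sketch, not code from the patch: the helper names are hypothetical, and it assumes the 'script.exitcode' file holds the exit code as plain decimal text, which the comment does not state.

```python
import os
from typing import Optional

EXIT_CODE_FILE = "script.exitcode"  # file name given in the comment above


def read_script_exit_code(cluster_dir: str) -> Optional[int]:
    """Return the recorded script exit code, or None if no file exists."""
    path = os.path.join(cluster_dir, EXIT_CODE_FILE)
    if not os.path.exists(path):
        return None
    with open(path) as f:
        # Assumption: the file contains the exit code as decimal text.
        return int(f.read().strip())


def describe_result(hod_exit_code: int, cluster_dir: str) -> str:
    """Classify a 'hod script' run as success, script failure, or hod failure."""
    if hod_exit_code == 0:
        return "success"
    script_code = read_script_exit_code(cluster_dir)
    if script_code is not None:
        return "script failed with exit code %d" % script_code
    return "hod failed with exit code %d" % hod_exit_code
```

The caller only needs the process exit status and the cluster directory; no magic offset has to be shared between HOD and the script.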
        Hemanth Yamijala added a comment -

        There are no test cases in this patch, because the commit of HADOOP-2848 missed committing the testHod.py file. This will cause a conflict now as the test cases should really be added to that file. Will submit test cases as part of a separate patch.

        Vinod Kumar Vavilapalli added a comment - edited

        +1 for the proposal.

        However, there is one problem with a corner case. If we do the following:

        • first run a script which returns with an error (so the script.exitcode file exists once the "hod script" command finishes),
        • and then run another "hod script" with the same cluster directory, but with invalid options (say --hod.nodecount=abc),
          the script.exitcode file will still be around, and hod returns a non-zero exit code, implying (incorrectly) that the script ran and returned with an error, which is not actually the case.

        Other than that, tested the rest of the cases successfully.

        Barring the corner case, +1 for the fix in general.

        Hemanth Yamijala added a comment -

        Modified code to handle Vinod's corner case as well.
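
The comment does not say how the corner case was handled. One plausible approach, shown here purely as an assumption and not as the patch's actual code, is to remove any leftover 'script.exitcode' file before a new "hod script" run starts. The function name is hypothetical.

```python
import os


def clear_stale_exit_code(cluster_dir: str) -> bool:
    """Remove a leftover script.exitcode file from an earlier run.

    Returns True if a stale file was removed, False if none existed.
    """
    path = os.path.join(cluster_dir, "script.exitcode")
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False
```

With the stale file gone, a later hod failure (such as invalid options) leaves no 'script.exitcode' behind, so it is not mistaken for a script failure.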

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12378375/2796.1.patch
        against trunk revision 619744.

        @author +1. The patch does not contain any @author tags.

        tests included -1. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2022/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2022/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2022/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2022/console

        This message is automatically generated.

        Hemanth Yamijala added a comment -

        The failure for unit tests is expected, as mentioned above.

        Devaraj Das added a comment -

        I just committed this. Thanks, Hemanth!

        Hudson added a comment -

        Integrated in Hadoop-trunk #436 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/436/ )

          People

          • Assignee:
            Hemanth Yamijala
          • Reporter:
            Karam Singh
          • Votes:
            0
          • Watchers:
            0
