Hadoop Common / HADOOP-2947

[HOD] Hod should redirect stderr and stdout of Hadoop daemons to assist debugging

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.16.0
    • Fix Version/s: 0.17.0
    • Component/s: contrib/hod
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      The stdout and stderr streams of daemons are redirected to files that are created under the hadoop log directory. Users can now send SIGQUIT (kill -3) to the daemons to get stack traces and thread dumps for debugging.

      Description

      Copied from internal bug details from Koji:

      ==========================
      Sometimes the JobTracker/TaskTracker starts consuming 99% CPU and stops responding to 'jstack' calls. In those cases,
      it usually still responds to the kill -QUIT signal, which forces the JVM to dump the stack to stdout.

      Please have the stdout of the JT/TT redirected to a file.

      Adding stderr.
      If a thread has an uncaught exception, it prints to stderr and dies.
      ==========================

      1. HADOOP-2947 (12 kB), Vinod Kumar Vavilapalli
      2. HADOOP-2947.2 (6 kB), Vinod Kumar Vavilapalli
      3. HADOOP-2947.3 (7 kB), Vinod Kumar Vavilapalli
      4. HADOOP-2947.4 (7 kB), Vinod Kumar Vavilapalli

        Activity

        Vinod Kumar Vavilapalli added a comment -

        Attaching a patch. stdout and stderr are now saved along with hadoop.log in the hadoop log directory. Thus, stdout is redirected to $HADOOP_LOG_DIR/<daemon-name>.out and stderr to $HADOOP_LOG_DIR/<daemon-name>.err.

        Made changes to simpleCommand so that commands can be started with stderr and/or stdout redirected, and made related changes in HadoopCommand.

        Added test cases. Also tested by sending SIGQUIT to the jobtracker and verifying that the standard output indeed goes to the jobtracker.out file. Didn't get to check stderr.
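The redirection described in this comment can be illustrated with a minimal sketch. This is not HOD's simpleCommand code: the function name `start_daemon` and its parameters are hypothetical, and the sketch uses the modern Python standard library rather than whatever the patch itself used. It only shows the general idea of starting a child process with stdout going to `<daemon-name>.out` and stderr to `<daemon-name>.err` under a log directory.

```python
import os
import subprocess
import sys
import tempfile


def start_daemon(argv, log_dir, daemon_name):
    """Start argv with stdout/stderr redirected to <daemon_name>.out/.err.

    Hypothetical helper for illustration; not part of HOD.
    """
    os.makedirs(log_dir, exist_ok=True)
    out_path = os.path.join(log_dir, daemon_name + ".out")
    err_path = os.path.join(log_dir, daemon_name + ".err")
    # Append mode, so output from a restarted daemon does not erase earlier dumps.
    with open(out_path, "ab") as out, open(err_path, "ab") as err:
        # The child gets its own copies of the descriptors, so the parent
        # can close its handles as soon as Popen returns.
        proc = subprocess.Popen(argv, stdout=out, stderr=err)
    return proc, out_path, err_path


if __name__ == "__main__":
    # Stand-in for a Hadoop daemon: writes one line to each stream.
    child = [sys.executable, "-c",
             "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"]
    proc, out_path, err_path = start_daemon(child, tempfile.mkdtemp(), "jobtracker")
    proc.wait()
    print(out_path, err_path)
```

Because the files are opened before the child starts, anything the process writes to either stream, including a stack dump triggered later by a signal, ends up in the log directory alongside the regular logs.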

        Vinod Kumar Vavilapalli added a comment -

        Running it through Hudson.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12378113/HADOOP-2947
        against trunk revision 619744.

        @author +1. The patch does not contain any @author tags.

        tests included +1. The patch appears to include 4 new or modified tests.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests -1. The patch failed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1992/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1992/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1992/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1992/console

        This message is automatically generated.

        Hemanth Yamijala added a comment -

        After looking at the changes, we realized there are some issues with the patch. Also, it requires changes to core functionality of HOD. So, moving this out of the 0.17 list.

        Hemanth Yamijala added a comment -

        Marking this as a blocker, after discussions with Mukund and Sameer. We are close to a fix, which can be done in a couple of days, and it seems a useful thing to have.

        Vinod Kumar Vavilapalli added a comment -

        The earlier patch performs radical surgery on simpleCommand, though, as Hemanth suggested, this could be done without disturbing much of the current code. Attaching a new patch. This uses the current simpleCommand framework to redirect stderr and stdout to files. These are redirected on all daemons to <daemon-name>.out and <daemon-name>.err in the corresponding log directories; when log-destination-uri is specified, they will be archived along with the hadoop logs onto dfs.

        Tested stdout by sending a SIGQUIT; tested stderr by forcing the namenode to fail during initialization (e.g., by removing the logging jar from the hadoop lib dir). Verified that the streams are redirected to their respective files as desired.

        Added test cases also.
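The SIGQUIT check described above can be reproduced in miniature. The sketch below is an assumption-laden stand-in, not the test from the patch: instead of a JVM it starts a small Python child that, like a JVM, reacts to SIGQUIT by writing a stack dump to its stdout and then keeps running. The names `CHILD_SOURCE` and `dump_stack_via_sigquit` are hypothetical, and the code is Unix-only because it relies on SIGQUIT.

```python
import os
import signal
import subprocess
import sys
import tempfile
import time

# Stand-in daemon (an assumption for illustration): on SIGQUIT it dumps its
# stack to stdout via faulthandler, then continues, much as a JVM would.
CHILD_SOURCE = """
import faulthandler, signal, sys, time
faulthandler.register(signal.SIGQUIT, file=sys.stdout, all_threads=True)
sys.stderr.write("ready\\n")
sys.stderr.flush()
while True:
    time.sleep(0.1)
"""


def dump_stack_via_sigquit(out_path, timeout=10.0):
    """Start the stand-in daemon, send SIGQUIT, return what landed in out_path."""
    with open(out_path, "wb") as out:
        proc = subprocess.Popen([sys.executable, "-c", CHILD_SOURCE],
                                stdout=out, stderr=subprocess.PIPE)
    try:
        # Wait until the child has installed its handler before signalling it.
        proc.stderr.readline()
        os.kill(proc.pid, signal.SIGQUIT)
        # Signal delivery is asynchronous, so poll the file for the dump.
        deadline = time.time() + timeout
        while time.time() < deadline:
            with open(out_path) as f:
                text = f.read()
            if "most recent call first" in text:
                return text
            time.sleep(0.05)
        return ""
    finally:
        proc.kill()
        proc.wait()
        proc.stderr.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "jobtracker.out")
    print(dump_stack_via_sigquit(path))
```

With a real daemon the equivalent manual check is to send `kill -QUIT <pid>` and then look for the thread dump in the daemon's `.out` file.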

        Hemanth Yamijala added a comment -

        Functionality works well with the patch. Also code looks good.

        One minor point is the test cases. The current tests are using ls to check output. The output of these programs could be environment specific, and the tests could fail because of that.

        For example, I verified on FreeBSD that when ls fails, what comes out on stderr is something like:
        ls: <filename>: No such file or directory

        whereas on Linux it says:
        /bin/ls: <filename>: No such file or directory

        So, I think we should have tests that aren't dependent on platforms like this.

        Other than this, +1

        Vinod Kumar Vavilapalli added a comment -

        Added a helper program that prints sample text to stderr, and changed the test-case testRedirectedStderr to call this helper program instead.

        Hemanth Yamijala added a comment -
        • The patch assumes the current working directory when referring to the helper program. This will not work.
        • Can we also change the stdout test to use the helper program? I like having the expected output of the test under our complete control. For example, if ls is aliased to something in some environment, this test would break again.
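Both review points can be illustrated with a small sketch. It is not the helper or the tests from the patch: `HELPER_SOURCE`, `write_helper`, and `run_helper` are hypothetical names, and the helper's text is made up. The helper prints a fixed string to whichever stream is requested, so the expected output is fully under the test's control, and it is invoked by absolute path from an unrelated working directory to show that the current directory does not matter.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical helper program: writes a fixed text to the stream named by argv[1].
HELPER_SOURCE = """\
import sys
stream = sys.stderr if sys.argv[1] == "stderr" else sys.stdout
stream.write("Sample text from helper\\n")
"""


def write_helper(directory):
    """Create the helper script; a real test would ship it next to the test file."""
    path = os.path.join(directory, "helper.py")
    with open(path, "w") as f:
        f.write(HELPER_SOURCE)
    return path


def run_helper(helper_path, stream_name, log_dir):
    """Run the helper with both streams redirected; return (stdout, stderr) text."""
    out_path = os.path.join(log_dir, "test.out")
    err_path = os.path.join(log_dir, "test.err")
    with open(out_path, "w") as out, open(err_path, "w") as err:
        subprocess.check_call(
            # Absolute path, so the helper is found regardless of the cwd.
            [sys.executable, os.path.abspath(helper_path), stream_name],
            stdout=out, stderr=err,
            cwd=tempfile.gettempdir(),  # deliberately not the helper's directory
        )
    with open(out_path) as out, open(err_path) as err:
        return out.read(), err.read()


if __name__ == "__main__":
    work_dir = tempfile.mkdtemp()
    helper = write_helper(work_dir)
    print(run_helper(helper, "stderr", work_dir))
    print(run_helper(helper, "stdout", work_dir))
```

In a test module, the helper's location would typically be derived from the test file itself, for example with `os.path.dirname(os.path.abspath(__file__))`, rather than from the working directory.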
        Vinod Kumar Vavilapalli added a comment -

        Incorporating the suggested changes.

        Hemanth Yamijala added a comment -

        +1 for the changes.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12379020/HADOOP-2947.4
        against trunk revision 643282.

        @author +1. The patch does not contain any @author tags.

        tests included +1. The patch appears to include 7 new or modified tests.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2112/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2112/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2112/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2112/console

        This message is automatically generated.

        Hemanth Yamijala added a comment -

        I just committed this. Thanks, Vinod!


          People

          • Assignee: Vinod Kumar Vavilapalli
          • Reporter: Hemanth Yamijala
          • Votes: 0
          • Watchers: 0
