Hadoop Common / HADOOP-2947

[HOD] Hod should redirect stderr and stdout of Hadoop daemons to assist debugging

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.16.0
    • Fix Version/s: 0.17.0
    • Component/s: contrib/hod
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      The stdout and stderr streams of daemons are redirected to files that are created under the Hadoop log directory. Users can now send kill -3 (SIGQUIT) signals to the daemons to get stack traces and thread dumps for debugging.

      Description

      Copied from the internal bug details provided by Koji:

      ==========================
      Sometimes the JobTracker/TaskTracker starts consuming 99% CPU and stops responding to 'jstack' calls. In those cases,
      it usually still responds to the kill -QUIT signal, which forces the JVM to dump its thread stacks to stdout.

      Please have the stdout of JT/TT redirected to a file.

      Adding stderr as well.
      If a thread has an uncaught exception, it prints the exception to stderr and dies.
      ==========================
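The debugging flow described above can be illustrated with a small Python sketch (HOD itself is written in Python). A stand-in child process, not a real JVM, installs a SIGQUIT handler that prints a fake "thread dump" to stdout; because its stdout is redirected to a file, the dump survives there after the signal is sent. The file name fake-daemon.out, the simulated dump text and the function names are assumptions for illustration only. A real JVM prints its thread dump on SIGQUIT without any handler code, and this sketch only runs on POSIX systems.

```python
import os
import signal
import subprocess
import sys
import tempfile
import time

# Hypothetical stand-in for a JVM daemon: on SIGQUIT it writes a fake
# thread dump to stdout and keeps running, as a JVM would.
CHILD_SOURCE = """
import signal, sys, time

def on_quit(signum, frame):
    sys.stdout.write("Full thread dump (simulated)\\n")
    sys.stdout.flush()

signal.signal(signal.SIGQUIT, on_quit)
sys.stdout.write("ready\\n")
sys.stdout.flush()
while True:
    time.sleep(0.1)
"""


def wait_for_text(path, text, timeout=10.0):
    """Poll a file until it contains `text` or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with open(path) as f:
            if text in f.read():
                return True
        time.sleep(0.05)
    return False


def dump_threads_to_file(log_dir):
    """Start the stand-in daemon with stdout redirected to a file, send it
    SIGQUIT, and report whether the dump landed in that file."""
    out_path = os.path.join(log_dir, "fake-daemon.out")
    with open(out_path, "w") as out:
        # The child keeps its own copy of the descriptor after we close ours.
        proc = subprocess.Popen([sys.executable, "-c", CHILD_SOURCE], stdout=out)
    try:
        # Wait until the handler is installed before signalling.
        if not wait_for_text(out_path, "ready"):
            raise RuntimeError("stand-in daemon did not start")
        os.kill(proc.pid, signal.SIGQUIT)  # same as `kill -QUIT <pid>`
        return wait_for_text(out_path, "Full thread dump")
    finally:
        proc.terminate()
        proc.wait()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as log_dir:
        print(dump_threads_to_file(log_dir))
```

Without the redirection, the dump would go to whatever stdout the daemon inherited, which for a daemon started remotely is typically discarded.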

      1. HADOOP-2947
        12 kB
        Vinod Kumar Vavilapalli
      2. HADOOP-2947.2
        6 kB
        Vinod Kumar Vavilapalli
      3. HADOOP-2947.3
        7 kB
        Vinod Kumar Vavilapalli
      4. HADOOP-2947.4
        7 kB
        Vinod Kumar Vavilapalli

        Activity

        Hemanth Yamijala created issue -
        Hemanth Yamijala made changes -
        Assignee: (none) → Vinod Kumar Vavilapalli [ vinodkv ]
        Vinod Kumar Vavilapalli added a comment -

        Attaching a patch. stdout and stderr are now saved along with hadoop.log in the hadoop log directory. Thus, stdout is redirected to $HADOOP_LOG_DIR/<daemon-name>.out and stderr to $HADOOP_LOG_DIR/<daemon-name>.err.

        Changed simpleCommand so that commands can be started with stderr and/or stdout redirected, and made related changes in HadoopCommand.

        Added test cases. Also tested by sending a SIGQUIT to the JobTracker and verifying that its standard output indeed goes to the jobtracker.out file. I didn't get to check stderr.
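A minimal sketch of the redirection described in this comment, using Python's subprocess module. The function start_with_redirection and its parameters are hypothetical and are not HOD's simpleCommand or HadoopCommand API; the sketch only shows opening <log-dir>/<daemon-name>.out and <log-dir>/<daemon-name>.err and handing them to the child process as its stdout and stderr.

```python
import os
import subprocess
import sys
import tempfile


def start_with_redirection(daemon_name, argv, log_dir):
    """Hypothetical helper: start `argv` with stdout and stderr redirected to
    <log_dir>/<daemon_name>.out and <log_dir>/<daemon_name>.err."""
    os.makedirs(log_dir, exist_ok=True)
    out_path = os.path.join(log_dir, daemon_name + ".out")
    err_path = os.path.join(log_dir, daemon_name + ".err")
    # Append mode (a choice made for this sketch) keeps output from earlier
    # runs of the same daemon instead of truncating it.
    with open(out_path, "a") as out, open(err_path, "a") as err:
        # The child gets its own copies of the file descriptors, so closing
        # ours when the `with` block exits does not affect it.
        return subprocess.Popen(argv, stdout=out, stderr=err)


if __name__ == "__main__":
    log_dir = tempfile.mkdtemp()  # stands in for $HADOOP_LOG_DIR
    child = [sys.executable, "-c",
             "import sys; print('to stdout'); sys.stderr.write('to stderr\\n')"]
    start_with_redirection("jobtracker", child, log_dir).wait()
    with open(os.path.join(log_dir, "jobtracker.err")) as f:
        print(f.read(), end="")  # prints "to stderr"
```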

        Vinod Kumar Vavilapalli made changes -
        Attachment: (none) → HADOOP-2947 [ 12378113 ]
        Vinod Kumar Vavilapalli added a comment -

        Running it through Hudson.

        Vinod Kumar Vavilapalli made changes -
        Status: Open [ 1 ] → Patch Available [ 10002 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12378113/HADOOP-2947
        against trunk revision 619744.

        @author +1. The patch does not contain any @author tags.

        tests included +1. The patch appears to include 4 new or modified tests.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests -1. The patch failed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1992/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1992/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1992/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1992/console

        This message is automatically generated.

        Hemanth Yamijala added a comment -

        After looking at the changes, we realized there are some issues with the patch. Also, this requires changes to core HOD functionality, so we are moving it out of the 0.17 list.

        Hemanth Yamijala made changes -
        Fix Version/s: 0.17.0 [ 12312913 ] → (none)
        Hemanth Yamijala added a comment -

        Marking this as a blocker after discussions with Mukund and Sameer. We are close to the fix, which can be done in a couple of days, and it seems a useful thing to have.

        Hemanth Yamijala made changes -
        Fix Version/s: (none) → 0.17.0 [ 12312913 ]
        Priority: Major [ 3 ] → Blocker [ 1 ]
        Vinod Kumar Vavilapalli made changes -
        Status: Patch Available [ 10002 ] → Open [ 1 ]
        Vinod Kumar Vavilapalli added a comment -

        The earlier patch performed fairly radical surgery on simpleCommand, but, as Hemanth suggested, this can be done without disturbing much of the current code. Attaching a new patch that uses the existing simpleCommand framework to redirect stderr and stdout to files. On all daemons, these streams are redirected to <daemon-name>.out and <daemon-name>.err in the corresponding log directories; when log-destination-uri is specified, these files are archived onto DFS along with the Hadoop logs.

        Tested stdout by sending a SIGQUIT, and tested stderr by forcing the NameNode to fail during initialization (e.g., by removing the logging jar from the Hadoop lib directory). Verified that the streams are redirected to their respective files as desired.

        Also added test cases.

        Vinod Kumar Vavilapalli made changes -
        Attachment: (none) → HADOOP-2947.2 [ 12378917 ]
        Hemanth Yamijala added a comment -

        The functionality works well with the patch, and the code looks good.

        One minor point concerns the test cases. The current tests use ls to check output. The output of such programs can be environment-specific, and the tests could fail because of that.

        For example, I verified on FreeBSD that when ls fails, what comes out on stderr is something like:
        ls: <filename>: No such file or directory

        whereas on Linux it says:
        /bin/ls: <filename>: No such file or directory

        So I think we should have tests that don't depend on the platform in this way.

        Other than this, +1

        Vinod Kumar Vavilapalli added a comment -

        Added a helper program that prints sample text to stderr, and changed the test case testRedirectedStderr to call this helper program instead.
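The idea behind the helper program can be sketched as follows: the test controls the exact text written to stderr, so an exact comparison is safe on FreeBSD, Linux or any other platform. The sample text, the inline form of the helper (passed with python -c instead of living in its own file) and the class and method names are assumptions for illustration; this is not the actual testRedirectedStderr code from the patch.

```python
import os
import subprocess
import sys
import tempfile
import unittest

# Hypothetical helper program: writes a fixed sample line to stderr.
SAMPLE_TEXT = "sample text for stderr\n"
HELPER_SOURCE = "import sys; sys.stderr.write(%r)" % SAMPLE_TEXT


class RedirectedStderrTest(unittest.TestCase):
    def test_redirected_stderr(self):
        with tempfile.TemporaryDirectory() as log_dir:
            err_path = os.path.join(log_dir, "helper.err")
            with open(err_path, "w") as err:
                subprocess.run([sys.executable, "-c", HELPER_SOURCE],
                               stderr=err, check=True)
            with open(err_path) as f:
                # An exact match is safe here: unlike `ls` error messages,
                # this text does not vary between platforms.
                self.assertEqual(f.read(), SAMPLE_TEXT)
```

Run it with `python -m unittest <module>`; the assertion compares against text the test itself defines, never against output of a system tool.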

        Vinod Kumar Vavilapalli made changes -
        Attachment: (none) → HADOOP-2947.3 [ 12379006 ]
        Vinod Kumar Vavilapalli made changes -
        Status: Open [ 1 ] → Patch Available [ 10002 ]
        Hemanth Yamijala added a comment -
        • The patch assumes the current working directory when referring to the helper program. This will not work when the tests are run from a different directory.
        • Can we also change the stdout test to use the helper program? I like having the expected output of the test under our complete control. For example, if ls is aliased to something else in some environment, this test would break again.
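Both review points can be sketched together: resolve the helper relative to a known base directory (for a real test module, typically os.path.dirname(os.path.abspath(__file__))) instead of the current working directory, and let the helper write controlled text to either stream so that the stdout test no longer depends on ls. The helper file name helper.py, its stdout/stderr argument convention and the function names are hypothetical.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical helper: prints a fixed line to the stream named in argv[1].
HELPER_SOURCE = """import sys
stream = sys.stdout if sys.argv[1] == "stdout" else sys.stderr
stream.write("sample text\\n")
"""


def helper_path(base_dir):
    """Absolute path of the helper; independent of os.getcwd()."""
    return os.path.join(os.path.abspath(base_dir), "helper.py")


def run_helper(base_dir, stream_name, log_dir):
    """Run the helper with the chosen stream redirected to
    <log_dir>/helper.out or <log_dir>/helper.err and return the captured text."""
    ext = "out" if stream_name == "stdout" else "err"
    target = os.path.join(log_dir, "helper." + ext)
    with open(target, "w") as f:
        redirect = {"stdout": f} if stream_name == "stdout" else {"stderr": f}
        subprocess.run([sys.executable, helper_path(base_dir), stream_name],
                       check=True, **redirect)
    with open(target) as f:
        return f.read()


if __name__ == "__main__":
    base = tempfile.mkdtemp()  # stands in for the test module's directory
    with open(helper_path(base), "w") as f:
        f.write(HELPER_SOURCE)
    os.chdir(tempfile.mkdtemp())  # a different cwd must not matter
    print(run_helper(base, "stdout", tempfile.mkdtemp()), end="")
```

Because the helper is located through base_dir rather than a relative path, changing the working directory before running the tests has no effect on the result.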
        Vinod Kumar Vavilapalli made changes -
        Status: Patch Available [ 10002 ] → Open [ 1 ]
        Vinod Kumar Vavilapalli added a comment -

        Incorporating the suggested changes.

        Vinod Kumar Vavilapalli made changes -
        Attachment: (none) → HADOOP-2947.4 [ 12379020 ]
        Vinod Kumar Vavilapalli made changes -
        Status: Open [ 1 ] → Patch Available [ 10002 ]
        Hemanth Yamijala added a comment -

        +1 for the changes.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12379020/HADOOP-2947.4
        against trunk revision 643282.

        @author +1. The patch does not contain any @author tags.

        tests included +1. The patch appears to include 7 new or modified tests.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2112/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2112/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2112/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2112/console

        This message is automatically generated.

        Hemanth Yamijala added a comment -

        I just committed this. Thanks, Vinod!

        Hemanth Yamijala made changes -
        Resolution: (none) → Fixed [ 1 ]
        Status: Patch Available [ 10002 ] → Resolved [ 5 ]
        Hemanth Yamijala made changes -
        Hadoop Flags: (none) → [Reviewed]
        Release Note: (none) → The stdout and stderr streams of daemons are redirected to files that are created under the Hadoop log directory. Users can now send kill -3 (SIGQUIT) signals to the daemons to get stack traces and thread dumps for debugging.
        Nigel Daley made changes -
        Status: Resolved [ 5 ] → Closed [ 6 ]
        Transition                  | Time In Source Status | Execution Times | Last Executer           | Last Execution Date
        Patch Available → Open      | 12d 22h 7m            | 2               | Vinod Kumar Vavilapalli | 01/Apr/08 10:36
        Open → Patch Available      | 13d 3h 7m             | 3               | Vinod Kumar Vavilapalli | 01/Apr/08 10:40
        Patch Available → Resolved  | 19h 36m               | 1               | Hemanth Yamijala        | 02/Apr/08 06:17
        Resolved → Closed           | 49d 14h 48m           | 1               | Nigel Daley             | 21/May/08 21:05

          People

          • Assignee:
            Vinod Kumar Vavilapalli
            Reporter:
            Hemanth Yamijala
          • Votes:
            0
            Watchers:
            0
