Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Trivial Trivial
    • Resolution: Fixed
    • Affects Version/s: 0.20.204.0, 0.23.0
    • Fix Version/s: 0.20.204.0, 0.23.0
    • Component/s: jobtracker
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The information on why a specific TaskTracker got blacklisted is not stored anywhere. The jobtracker web ui will show the detailed reason string until the TT gets unblacklisted. After that it is lost.

      1. MAPREDUCE-2451-trunk.patch
        1 kB
        Thomas Graves
      2. MAPREDUCE-2451-trunk.patch
        1 kB
        Thomas Graves
      3. MAPREDUCE-2451-20.205.patch
        1 kB
        Thomas Graves

        Activity

        Thomas Graves created issue -
        Hide
        Thomas Graves added a comment -

        The jobtracker could log the detailed reason string in blacklistTracker when it logs the task tracker change to blacklisted/graylisted state. Currently it logs the shade (black or gray) and the ReasonForBlackListing. The reason string is already passed into the function so we can just add it to the log message calls.

        log message now:
        INFO org.apache.hadoop.mapred.JobTracker: Adding new blacklisted tracker : XXX Reason for blacklisting is : NODE_UNHEALTHY

        Would become something like:
        INFO org.apache.hadoop.mapred.JobTracker: Adding new blacklisted tracker : XXX Reason for blacklisting is : NODE_UNHEALTHY Reason details : Error string returned from healthChecker.script

        Show
        Thomas Graves added a comment - The jobtracker could log the detailed reason string in blacklistTracker when it logs the task tracker change to blacklisted/graylisted state. Currently it logs the shade (black or gray) and the ReasonForBlackListing. The reason string is already passed into the function so we can just add it to the log message calls. log message now: INFO org.apache.hadoop.mapred.JobTracker: Adding new blacklisted tracker : XXX Reason for blacklisting is : NODE_UNHEALTHY Would become something like: INFO org.apache.hadoop.mapred.JobTracker: Adding new blacklisted tracker : XXX Reason for blacklisting is : NODE_UNHEALTHY Reason details : Error string returned from healthChecker.script
        Thomas Graves made changes -
        Field Original Value New Value
        Fix Version/s 0.20.205.0 [ 12316391 ]
        Thomas Graves made changes -
        Assignee Thomas Graves [ tgraves ]
        Thomas Graves made changes -
        Attachment MAPREDUCE-2451-20.205.patch [ 12478160 ]
        Attachment MAPREDUCE-2451-trunk.patch [ 12478161 ]
        Hide
        Thomas Graves added a comment -

        For the 20.205 branch these are based on branch-0.20-security here are the results of test-patch. The -1 for javadoc and eclipse both exists in the branch-20-security without my change.

        I didn't include any tests because this is a trivial change to the log message. Manual test steps are below.

        [exec] -1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
        [exec] Please justify why no tests are needed for this patch.
        [exec]
        [exec] -1 javadoc. The javadoc tool appears to have generated 1 warning messages.
        [exec]
        [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
        [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec]
        [exec] -1 Eclipse classpath. The patch causes the Eclipse classpath to differ from the contents of the lib directories. [exec] [exec]
        [exec]
        [exec]

        Manual test steps:

        • modify mapred_site.xml to have the mapred.healthChecker.script.path configuration. Have it point to a script like ~/health_check.
        • modify ~/health_check to contain just something like:
          #!/bin/bash
          exit 0
        • start the cluster and make sure every is running fine.
        • modify the ~/health_check script on a tasktracker and insert the a line like: echo -n "ERROR new string" before the exit 0 line.
        • wait for tasktracker to send heartbeat back with updated health info
        • look in the jobtracker log file and verify the log line looks similar to this. This bug added the "Reason details : ERROR new string" bit.
          2011-05-04 13:30:31,926 INFO org.apache.hadoop.mapred.JobTracker: Blacklisting tracker : yourhost.com Reason for blacklisting is : NODE_UNHEALTHY Reason details : ERROR new string
        • also verify the tasktracker got blacklisted.
        Show
        Thomas Graves added a comment - For the 20.205 branch these are based on branch-0.20-security here are the results of test-patch. The -1 for javadoc and eclipse both exists in the branch-20-security without my change. I didn't include any tests because this is a trivial change to the log message. Manual test steps are below. [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no tests are needed for this patch. [exec] [exec] -1 javadoc. The javadoc tool appears to have generated 1 warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] -1 Eclipse classpath. The patch causes the Eclipse classpath to differ from the contents of the lib directories. [exec] [exec] [exec] [exec] Manual test steps: modify mapred_site.xml to have the mapred.healthChecker.script.path configuration. Have it point to a script like ~/health_check. modify ~/health_check to contain just something like: #!/bin/bash exit 0 start the cluster and make sure every is running fine. modify the ~/health_check script on a tasktracker and insert the a line like: echo -n "ERROR new string" before the exit 0 line. wait for tasktracker to send heartbeat back with updated health info look in the jobtracker log file and verify the log line looks similar to this. This bug added the "Reason details : ERROR new string" bit. 2011-05-04 13:30:31,926 INFO org.apache.hadoop.mapred.JobTracker: Blacklisting tracker : yourhost.com Reason for blacklisting is : NODE_UNHEALTHY Reason details : ERROR new string also verify the tasktracker got blacklisted.
        Thomas Graves made changes -
        Status Open [ 1 ] In Progress [ 3 ]
        Thomas Graves made changes -
        Status In Progress [ 3 ] Patch Available [ 10002 ]
        Affects Version/s 0.23.0 [ 12315570 ]
        Hide
        Thomas Graves added a comment -

        Attaching trunk patch again to see if hudson build will start

        Show
        Thomas Graves added a comment - Attaching trunk patch again to see if hudson build will start
        Thomas Graves made changes -
        Attachment MAPREDUCE-2451-trunk.patch [ 12478596 ]
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12478596/MAPREDUCE-2451-trunk.patch
        against trunk revision 1100409.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.mapred.TestDebugScript

        +1 contrib tests. The patch passed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/225//testReport/
        Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/225//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/225//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12478596/MAPREDUCE-2451-trunk.patch against trunk revision 1100409. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.mapred.TestDebugScript +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/225//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/225//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/225//console This message is automatically generated.
        Hide
        Thomas Graves added a comment -

        The TestDebugScript test runs fine when I run it manually:

        [junit] Running org.apache.hadoop.mapred.TestDebugScript
        [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 7.571 sec

        Show
        Thomas Graves added a comment - The TestDebugScript test runs fine when I run it manually: [junit] Running org.apache.hadoop.mapred.TestDebugScript [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 7.571 sec
        Hide
        Chris Douglas added a comment -

        +1

        I committed this. Thanks, Tom!

        Show
        Chris Douglas added a comment - +1 I committed this. Thanks, Tom!
        Chris Douglas made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Resolution Fixed [ 1 ]
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #660 (See https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/660/)
        MAPREDUCE-2451. Log the details from health check script at the
        JobTracker. Contributed by Thomas Graves

        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #660 (See https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/660/ ) MAPREDUCE-2451 . Log the details from health check script at the JobTracker. Contributed by Thomas Graves
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #675 (See https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk/675/)
        MAPREDUCE-2451. Log the details from health check script at the
        JobTracker. Contributed by Thomas Graves

        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #675 (See https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk/675/ ) MAPREDUCE-2451 . Log the details from health check script at the JobTracker. Contributed by Thomas Graves
        Owen O'Malley made changes -
        Fix Version/s 0.20.204.0 [ 12316318 ]
        Owen O'Malley made changes -
        Fix Version/s 0.20.205.0 [ 12316391 ]
        Hide
        Owen O'Malley added a comment -

        Hadoop 0.20.204.0 was just released.

        Show
        Owen O'Malley added a comment - Hadoop 0.20.204.0 was just released.
        Owen O'Malley made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Thomas Graves
            Reporter:
            Thomas Graves
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 24h
              24h
              Remaining:
              Remaining Estimate - 24h
              24h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development