Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-779

Add node health failures into JobTrackerStatistics

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: jobtracker
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Add the node health failure counts into JobTrackerStatistics.

      1. mapreduce-779-1.patch
        7 kB
        Sreekanth Ramakrishnan
      2. mapreduce-779-2.patch
        8 kB
        Sreekanth Ramakrishnan
      3. mapreduce-779-3.patch
        8 kB
        Sreekanth Ramakrishnan
      4. mapreduce-779-4.patch
        8 kB
        Sreekanth Ramakrishnan

        Activity

        Hide
        Sreekanth Ramakrishnan added a comment -

        Attaching patch which adds this feature.

        Added a new TaskTrackerStat which maintains the count of number of time node health script failed.

        Uses the default job tracker statistics window for update.

        Added a test case to check if the node health count is populated correct in statistics.

        Show
        Sreekanth Ramakrishnan added a comment - Attaching patch which adds this feature. Added a new TaskTrackerStat which maintains the count of number of time node health script failed. Uses the default job tracker statistics window for update. Added a test case to check if the node health count is populated correct in statistics.
        Hide
        rahul k singh added a comment -

        Some comments:

        1.comment the updateNodeHealthFailureStatistics method esp UN_HEALTHY stuff.
        2.above method declaration is more than 80 characters

        3.make sure things are 80 characters in jsp file.
        4.show the blacklisted metrics all the time

        Show
        rahul k singh added a comment - Some comments: 1.comment the updateNodeHealthFailureStatistics method esp UN_HEALTHY stuff. 2.above method declaration is more than 80 characters 3.make sure things are 80 characters in jsp file. 4.show the blacklisted metrics all the time
        Hide
        Sreekanth Ramakrishnan added a comment -

        Attaching patch which addresses Rahul's comments.

        Show
        Sreekanth Ramakrishnan added a comment - Attaching patch which addresses Rahul's comments.
        Hide
        rahul k singh added a comment -

        changes look fine to me
        +1.

        Show
        rahul k singh added a comment - changes look fine to me +1.
        Hide
        Sreekanth Ramakrishnan added a comment -

        Output from ant test-patch:

             [exec] +1 overall.
             [exec]
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec]
             [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
             [exec]
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec]
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec]
             [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
             [exec]
             [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
        
        Show
        Sreekanth Ramakrishnan added a comment - Output from ant test-patch: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
        Hide
        Hemanth Yamijala added a comment -

        The attached patch is not applying for the file machines.jsp. Can you please upload a new one ?

        Show
        Hemanth Yamijala added a comment - The attached patch is not applying for the file machines.jsp. Can you please upload a new one ?
        Hide
        Sreekanth Ramakrishnan added a comment -

        Latest patch with changes merged in machines.jsp

        Show
        Sreekanth Ramakrishnan added a comment - Latest patch with changes merged in machines.jsp
        Hide
        Hemanth Yamijala added a comment -

        I looked at the patch. Overall, looking good.

        Sharad and I feel that the method JobTrackerStatistics.taskTrackerHealthCheckFailed() is not required. I think it is assumed that the tasktracker stat will be created and available. Further, it seems like the class JobTrackerStatistics manages the TasktrackerStatistics instances and nothing more. So, the method seems out of place. Hence, wherever we need to update the health check failure count, we could simply call JobTrackerStatistics.getTaskTrackerStat().incrHealthCheckFailed(). Sreekanth, any reason you thought the TaskTrackerStat for a TT will not be available and hence were implicitly managing the creation of it ?

        One more minor nit is we can rename the label "Blacklisted due to health Failures" to "Failed Health Checks" just to reduce the verbosity.

        Other than this +1

        Show
        Hemanth Yamijala added a comment - I looked at the patch. Overall, looking good. Sharad and I feel that the method JobTrackerStatistics.taskTrackerHealthCheckFailed() is not required. I think it is assumed that the tasktracker stat will be created and available. Further, it seems like the class JobTrackerStatistics manages the TasktrackerStatistics instances and nothing more. So, the method seems out of place. Hence, wherever we need to update the health check failure count, we could simply call JobTrackerStatistics.getTaskTrackerStat().incrHealthCheckFailed(). Sreekanth, any reason you thought the TaskTrackerStat for a TT will not be available and hence were implicitly managing the creation of it ? One more minor nit is we can rename the label "Blacklisted due to health Failures" to "Failed Health Checks" just to reduce the verbosity. Other than this +1
        Hide
        Sreekanth Ramakrishnan added a comment -

        Had added that method assuming that TaskTrackerStat is not created until first task is run/failed in a lazy manner, now rechecked that the stat object is created when tracker is added. So I have removed the method.

        Renamed the heading in JSP.

        Also added TestTaskTrackerBlacklisting to commit-tests list.

        Show
        Sreekanth Ramakrishnan added a comment - Had added that method assuming that TaskTrackerStat is not created until first task is run/failed in a lazy manner, now rechecked that the stat object is created when tracker is added. So I have removed the method. Renamed the heading in JSP. Also added TestTaskTrackerBlacklisting to commit-tests list.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12415816/mapreduce-779-4.patch
        against trunk revision 801959.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 10 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/457/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/457/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/457/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/457/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12415816/mapreduce-779-4.patch against trunk revision 801959. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 10 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/457/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/457/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/457/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/457/console This message is automatically generated.
        Hide
        Hemanth Yamijala added a comment -

        The streaming test failures on contrib has been going on for a while, and I think is a hudson only issue. Going to commit this patch.

        Show
        Hemanth Yamijala added a comment - The streaming test failures on contrib has been going on for a while, and I think is a hudson only issue. Going to commit this patch.
        Hide
        Hemanth Yamijala added a comment -

        I just committed this to trunk. Thanks, Sreekanth !

        Show
        Hemanth Yamijala added a comment - I just committed this to trunk. Thanks, Sreekanth !
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #46 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/46/)

        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #46 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/46/ )

          People

          • Assignee:
            Sreekanth Ramakrishnan
            Reporter:
            Sreekanth Ramakrishnan
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development