Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.23.2, 2.0.0-alpha
    • Fix Version/s: 0.23.3, 2.0.2-alpha
    • Component/s: mrv2
    • Labels: None

      Description

      The AM times out a task through mapreduce.task.timeout only when it does not hear from the task at all within the given timeframe. On 1.0, a task must be making progress to avoid the timeout, either by reading input from HDFS, writing output to HDFS, writing to a log, or calling a special method to inform the framework that it is still making progress.

      The behavior differs because on 0.23 a status update, which happens every 3 seconds, is counted as progress, so a hung task that keeps sending status updates never times out.
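The difference between the two behaviors can be illustrated with a minimal sketch. It is not the actual TaskHeartbeatHandler code: the class and method names (AttemptClock, onPing, onProgress) and the 10-second timeout are hypothetical, and the current time is passed in explicitly to keep the example deterministic.

```java
// Illustrative sketch only: these names are hypothetical, not Hadoop APIs.
final class AttemptClock {
    private long lastPingMs;      // last time the task contacted the AM at all
    private long lastProgressMs;  // last time the task reported real progress

    AttemptClock(long startMs) {
        this.lastPingMs = startMs;
        this.lastProgressMs = startMs;
    }

    // A periodic status update shows the task process is alive, nothing more.
    void onPing(long nowMs) {
        lastPingMs = nowMs;
    }

    // Real progress (input read, output written, explicit progress call).
    void onProgress(long nowMs) {
        lastPingMs = nowMs;
        lastProgressMs = nowMs;
    }

    // Behavior described for 0.23: any contact resets the timer.
    boolean timedOutCountingPings(long nowMs, long timeoutMs) {
        return nowMs - lastPingMs > timeoutMs;
    }

    // Behavior described for 1.0: only real progress resets the timer.
    boolean timedOutCountingProgressOnly(long nowMs, long timeoutMs) {
        return nowMs - lastProgressMs > timeoutMs;
    }
}

final class AttemptClockDemo {
    public static void main(String[] args) {
        long timeoutMs = 10_000; // hypothetical timeout value
        AttemptClock hung = new AttemptClock(0);
        // A hung task keeps pinging every 3 seconds but never progresses.
        for (long t = 3_000; t <= 30_000; t += 3_000) {
            hung.onPing(t);
        }
        System.out.println(hung.timedOutCountingPings(30_000, timeoutMs));        // false
        System.out.println(hung.timedOutCountingProgressOnly(30_000, timeoutMs)); // true
    }
}
```

With pings counted as progress, the hung task in the demo is never flagged; with only real progress counted, it is flagged once the timeout has elapsed.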

      1. MR-4089.txt
        13 kB
        Robert Joseph Evans
      2. MR-4089.txt
        30 kB
        Robert Joseph Evans
      3. MR-4089.txt
        12 kB
        Robert Joseph Evans
      4. MR-4089.txt
        12 kB
        Robert Joseph Evans
      5. MR-4089.txt
        5 kB
        Robert Joseph Evans

        Activity

        Transition                     Time In Source Status   Execution Times   Last Executer         Last Execution Date
        Patch Available -> Open        2d 19h 51m              2                 Robert Joseph Evans   02/Apr/12 16:22
        Open -> Patch Available        1h 32m                  3                 Robert Joseph Evans   02/Apr/12 16:31
        Patch Available -> Resolved    5h                      1                 Thomas Graves         02/Apr/12 21:31
        Resolved -> Closed             191d 21h 17m            1                 Arun C Murthy         11/Oct/12 18:48
        Allen Wittenauer made changes -
        Affects Version/s trunk [ 12320360 ]
        Arun C Murthy made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Arun C Murthy made changes -
        Fix Version/s 2.0.2-alpha [ 12322471 ]
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1039 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1039/)
        MAPREDUCE-4089. Hung Tasks never time out. (Robert Evans via tgraves) (Revision 1308531)

        Result = FAILURE
        tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1308531
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/TaskAttemptListenerImpl.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskHeartbeatHandler.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestTaskHeartbeatHandler.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1004 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1004/)
        MAPREDUCE-4089. Hung Tasks never time out. (Robert Evans via tgraves) (Revision 1308531)

        Result = FAILURE
        tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1308531
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/TaskAttemptListenerImpl.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskHeartbeatHandler.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestTaskHeartbeatHandler.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-0.23-Build #217 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/217/)
        merge -r 1308532:1308533 from branch-2. FIXES: MAPREDUCE-4089 (Revision 1308537)

        Result = UNSTABLE
        tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1308537
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/TaskAttemptListenerImpl.java
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskHeartbeatHandler.java
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestTaskHeartbeatHandler.java
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12521018/MR-4089.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.mapred.TestMiniMRClientCluster
        org.apache.hadoop.mapreduce.v2.TestMROldApiJobs
        org.apache.hadoop.mapred.TestJobCounters
        org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
        org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter
        org.apache.hadoop.mapreduce.v2.TestMiniMRProxyUser
        org.apache.hadoop.mapreduce.v2.TestRMNMInfo
        org.apache.hadoop.mapreduce.security.TestJHSSecurity
        org.apache.hadoop.mapreduce.v2.TestUberAM
        org.apache.hadoop.mapred.TestReduceFetch
        org.apache.hadoop.mapreduce.TestChild
        org.apache.hadoop.mapred.TestLazyOutput
        org.apache.hadoop.mapred.TestReduceFetchFromPartialMem
        org.apache.hadoop.mapreduce.v2.TestMRJobs
        org.apache.hadoop.mapred.TestJobSysDirWithDFS
        org.apache.hadoop.mapred.TestMiniMRBringup
        org.apache.hadoop.mapreduce.TestMapReduceLazyOutput
        org.apache.hadoop.mapred.TestJobCleanup
        org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution
        org.apache.hadoop.conf.TestNoDefaultsJobConf
        org.apache.hadoop.mapred.TestMiniMRChildTask
        org.apache.hadoop.mapred.TestClientRedirect

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2129//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2129//console

        This message is automatically generated.

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #1985 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1985/)
        MAPREDUCE-4089. Hung Tasks never time out. (Robert Evans via tgraves) (Revision 1308531)

        Result = SUCCESS
        tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1308531
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/TaskAttemptListenerImpl.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskHeartbeatHandler.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestTaskHeartbeatHandler.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #2047 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2047/)
        MAPREDUCE-4089. Hung Tasks never time out. (Robert Evans via tgraves) (Revision 1308531)

        Result = SUCCESS
        tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1308531
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/TaskAttemptListenerImpl.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskHeartbeatHandler.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestTaskHeartbeatHandler.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #1972 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1972/)
        MAPREDUCE-4089. Hung Tasks never time out. (Robert Evans via tgraves) (Revision 1308531)

        Result = SUCCESS
        tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1308531
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/TaskAttemptListenerImpl.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskHeartbeatHandler.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestTaskHeartbeatHandler.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
        Thomas Graves made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Target Version/s trunk, 2.0.0, 0.23.2 [ 12320360, 12320354, 12319851 ] 0.23.2, 2.0.0, trunk [ 12319851, 12320354, 12320360 ]
        Fix Version/s 0.23.3 [ 12320060 ]
        Resolution Fixed [ 1 ]
        Thomas Graves added a comment -

        +1, thanks Bobby! I committed this to trunk, branch-2, and branch-0.23

        Vinod Kumar Vavilapalli added a comment -

        Not sure we can do anything else if we want to be compatible in this sense. Removing the ping thread completely from the framework is one option, but maybe later.

        Robert Joseph Evans made changes -
        Attachment MR-4089.txt [ 12521018 ]
        Robert Joseph Evans added a comment -

        Sorry, I messed that patch up a bit. Trying again...

        Robert Joseph Evans made changes -
        Attachment MR-4089.txt [ 12521014 ]
        Robert Joseph Evans added a comment -

        I thought it would be good to leave the ping timeout in, just to be sure that if someone disabled the progress timeout with a 0 we could still detect a task that has something wrong with it, but I can see your point.

        I updated the mapred-default.xml to include information about a value of 0. This is also to maintain compatibility with 1.0.
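The special meaning of 0 could be expressed as a guard in front of the timeout comparison. This is only a sketch, assuming that 0 disables the progress timeout as the comment above describes; ProgressTimeout.hasTimedOut is a hypothetical helper and is not taken from the patch.

```java
// Illustrative sketch only: a hypothetical helper, not code from the patch.
final class ProgressTimeout {
    private ProgressTimeout() {}

    static boolean hasTimedOut(long lastProgressMs, long nowMs, long timeoutMs) {
        // Assumption: a timeout of 0 disables the progress timeout entirely.
        // Treating negative values the same way is an extra assumption of this sketch.
        if (timeoutMs <= 0) {
            return false;
        }
        return nowMs - lastProgressMs > timeoutMs;
    }

    public static void main(String[] args) {
        // No progress for a very long time, but the timeout is disabled.
        System.out.println(hasTimedOut(0, 1_000_000, 0));   // false
        // No progress for 601 ms with a 600 ms timeout.
        System.out.println(hasTimedOut(0, 601, 600));       // true
    }
}
```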

        Vinod Kumar Vavilapalli added a comment -

        In a sense, after this patch, we are again forcing users to set progress every so often, either via an output write or an explicit progress call (perhaps in a separate thread). If users have to do it anyway, I don't see the purpose of the framework's background progress thread (which pings every three seconds).

        But if it is just to be compatible, sure.

        In either case, you should document the special meaning of a zero timeout.
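The "explicit progress (perhaps in a separate thread)" option mentioned above might look like the following sketch. BackgroundProgress and runWithProgress are hypothetical names; the reportProgress callback stands in for whatever progress call the task has available (in a real mapper this would typically be context.progress()), and the reporting period is an arbitrary choice.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Illustrative sketch only: a hypothetical helper, not part of Hadoop.
final class BackgroundProgress {
    private BackgroundProgress() {}

    // Runs `work` on the calling thread while a daemon thread invokes
    // `reportProgress` every `periodMs` milliseconds until the work returns.
    static <T> T runWithProgress(Supplier<T> work, Runnable reportProgress, long periodMs) {
        ScheduledExecutorService ticker = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "progress-reporter");
            t.setDaemon(true); // do not keep the JVM alive for the reporter
            return t;
        });
        ticker.scheduleAtFixedRate(reportProgress, 0, periodMs, TimeUnit.MILLISECONDS);
        try {
            return work.get();
        } finally {
            ticker.shutdownNow(); // stop reporting once the work finishes or fails
        }
    }
}
```

Note that a reporter like this keeps signaling progress even if the work itself hangs, which is the trade-off raised in the comment above.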

        Thomas Graves added a comment -

        +1 Looks good, I'll commit this shortly, Thanks Bobby!

        Robert Joseph Evans added a comment -

        The test failures appear to be caused by either existing issues or Bind errors, just like before. All of the failures are in the RM, which has no dependency on the code that was changed with this patch.

        Jason Lowe added a comment -

        +1 (non-binding), looks good to me.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12520981/MR-4089.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService
        org.apache.hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry
        org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
        org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2125//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2125//console

        This message is automatically generated.

        Robert Joseph Evans made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Target Version/s trunk, 2.0.0, 0.23.2 [ 12320360, 12320354, 12319851 ] 0.23.2, 2.0.0, trunk [ 12319851, 12320354, 12320360 ]
        Robert Joseph Evans made changes -
        Attachment MR-4089.txt [ 12520981 ]
        Robert Joseph Evans added a comment -

        Fixed the javac warnings and added the missing license to the new test file.

        Robert Joseph Evans made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Target Version/s trunk, 2.0.0, 0.23.2 [ 12320360, 12320354, 12319851 ] 0.23.2, 2.0.0, trunk [ 12319851, 12320354, 12320360 ]
        Robert Joseph Evans added a comment -

        The javac warning is mine:

        [WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-MAPREDUCE-Build/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestTaskHeartbeatHandler.java:[44,42] [unchecked] unchecked call to handle(T) as a member of the raw type org.apache.hadoop.yarn.event.EventHandler

        But the test failures are not. They all appear to be caused by "address already in use" errors. It looks like some process refused to die.
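The [unchecked] warning quoted above is what javac reports when a method that takes a type parameter is called through a raw type. The sketch below reproduces the pattern with a local stand-in interface (Handler) rather than the real org.apache.hadoop.yarn.event.EventHandler; parameterizing the type is one common way to remove such a warning, and is not necessarily how the patch fixed it.

```java
// Local stand-in with the same shape as the handle(T) method named in the warning.
interface Handler<T> {
    void handle(T event);
}

final class RawTypeDemo {
    private RawTypeDemo() {}

    // Raw type: compiling with -Xlint:unchecked reports an unchecked call to handle(T) here.
    static void dispatchRaw(Handler handler, Object event) {
        handler.handle(event);
    }

    // Parameterized type: the compiler can check the argument, so there is no warning.
    static <T> void dispatchTyped(Handler<T> handler, T event) {
        handler.handle(event);
    }

    public static void main(String[] args) {
        Handler<String> printer = event -> System.out.println("handled " + event);
        dispatchRaw(printer, "raw");     // prints "handled raw"
        dispatchTyped(printer, "typed"); // prints "handled typed"
    }
}
```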

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12520679/MR-4089.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        -1 javac. The applied patch generated 508 javac compiler warnings (more than the trunk's current 507 warnings).

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.mapred.TestMiniMRClientCluster
        org.apache.hadoop.mapreduce.v2.TestMROldApiJobs
        org.apache.hadoop.mapred.TestJobCounters
        org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
        org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter
        org.apache.hadoop.mapreduce.v2.TestMiniMRProxyUser
        org.apache.hadoop.mapreduce.v2.TestRMNMInfo
        org.apache.hadoop.mapreduce.security.TestJHSSecurity
        org.apache.hadoop.mapreduce.v2.TestUberAM
        org.apache.hadoop.mapred.TestReduceFetch
        org.apache.hadoop.mapreduce.TestChild
        org.apache.hadoop.mapred.TestLazyOutput
        org.apache.hadoop.mapred.TestReduceFetchFromPartialMem
        org.apache.hadoop.mapreduce.v2.TestMRJobs
        org.apache.hadoop.mapred.TestJobSysDirWithDFS
        org.apache.hadoop.mapred.TestMiniMRBringup
        org.apache.hadoop.mapreduce.TestMapReduceLazyOutput
        org.apache.hadoop.mapred.TestJobCleanup
        org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution
        org.apache.hadoop.conf.TestNoDefaultsJobConf
        org.apache.hadoop.mapred.TestMiniMRChildTask
        org.apache.hadoop.mapred.TestClientRedirect

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2121//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2121//console

        This message is automatically generated.

        Robert Joseph Evans made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Target Version/s trunk, 2.0.0, 0.23.2 [ 12320360, 12320354, 12319851 ] 0.23.2, 2.0.0, trunk [ 12319851, 12320354, 12320360 ]
        Robert Joseph Evans made changes -
        Attachment MR-4089.txt [ 12520679 ]
        Robert Joseph Evans added a comment -

        This patch addresses a ping timeout, a progress timeout, and a timeout of 0.
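
A minimal sketch of how those three cases could fit together on the AM side. This is not the code in the attached patch: the class name `TimeoutMonitorSketch`, its methods, and the millisecond values are all hypothetical, and it assumes that a progress timeout of 0 means "never time out on lack of progress" while the ping timeout still catches a task that has stopped talking to the AM entirely. It also assumes a status update counts as a ping.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical AM-side monitor; names and structure are illustrative only.
class TimeoutMonitorSketch {
    private static final class Times {
        long lastPingMs;
        long lastProgressMs;
    }

    private final long pingTimeoutMs;
    private final long progressTimeoutMs; // assumption: 0 disables the progress check
    private final Map<String, Times> tasks = new HashMap<>();

    TimeoutMonitorSketch(long pingTimeoutMs, long progressTimeoutMs) {
        this.pingTimeoutMs = pingTimeoutMs;
        this.progressTimeoutMs = progressTimeoutMs;
    }

    // Start tracking a task; both clocks start at registration time.
    void register(String taskId, long nowMs) {
        Times t = new Times();
        t.lastPingMs = nowMs;
        t.lastProgressMs = nowMs;
        tasks.put(taskId, t);
    }

    // A bare ping proves the task is reachable, but is NOT progress.
    void ping(String taskId, long nowMs) {
        Times t = tasks.get(taskId);
        if (t != null) {
            t.lastPingMs = nowMs;
        }
    }

    // A status update counts as both contact and progress (assumption).
    void progress(String taskId, long nowMs) {
        Times t = tasks.get(taskId);
        if (t != null) {
            t.lastPingMs = nowMs;
            t.lastProgressMs = nowMs;
        }
    }

    boolean isTimedOut(String taskId, long nowMs) {
        Times t = tasks.get(taskId);
        if (t == null) {
            return false; // unknown tasks are not tracked
        }
        boolean pingExpired = nowMs - t.lastPingMs > pingTimeoutMs;
        boolean progressExpired =
            progressTimeoutMs > 0 && nowMs - t.lastProgressMs > progressTimeoutMs;
        return pingExpired || progressExpired;
    }

    public static void main(String[] args) {
        // Task keeps pinging but never reports progress.
        TimeoutMonitorSketch m = new TimeoutMonitorSketch(1_000, 5_000);
        m.register("t1", 0);
        m.ping("t1", 4_500);
        System.out.println(m.isTimedOut("t1", 5_001));

        // Progress timeout of 0: only a missing ping can expire the task.
        TimeoutMonitorSketch z = new TimeoutMonitorSketch(1_000, 0);
        z.register("t2", 0);
        z.ping("t2", 99_500);
        System.out.println(z.isTimedOut("t2", 100_000));
        System.out.println(z.isTimedOut("t2", 101_000));
    }
}
```

Passing the current time in explicitly keeps the sketch deterministic; a real monitor would presumably read a clock and run the check from a background thread.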

        Robert Joseph Evans made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Target Version/s trunk, 2.0.0, 0.23.2 [ 12320360, 12320354, 12319851 ] 0.23.2, 2.0.0, trunk [ 12319851, 12320354, 12320360 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12520653/MR-4089.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2119//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2119//console

        This message is automatically generated.

        Robert Joseph Evans added a comment -

        OK. Talking to people here, it looks like a significant number of users that I know of set the timeout to 0, so I am going to come up with a new patch that makes a timeout of 0 acceptable too.

        Robert Joseph Evans made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Target Version/s trunk, 2.0.0, 0.23.2 [ 12320360, 12320354, 12319851 ] 0.23.2, 2.0.0, trunk [ 12319851, 12320354, 12320360 ]
        Robert Joseph Evans made changes -
        Field Original Value New Value
        Attachment MR-4089.txt [ 12520653 ]
        Robert Joseph Evans added a comment -

        This patch addresses the timeout issue by making ping not update progress. It is still not completely compatible with 1.0: in 1.0, if the timeout is set to 0, the task will never time out. But because this patch makes it so that ping is ignored, a task that has a timeout of 0 but is so locked up that it cannot ping anymore will never time out.

        I am planning to address these in a follow-on JIRA, unless someone objects to doing so.

        I also have not run all of the unit tests yet.

        Robert Joseph Evans added a comment -

        From looking at the code for Task: when TaskReporter.progress() is called, all it does is set the progress flag, which causes a progress update to be sent to the AM instead of a ping. So my guess is that we want to stop counting pings when measuring progress in the AM. It should probably be a simple one-line change.
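
A rough sketch of the task-side behavior described above, to make the ping versus progress-update distinction concrete. `ReporterSketch` is a hypothetical stand-in, not the real `Task.TaskReporter`; in particular, clearing the flag on every heartbeat is an assumption made for the illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the task-side reporter; not the Hadoop class.
class ReporterSketch {
    enum Message { PING, STATUS_UPDATE }

    private final AtomicBoolean progressFlag = new AtomicBoolean(false);

    // Called by the task to signal that it is still doing useful work.
    // All it does is set the flag.
    void progress() {
        progressFlag.set(true);
    }

    // Called once per heartbeat interval by the communication thread.
    // Returns STATUS_UPDATE if progress() was called since the previous
    // heartbeat, otherwise PING. The flag is cleared either way (assumption).
    Message nextHeartbeat() {
        return progressFlag.getAndSet(false) ? Message.STATUS_UPDATE : Message.PING;
    }

    public static void main(String[] args) {
        ReporterSketch reporter = new ReporterSketch();
        System.out.println(reporter.nextHeartbeat()); // no progress reported yet
        reporter.progress();
        System.out.println(reporter.nextHeartbeat()); // progress was reported
        System.out.println(reporter.nextHeartbeat()); // flag was cleared again
    }
}
```

Under this model a hung task keeps emitting `PING` messages, so an AM that treats every message as progress would never see it expire.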

        Robert Joseph Evans created issue -

          People

          • Assignee:
            Robert Joseph Evans
            Reporter:
            Robert Joseph Evans
          • Votes:
            0
            Watchers:
            4

            Dates

            • Created:
              Updated:
              Resolved:

              Development