Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-5689

MRAppMaster does not preempt reducers when scheduled maps cannot be fulfilled

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 3.0.0, 2.2.0
    • Fix Version/s: 0.23.11, 2.3.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      We saw corner case where Jobs running on cluster were hung. Scenario was something like this. Job was running within a pool which was running at its capacity. All available containers were occupied by reducers and last 2 mappers. There were few more reducers waiting to be scheduled in pipeline.
      At this point two mappers which were running failed and went back to scheduled state. two available containers were assigned to reducers, now whole pool was full of reducers waiting on two maps to be complete. 2 maps never got scheduled because pool was full.

      Ideally reducer preemption should have kicked in to make room for Mappers from this code in RMContaienrAllocator

      int completedMaps = getJob().getCompletedMaps();
          int completedTasks = completedMaps + getJob().getCompletedReduces();
          if (lastCompletedTasks != completedTasks) {
            lastCompletedTasks = completedTasks;
            recalculateReduceSchedule = true;
          }
      
          if (recalculateReduceSchedule) {
            preemptReducesIfNeeded();
      

      But in this scenario lastCompletedTasks is always completedTasks because maps were never completed. This would cause job to hang forever. As workaround if we kill few reducers, mappers would get scheduled and caused job to complete.

      1. MAPREDUCE-5689.2.patch
        3 kB
        Karthik Kambatla
      2. MAPREDUCE-5689.1.patch
        3 kB
        Lohit Vijayarenu

        Activity

        Hide
        Jason Lowe added a comment -

        Thanks Lohit and Karthik! I pulled this into branch-0.23 as well.

        Show
        Jason Lowe added a comment - Thanks Lohit and Karthik! I pulled this into branch-0.23 as well.
        Hide
        Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1659 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1659/)
        MAPREDUCE-5689. MRAppMaster does not preempt reducers when scheduled maps cannot be fulfilled. (lohit via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1555161)

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java
        Show
        Hudson added a comment - SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1659 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1659/ ) MAPREDUCE-5689 . MRAppMaster does not preempt reducers when scheduled maps cannot be fulfilled. (lohit via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1555161 ) /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java
        Hide
        Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk #1634 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1634/)
        MAPREDUCE-5689. MRAppMaster does not preempt reducers when scheduled maps cannot be fulfilled. (lohit via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1555161)

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java
        Show
        Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk #1634 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1634/ ) MAPREDUCE-5689 . MRAppMaster does not preempt reducers when scheduled maps cannot be fulfilled. (lohit via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1555161 ) /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java
        Hide
        Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk #442 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/442/)
        MAPREDUCE-5689. MRAppMaster does not preempt reducers when scheduled maps cannot be fulfilled. (lohit via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1555161)

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java
        Show
        Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk #442 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/442/ ) MAPREDUCE-5689 . MRAppMaster does not preempt reducers when scheduled maps cannot be fulfilled. (lohit via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1555161 ) /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java
        Hide
        Hudson added a comment -

        SUCCESS: Integrated in Hadoop-trunk-Commit #4954 (See https://builds.apache.org/job/Hadoop-trunk-Commit/4954/)
        MAPREDUCE-5689. MRAppMaster does not preempt reducers when scheduled maps cannot be fulfilled. (lohit via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1555161)

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java
        Show
        Hudson added a comment - SUCCESS: Integrated in Hadoop-trunk-Commit #4954 (See https://builds.apache.org/job/Hadoop-trunk-Commit/4954/ ) MAPREDUCE-5689 . MRAppMaster does not preempt reducers when scheduled maps cannot be fulfilled. (lohit via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1555161 ) /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java
        Hide
        Karthik Kambatla (Inactive) added a comment -

        Thanks Lohit. Committed this to trunk and branch-2.

        Show
        Karthik Kambatla (Inactive) added a comment - Thanks Lohit. Committed this to trunk and branch-2.
        Hide
        Karthik Kambatla (Inactive) added a comment -

        +1. Will commit this shortly.

        Show
        Karthik Kambatla (Inactive) added a comment - +1. Will commit this shortly.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12621159/MAPREDUCE-5689.2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4297//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4297//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621159/MAPREDUCE-5689.2.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4297//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4297//console This message is automatically generated.
        Hide
        Karthik Kambatla (Inactive) added a comment -

        Patch looks good to me. Uploading the same patch with the --no-prefix option.

        Show
        Karthik Kambatla (Inactive) added a comment - Patch looks good to me. Uploading the same patch with the --no-prefix option.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12619367/MAPREDUCE-5689.1.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4273//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4273//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12619367/MAPREDUCE-5689.1.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4273//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4273//console This message is automatically generated.
        Hide
        Lohit Vijayarenu added a comment -

        Attached is patch to make sure reducer schedule is forced to re-calculate if there are pending maps.

        Show
        Lohit Vijayarenu added a comment - Attached is patch to make sure reducer schedule is forced to re-calculate if there are pending maps.

          People

          • Assignee:
            Lohit Vijayarenu
            Reporter:
            Lohit Vijayarenu
          • Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development