Hadoop YARN
YARN-4685

Disable AM blacklisting by default to mitigate situations where applications get hung

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: resourcemanager
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      AM blacklist additions and removals are propagated only when the RMAppAttempt is scheduled, i.e. in RMAppAttemptImpl#ScheduleTransition#transition. Once the attempt has been scheduled, any subsequent node removal or addition in the cluster is not reflected via BlackListManager#refreshNodeHostCount, so the BlackListManager operates on a stale NM count. As a result, the application stays in the ACCEPTED state and waits forever, even after the blacklisted nodes clear their disk space and reconnect.
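
      For illustration, a minimal, self-contained sketch of the threshold check described above; the class and method names mirror terms used in this issue but are assumptions for illustration, not the actual ResourceManager implementation:

          import java.util.Collections;
          import java.util.HashSet;
          import java.util.Set;

          // Illustrative sketch only: mimics the disable-failure-threshold check
          // described in this issue, not the real BlackListManager classes.
          class StaleCountBlacklistSketch {
            private final Set<String> blacklistedNodes = new HashSet<>();
            private final double disableFailureThreshold = 0.8; // pre-patch default
            private int numberOfNodeManagerHosts;               // only set at schedule time

            // In the buggy flow this is invoked only from ScheduleTransition, so
            // later node add/remove events never update the count.
            void refreshNodeHostCount(int clusterNodeCount) {
              this.numberOfNodeManagerHosts = clusterNodeCount;
            }

            void addNode(String host) {
              blacklistedNodes.add(host);
            }

            // The blacklist is honoured only while the blacklisted count stays below
            // the disable-failure threshold; with a stale host count the threshold may
            // never be crossed, so the app keeps skipping its only usable node.
            Set<String> effectiveBlacklist() {
              if (blacklistedNodes.size() < disableFailureThreshold * numberOfNodeManagerHosts) {
                return blacklistedNodes;
              }
              return Collections.emptySet();
            }
          }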

      Attachments

      1. YARN-4685.patch
        1 kB
        Rohith Sharma K S
      2. YARN-4685-workaround.patch
        0.9 kB
        Rohith Sharma K S

        Issue Links

          Activity

          rohithsharma Rohith Sharma K S added a comment -

           Currently, RMAppAttemptImpl calls the allocate method only when the CONTAINER_ALLOCATED event is triggered. If no container is allocated, RMAppAttemptImpl does not keep calling allocate. So even if we add code to send updated blacklist additions/removals in RMAppAttemptImpl#AMContainerAllocatedTransition#transition, it is not useful. We need to think of alternatives to handle this scenario.

          rohithsharma Rohith Sharma K S added a comment -

           One case where the application got stuck is:

           1. The cluster started with 2 nodes and 1 application was submitted.
           2. Attempt-1 failed because of a disk failure on NM-1. Attempt-2 was created with NM-1 marked as a blacklisted node.
           3. NM-2 was removed from the cluster, leaving only NM-1.
           4. Since NM-1 is blacklisted, no more containers are assigned to it.
           5. The cluster now has only one node, and it is blacklisted, so no containers are assigned to NM-1 even after it reconnects with its disk space cleared (the arithmetic sketch after this list walks through these numbers).
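
           A worked example of this two-node scenario, assuming the pre-patch 0.8 disable-failure threshold; the comparison mirrors the threshold logic described in this issue and is illustrative only:

               // Worked arithmetic for the scenario above; not ResourceManager code.
               public class TwoNodeScenarioSketch {
                 public static void main(String[] args) {
                   double threshold = 0.8;
                   int staleNodeCount = 2;  // node count captured when attempt-2 was scheduled
                   int blacklisted = 1;     // NM-1, blacklisted after the disk failure

                   // Blacklist stays active while blacklisted < threshold * nodeCount.
                   boolean activeWithStaleCount = blacklisted < threshold * staleNodeCount; // 1 < 1.6 -> true
                   // Had the count been refreshed to 1 after NM-2 left, 1 < 0.8 would be
                   // false, the blacklist would be dropped, and NM-1 could host the AM.
                   boolean activeWithFreshCount = blacklisted < threshold * 1;               // false

                   System.out.println("stale count keeps blacklist: " + activeWithStaleCount);
                   System.out.println("fresh count keeps blacklist: " + activeWithFreshCount);
                 }
               }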
          vinodkv Vinod Kumar Vavilapalli added a comment -

           There are simpler cases that are broken too. For example, if an AM failed on a node, that node will never be considered again for launching this app's AM as long as it is within the blacklist threshold. In a busy cluster where this node continues to be the only free one for a while, we will keep skipping the machine.

          rohithsharma Rohith Sharma K S added a comment -

           Initially I thought to fix this by making another allocate call whenever there is a node update event to RMApp->RMAppImpl. But there could be a case where the new allocate call gets the master container before RMAppAttemptImpl receives the container-allocated event; in that case RMAppAttemptImpl would need its own handling mechanism. Many cases like this can occur, so this option does not work.

           Other approaches for fixing this issue are to recompute the blacklist threshold EITHER on node-added and node-removed events OR on every heartbeat, for all apps that are waiting for AM container allocation, and to update AppSchedulingInfo for the AM blacklist.
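
           A hypothetical sketch of the first alternative (recompute on node-added/node-removed events); the interface, handler, and method names here are assumptions for illustration, not existing YARN code:

               import java.util.List;
               import java.util.concurrent.CopyOnWriteArrayList;

               // Hypothetical wiring: push the live node count to every attempt that is
               // still waiting for its AM container whenever the cluster changes.
               class NodeEventRecomputeSketch {
                 interface PendingAmBlacklist {
                   void refreshNodeHostCount(int clusterNodeCount);
                 }

                 // Blacklist state of attempts still waiting for their AM container.
                 private final List<PendingAmBlacklist> pendingAttempts = new CopyOnWriteArrayList<>();

                 void register(PendingAmBlacklist attemptBlacklist) {
                   pendingAttempts.add(attemptBlacklist);
                 }

                 // Would be called from the scheduler's node-added/node-removed handling
                 // so every waiting attempt re-evaluates its threshold against the live count.
                 void onClusterNodeCountChanged(int newCount) {
                   for (PendingAmBlacklist b : pendingAttempts) {
                     b.refreshNodeHostCount(newCount);
                   }
                 }
               }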

          sunilg Sunil G added a comment -

           I agree with your point, Rohith Sharma K S.

           We have a blacklistManager per RMAppAttempt, so to operate on the blacklistManager we have to pass a reference to the scheduler. Assuming we go with your second approach: on each heartbeat call we check for a pending AM container resource request, and for such a resource request we re-compute the blacklist threshold in the blacklistManager if needed (i.e. if some nodes were added or removed recently). If the threshold changes, we remove the blacklist for this ResourceRequest.

           But we would need to change a lot of interface/API signatures. If we had a common BlackListManager that keeps track of blacklist information for all apps, it would have been cleaner.
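
           A hypothetical sketch of the "common BlackListManager" idea above: one RM-wide manager tracking AM blacklist entries for every app, so the node count can be refreshed in one place. All names are illustrative assumptions, not existing YARN classes:

               import java.util.Collections;
               import java.util.Map;
               import java.util.Set;
               import java.util.concurrent.ConcurrentHashMap;

               class CommonAmBlacklistSketch {
                 private final Map<String, Set<String>> blacklistByAppId = new ConcurrentHashMap<>();
                 private final double disableFailureThreshold;
                 private volatile int clusterNodeCount;

                 CommonAmBlacklistSketch(double disableFailureThreshold) {
                   this.disableFailureThreshold = disableFailureThreshold;
                 }

                 void addNode(String appId, String host) {
                   blacklistByAppId.computeIfAbsent(appId, k -> ConcurrentHashMap.newKeySet()).add(host);
                 }

                 // Single refresh point, driven by node-added/node-removed events or heartbeats.
                 void onClusterNodeCountChanged(int newCount) {
                   this.clusterNodeCount = newCount;
                 }

                 Set<String> effectiveBlacklist(String appId) {
                   Set<String> nodes = blacklistByAppId.getOrDefault(appId, Collections.emptySet());
                   return nodes.size() < disableFailureThreshold * clusterNodeCount
                       ? nodes : Collections.emptySet();
                 }
               }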

          rohithsharma Rohith Sharma K S added a comment -

           Some of the points brought up in an offline discussion with Sunil G and Varun Vasudev are:

           1. The default value for the maximum threshold is 0.8. This should be reduced to 0.1 (10%) or 0.2 (20%). As Vinod commented previously in this JIRA, in a real production cluster, blacklisting 80% of the nodes for one app is very prone to be problematic if 20% of the nodes are always busy.
           2. Once the attempt is scheduled, there is no way to update the scheduler with blacklist additions/removals. The existing allocate API is used for updating the AM's blacklisted nodes, but reusing the same API for AM blacklist updates from RMAppAttempt is problematic: since allocate returns an Allocation object, many RMAppAttempt state-machine transitions would need to be handled and many race conditions would appear. A better way to update the scheduler with blacklisted nodes is to trigger an update event from RMAppAttempt for the AM blacklist nodes; this keeps the YarnScheduler interface compatible.
          leftnoteasy Wangda Tan added a comment -

           Rohith Sharma K S, it seems to me that there's no consensus yet about how to fix this problem. Could we move this to 2.9?

          rohithsharma Rohith Sharma K S added a comment -

           Since this issue was introduced by YARN-2005, which is committed to branch-2.8, should YARN-2005 be reverted until the right solution is decided? The biggest challenge with reverting is that many patches have been committed on top of YARN-2005.
           OR should we go ahead and change the default threshold to 0.2 for the 2.8 release? Any thoughts?

          leftnoteasy Wangda Tan added a comment -

          Rohith Sharma K S,

           I discussed this with Vinod Kumar Vavilapalli; one solution is to update DEFAULT_AM_BLACKLIST_ENABLED to false and change the default threshold from 0.8 to 0.2. We can open a separate JIRA for a longer-term fix of this issue. Sounds like a plan?

          rohithsharma Rohith Sharma K S added a comment -

          OK. I will upload a patch with changes.

          rohithsharma Rohith Sharma K S added a comment -

           Updated the patch with two configuration changes (see the sketch after this list):

           1. Reduced the blacklisting threshold to 20%.
           2. Set the default value for blacklist-enabled to false.
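
           For reference, a rough sketch of the shape of the change; the property keys and constant names below are assumptions based on this discussion, not a verbatim copy of the committed YarnConfiguration patch:

               // Rough sketch of the new defaults; names are assumptions from this
               // discussion, not the exact YarnConfiguration constants.
               public final class AmBlacklistDefaultsSketch {
                 public static final String AM_BLACKLISTING_ENABLED =
                     "yarn.am.blacklisting.enabled";
                 // Previously true; AM blacklisting is now off by default.
                 public static final boolean DEFAULT_AM_BLACKLISTING_ENABLED = false;

                 public static final String AM_BLACKLISTING_DISABLE_THRESHOLD =
                     "yarn.am.blacklisting.disable-failure-threshold";
                 // Previously 0.8f; at most ~20% of the cluster can now be blacklisted per app.
                 public static final float DEFAULT_AM_BLACKLISTING_DISABLE_THRESHOLD = 0.2f;

                 private AmBlacklistDefaultsSketch() {
                 }
               }

           Operators who still want per-app AM blacklisting can switch it back on by setting the enabled property to true in yarn-site.xml, assuming the key names above.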
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 22s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 6m 44s trunk passed
          +1 compile 0m 23s trunk passed
          +1 checkstyle 0m 15s trunk passed
          +1 mvnsite 0m 26s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 0s trunk passed
          +1 javadoc 0m 16s trunk passed
          +1 mvninstall 0m 21s the patch passed
          +1 compile 0m 20s the patch passed
          +1 javac 0m 20s the patch passed
          +1 checkstyle 0m 13s the patch passed
          +1 mvnsite 0m 23s the patch passed
          +1 mvneclipse 0m 9s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 4s the patch passed
          +1 javadoc 0m 14s the patch passed
          +1 unit 0m 22s hadoop-yarn-api in the patch passed.
          +1 asflicense 0m 15s The patch does not generate ASF License warnings.
          13m 38s



          Subsystem Report/Notes
          Docker Image: yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12824510/YARN-4685.patch
          JIRA Issue YARN-4685
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 5b685bd44176 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 8179f9a
          Default Java 1.8.0_101
          findbugs v3.0.0
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/12830/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/12830/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          leftnoteasy Wangda Tan added a comment -

          +1 to latest patch, will commit shortly.

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10313 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10313/)
          YARN-4685. Disable AM blacklisting by default to mitigate situations (wangda: rev 2da32a6ef9edebd86ca9672d10ce35b5a46818cc)

          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          leftnoteasy Wangda Tan added a comment -

          Committed to trunk/branch-2/branch-2.8, thanks Rohith Sharma K S for the patch and thanks Sunil G for reviews!


            People

            • Assignee:
              rohithsharma Rohith Sharma K S
            • Reporter:
              rohithsharma Rohith Sharma K S
            • Votes:
              0
            • Watchers:
              14

              Dates

              • Created:
                Updated:
                Resolved:
