Details
- Type: Sub-task
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 2.0.1-alpha
- Labels: None
- Hadoop Flags: Reviewed
Description
2012-12-26 08:41:15,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Calling allocate on removed or non existant application appattempt_1356385141279_49525_000001
2012-12-26 08:41:15,031 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type CONTAINER_ALLOCATED for applicationAttempt application_1356385141279_49525
java.lang.ArrayIndexOutOfBoundsException: 0
	at java.util.Arrays$ArrayList.get(Arrays.java:3381)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644)
	at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
	at java.lang.Thread.run(Thread.java:662)
Attachments
- YARN-292.1.patch (5 kB, Zhijie Shen)
- YARN-292.2.patch (6 kB, Zhijie Shen)
- YARN-292.3.patch (6 kB, Zhijie Shen)
- ArrayIndexOutOfBoundsException.log (31 kB, Nemon Lou)
- YARN-292.4.patch (19 kB, Zhijie Shen)
Activity
I think it is just caused by the following code:
// Acquire the AM container from the scheduler.
Allocation amContainerAllocation = appAttempt.scheduler.allocate(
    appAttempt.applicationAttemptId, EMPTY_CONTAINER_REQUEST_LIST,
    EMPTY_CONTAINER_RELEASE_LIST, null, null);
// Set the masterContainer
appAttempt.setMasterContainer(amContainerAllocation.getContainers().get(0));
...
It is possible that amContainerAllocation did not get any containers if the cluster is quite busy. In this case, the access get(0) will cause an out-of-bounds exception. I will deliver a quick patch and a UT to fix this later.
// Acquire the AM container from the scheduler.
Allocation amContainerAllocation = appAttempt.scheduler.allocate(
    appAttempt.applicationAttemptId, EMPTY_CONTAINER_REQUEST_LIST,
    EMPTY_CONTAINER_RELEASE_LIST, null, null);
The above code will eventually pull the newly allocated containers from newlyAllocatedContainers.
Logically, AMContainerAllocatedTransition happens after RMAppAttempt receives CONTAINER_ALLOCATED. CONTAINER_ALLOCATED is sent during ContainerStartedTransition, when the RMContainer is moving from NEW to ALLOCATED. Therefore, pulling newlyAllocatedContainers happens when the RMContainer is at ALLOCATED, whereas the RMContainer is added to newlyAllocatedContainers when it is still at NEW. In conclusion, one container is expected in the allocation in AMContainerAllocatedTransition.
As nemon hinted, the problem may happen at:
FiCaSchedulerApp application = getApplication(applicationAttemptId);
if (application == null) {
  LOG.error("Calling allocate on removed " +
      "or non existant application " + applicationAttemptId);
  return EMPTY_ALLOCATION;
}
EMPTY_ALLOCATION has 0 containers. Another observation is that the synchronization around access to the application map appears to be inconsistent.
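To make the failure path concrete, here is a minimal standalone sketch of how an empty Allocation turns into the exception in the stack trace (the Allocation class below is a simplified stand-in, not YARN's actual class):

import java.util.Arrays;
import java.util.List;

// Stand-in for YARN's Allocation; EMPTY_ALLOCATION carries no containers.
class Allocation {
  private final List<String> containers;
  Allocation(List<String> containers) { this.containers = containers; }
  List<String> getContainers() { return containers; }
}

public class EmptyAllocationDemo {
  // Arrays.asList() with no elements mirrors the Arrays$ArrayList
  // seen in the stack trace above.
  static final Allocation EMPTY_ALLOCATION =
      new Allocation(Arrays.<String>asList());

  public static void main(String[] args) {
    // AMContainerAllocatedTransition assumes at least one container,
    // so get(0) throws ArrayIndexOutOfBoundsException: 0 on the
    // empty list.
    String amContainer = EMPTY_ALLOCATION.getContainers().get(0);
  }
}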
I just became aware that junping_du has started working on this problem. Please feel free to take it over. Thanks!
Hi zjshen, I think your work above reveals the root cause of this bug. So please feel free to go ahead and fix it. I will also help to review it. Thx!
Did more investigation on this issue:
2012-12-26 08:41:15,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Calling allocate on removed or non existant application appattempt_1356385141279_49525_000001
This log indicates that the ArrayIndexOutOfBoundsException happens because the application is not found. There are three possibilities for the application not being found:
1. The application hasn't been added to FifoScheduler#applications. If this were the case, FifoScheduler would not send the APP_ACCEPTED event to the corresponding RMAppAttemptImpl. Without the APP_ACCEPTED event, RMAppAttemptImpl will not enter the SCHEDULED state, and consequently will not go through AMContainerAllocatedTransition to ALLOCATED_SAVING. Therefore, this case is impossible.
2. The application has already been removed from FifoScheduler#applications. To trigger the removal, the corresponding RMAppAttemptImpl needs to go through BaseFinalTransition.
It is worth mentioning first that RMAppAttemptImpl's transitions are executed on the AsyncDispatcher thread, while YarnScheduler#handle is invoked on the SchedulerEventDispatcher thread. The two threads execute in parallel, so the processing of an RMAppAttemptEvent and that of a SchedulerEvent may interleave. However, the processing of two RMAppAttemptEvents, or of two SchedulerEvents, will not.
Therefore, AMContainerAllocatedTransition cannot start until RMAppAttemptImpl has finished BaseFinalTransition. But when RMAppAttemptImpl goes through BaseFinalTransition, it also enters a final state, such that AMContainerAllocatedTransition will not happen at all. In conclusion, this case is impossible as well.
3. The application is in FifoScheduler#applications, but RMAppAttemptImpl doesn't get it. First of all, FifoScheduler#applications is a TreeMap, which is not thread safe (FairScheduler#applications is a HashMap, while CapacityScheduler#applications is a ConcurrentHashMap). Second, the methods accessing the map are not consistently synchronized, so reads and writes on the same map can run simultaneously. RMAppAttemptImpl, on the AsyncDispatcher thread, will eventually call FifoScheduler#applications#get in AMContainerAllocatedTransition, while FifoScheduler, on the SchedulerEventDispatcher thread, will call FifoScheduler#applications#add|remove. Therefore, getting null when the application actually exists can happen under a large number of concurrent operations; a minimal sketch of this race follows.
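The sketch below demonstrates case 3 in isolation (Integer keys and String values stand in for ApplicationAttemptId and FiCaSchedulerApp; this is a demonstration, not code from the patch). The failure is timing dependent and may take several runs to show up:

import java.util.Map;
import java.util.TreeMap;

// One thread structurally modifies an unsynchronized TreeMap while
// another reads it. The reader may observe null for a key that was
// never removed (or hit an internal exception), because rebalancing
// restructures the tree while get() traverses it.
public class UnsafeMapRace {
  private static final Map<Integer, String> applications =
      new TreeMap<Integer, String>();

  public static void main(String[] args) throws InterruptedException {
    applications.put(0, "am_app"); // key 0 is never removed

    // Plays the SchedulerEventDispatcher role: adds and removes apps.
    Thread scheduler = new Thread(() -> {
      for (int i = 1; i < 1_000_000; i++) {
        applications.put(i, "app_" + i);
        applications.remove(i);
      }
    });

    // Plays the AsyncDispatcher role: looks up the AM's app.
    Thread attempt = new Thread(() -> {
      for (int i = 0; i < 1_000_000; i++) {
        if (applications.get(0) == null) {
          System.out.println("lost am_app at iteration " + i);
          return;
        }
      }
    });

    scheduler.start();
    attempt.start();
    scheduler.join();
    attempt.join();
  }
}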
Please feel free to correct me if you think there's something wrong or missing in the analysis. I'm going to work on a patch to fix the problem.
Created a patch to use ConcurrentHashMap for applications in FifoScheduler and FairScheduler, which will make accessing applications thread-safe.
Thanks for the patch, Zhijie! The patch looks good to me. However, I would suggest documenting why at least one container is expected in the allocation, or adding a non-empty check on getContainers().
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12597896/YARN-292.1.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1710//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1710//console
This message is automatically generated.
Also, I see you only address Fifo and Fair, but not CapacityScheduler (applicationsMap is in the LeafQueue class). Shall we apply the same change there?
Thanks for reviewing the patch, Junping!
However, I would suggest documenting why at least one container is expected in the allocation, or adding a non-empty check on getContainers().
In ScheduleTransition, it is already checked that the number of allocated containers is 0, which means newlyAllocatedContainers is still empty at that point. AMContainerAllocatedTransition comes after ScheduleTransition and is triggered by CONTAINER_ALLOCATED, and CONTAINER_ALLOCATED is emitted only after an RMContainer is created and put into newlyAllocatedContainers. Therefore, in AMContainerAllocatedTransition, at least 1 container is expected. I'll document this as a comment in AMContainerAllocatedTransition; see the sketch below.
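For illustration, here is the snippet quoted earlier, annotated with the kind of comment and assert being described (the exact wording in the patch may differ):

// Acquire the AM container from the scheduler.
Allocation amContainerAllocation = appAttempt.scheduler.allocate(
    appAttempt.applicationAttemptId, EMPTY_CONTAINER_REQUEST_LIST,
    EMPTY_CONTAINER_RELEASE_LIST, null, null);
// At least one container is expected here: this transition is
// triggered by CONTAINER_ALLOCATED, which is emitted only after an
// RMContainer has been created and put into newlyAllocatedContainers.
assert amContainerAllocation.getContainers().size() != 0;
// Set the masterContainer
appAttempt.setMasterContainer(amContainerAllocation.getContainers().get(0));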
but not CapacityScheduler (applicationsMap is in the LeafQueue class).
CapacityScheduler.applications is already a ConcurrentHashMap, and all the methods accessing LeafQueue.applicationsMap are synchronized. Therefore, I think we don't need to change it.
I'll document this as a comment in AMContainerAllocatedTransition.
Thanks.
CapacityScheduler.applications is already a ConcurrentHashMap, and all the methods accessing LeafQueue.applicationsMap are synchronized. Therefore, I think we don't need to change it.
That's true. Thx!
Updated the patch to add a comment and an assert in AMContainerAllocatedTransition, to justify that the number of allocated containers is not zero.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12597996/YARN-292.2.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1713//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1713//console
This message is automatically generated.
The FIFO scheduler uses a TreeMap to keep applications in FIFO order; ConcurrentHashMap will break this feature. Right?
The FIFO scheduler uses a TreeMap to keep applications in FIFO order; ConcurrentHashMap will break this feature. Right?
AFAIK, FIFO order is not controlled by FifoScheduler.applications, and the TreeMap cannot be relied on for FIFO ordering. Instead, FifoPolicy has a FifoComparator, which can be used to sort a collection of Schedulable objects. vinodkv, would you please confirm?
Sorry, just noticed that:
// Try to assign containers to applications in fifo order
for (Map.Entry<ApplicationAttemptId, FiCaSchedulerApp> e : applications
    .entrySet()) {
There is iteration over the map collection. Probably we can use ConcurrentSkipListMap, which is thread safe and preserves the order as TreeMap does; see the sketch below.
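For reference, a minimal standalone sketch of why ConcurrentSkipListMap fits here (Integer keys stand in for ApplicationAttemptId, which is Comparable):

import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class SkipListOrderDemo {
  public static void main(String[] args) {
    // Thread safe like ConcurrentHashMap, but sorted by key like
    // TreeMap, so the ordered iteration over applications is kept.
    Map<Integer, String> apps = new ConcurrentSkipListMap<Integer, String>();
    apps.put(3, "app_3");
    apps.put(1, "app_1");
    apps.put(2, "app_2");
    // Prints 1, 2, 3 regardless of insertion order; iteration is also
    // safe while other threads put/remove (weakly consistent view).
    for (Map.Entry<Integer, String> e : apps.entrySet()) {
      System.out.println(e.getKey() + " -> " + e.getValue());
    }
  }
}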
Thanks, nemon, for your hint. I've updated FifoScheduler to use ConcurrentSkipListMap instead.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12598381/YARN-292.3.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1731//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1731//console
This message is automatically generated.
Thanks, Zhijie Shen, for your update. Do you plan to add some test cases for it? I think the test part will be the most difficult one.
nemon, agreed. It's difficult to reliably reproduce a problem caused by a thread-unsafe map. Any suggestions?
I will try to post my test results after applying this patch when I have time. No idea about the test case part.
3. The application is in FifoScheduler#applications, but RMAppAttemptImpl doesn't get it. First of all, FifoScheduler#applications is a TreeMap, which is not thread safe (FairScheduler#applications is a HashMap, while CapacityScheduler#applications is a ConcurrentHashMap). Second, the methods accessing the map are not consistently synchronized, so reads and writes on the same map can run simultaneously. RMAppAttemptImpl, on the AsyncDispatcher thread, will eventually call FifoScheduler#applications#get in AMContainerAllocatedTransition, while FifoScheduler, on the SchedulerEventDispatcher thread, will call FifoScheduler#applications#add|remove. Therefore, getting null when the application actually exists can happen under a large number of concurrent operations.
This doesn't sound right to me. The thing is, the scheduler will be told to remove an app only by the RMAppAttempt. Now if the RMAppAttempt is going through AMContainerAllocatedTransition, it cannot be telling the scheduler to remove the app. While the theory of unsafe data structures seems right, I still can't see a case where the original exception can happen. Clearly the app was removed; then the RMAppAttempt would have gone into the KILLING state, right? If so, why is it now trying to get the AM container?
I will try to post my test results after applying this patch when I have time. No idea about the test case part.
Nemon, we are unable to come up with a scenario in which this happens. The next time you run into this, can you please capture the RM logs and upload them here? Tx.
Thanks for the logs, Nemon.
Looked at the logs. We were so focused on removals that we forgot the puts. As the logs clearly point out, another app was being added at (almost) the same point in time as the get, and since this is a TreeMap (or even a HashMap), there are structural changes even on a put.
The patch isn't applying anymore; can you please update?
Also, can you try to write a simple test, with one thread putting lots of apps and the other trying to allocate the AM? Not a very useful test, but it can give us a little confidence.
A new patch against the latest trunk is uploaded. I added the test cases in it. The test cases imitate one thread (RMAppAttempt) getting the app while the other thread (YarnScheduler) is adding and removing apps. Though the test cases cannot guarantee reproducing the bug, as Vinod said, they can give us a little confidence. I didn't make the test size too large, to avoid prolonging the unit test phase.
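For illustration, the shape such a test can take, as a minimal JUnit 4 sketch (the class and method names here are illustrative, and Integer/String stand in for the real key and app types; this is not the actual test from the patch):

import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import org.junit.Assert;
import org.junit.Test;

public class TestConcurrentAccess {
  @Test(timeout = 30000)
  public void testGetWhileAddingAndRemoving() throws Exception {
    final Map<Integer, String> applications =
        new ConcurrentSkipListMap<Integer, String>();
    applications.put(0, "am_app");

    // Imitates the scheduler thread adding and removing apps.
    Thread mutator = new Thread(() -> {
      for (int i = 1; i < 100_000; i++) {
        applications.put(i, "app_" + i);
        applications.remove(i);
      }
    });
    mutator.start();

    // Imitates RMAppAttempt fetching the app while the map is
    // mutated; with a thread-safe map this must never observe null
    // for key 0, which is never removed.
    for (int i = 0; i < 100_000; i++) {
      Assert.assertNotNull(applications.get(0));
    }
    mutator.join();
  }
}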
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12601576/YARN-292.4.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1842//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1842//console
This message is automatically generated.
Actually, I am able to reproduce failures with TestFifoScheduler consistently.
+1, the patch looks good. Checking this in.
Committed this to trunk, branch-2 and branch-2.1. Thanks Zhijie!
Tx to Nemon for all the help with the logs, and to Junping too!
SUCCESS: Integrated in Hadoop-trunk-Commit #4392 (See https://builds.apache.org/job/Hadoop-trunk-Commit/4392/)
YARN-292. Fixed FifoScheduler and FairScheduler to make their applications data structures thread safe to avoid RM crashing with ArrayIndexOutOfBoundsException. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1521328)
- /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
SUCCESS: Integrated in Hadoop-Yarn-trunk #328 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/328/)
YARN-292. Fixed FifoScheduler and FairScheduler to make their applications data structures thread safe to avoid RM crashing with ArrayIndexOutOfBoundsException. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1521328)
- /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
SUCCESS: Integrated in Hadoop-Hdfs-trunk #1518 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1518/)
YARN-292. Fixed FifoScheduler and FairScheduler to make their applications data structures thread safe to avoid RM crashing with ArrayIndexOutOfBoundsException. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1521328)
- /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
FAILURE: Integrated in Hadoop-Mapreduce-trunk #1544 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1544/)
YARN-292. Fixed FifoScheduler and FairScheduler to make their applications data structures thread safe to avoid RM crashing with ArrayIndexOutOfBoundsException. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1521328)
- /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
Seems that the applications map in the FIFO scheduler is not thread safe.
I also hit this issue while running 20,000 jobs (with 20 clients submitting jobs concurrently).