Details
- Type: Sub-task
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 2.0.1-alpha
- Labels: None
- Hadoop Flags: Reviewed
Description
2012-12-26 08:41:15,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Calling allocate on removed or non existant application appattempt_1356385141279_49525_000001
2012-12-26 08:41:15,031 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type CONTAINER_ALLOCATED for applicationAttempt application_1356385141279_49525
java.lang.ArrayIndexOutOfBoundsException: 0
	at java.util.Arrays$ArrayList.get(Arrays.java:3381)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644)
	at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
	at java.lang.Thread.run(Thread.java:662)
Attachments
- YARN-292.1.patch (5 kB, Zhijie Shen)
- YARN-292.2.patch (6 kB, Zhijie Shen)
- YARN-292.3.patch (6 kB, Zhijie Shen)
- ArrayIndexOutOfBoundsException.log (31 kB, Nemon Lou)
- YARN-292.4.patch (19 kB, Zhijie Shen)
Activity
I think it is just caused by the following code:
// Acquire the AM container from the scheduler.
Allocation amContainerAllocation = appAttempt.scheduler.allocate(
    appAttempt.applicationAttemptId, EMPTY_CONTAINER_REQUEST_LIST,
    EMPTY_CONTAINER_RELEASE_LIST, null, null);
// Set the masterContainer
appAttempt.setMasterContainer(amContainerAllocation.getContainers().get(0));
...
It is possible that amContainerAllocation did not get any containers if the cluster is quite busy. In this case, the access get(0) will cause an out-of-bounds exception. I will deliver a quick patch and a UT to fix this later.
// Acquire the AM container from the scheduler.
Allocation amContainerAllocation = appAttempt.scheduler.allocate(
    appAttempt.applicationAttemptId, EMPTY_CONTAINER_REQUEST_LIST,
    EMPTY_CONTAINER_RELEASE_LIST, null, null);
The above code will eventually pull the newly allocated containers from newlyAllocatedContainers.
Logically, AMContainerAllocatedTransition happens after RMAppAttempt receives CONTAINER_ALLOCATED. CONTAINER_ALLOCATED is sent during ContainerStartedTransition, when the RMContainer is moving from NEW to ALLOCATED. Therefore, pulling newlyAllocatedContainers happens when the RMContainer is at ALLOCATED, whereas the RMContainer is added to newlyAllocatedContainers when it is still at NEW. In conclusion, one container is expected in the allocation in AMContainerAllocatedTransition.
As nemon hinted, the problem may happen at:
FiCaSchedulerApp application = getApplication(applicationAttemptId);
if (application == null) {
  LOG.error("Calling allocate on removed " +
      "or non existant application " + applicationAttemptId);
  return EMPTY_ALLOCATION;
}
EMPTY_ALLOCATION has 0 containers. Another observation is that the synchronization around access to the application map appears to be inconsistent.
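To make the failure path concrete, here is a minimal standalone sketch of how an empty Allocation turns into the exception in the stack trace (the Allocation class below is a simplified stand-in, not YARN's actual class):

import java.util.Arrays;
import java.util.List;

// Stand-in for YARN's Allocation; EMPTY_ALLOCATION carries no containers.
class Allocation {
  private final List<String> containers;
  Allocation(List<String> containers) { this.containers = containers; }
  List<String> getContainers() { return containers; }
}

public class EmptyAllocationDemo {
  // Arrays.asList() with no elements mirrors the Arrays$ArrayList
  // seen in the stack trace above.
  static final Allocation EMPTY_ALLOCATION =
      new Allocation(Arrays.<String>asList());

  public static void main(String[] args) {
    // AMContainerAllocatedTransition assumes at least one container,
    // so get(0) throws ArrayIndexOutOfBoundsException: 0 on the
    // empty list.
    String amContainer = EMPTY_ALLOCATION.getContainers().get(0);
  }
}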
I just became aware that junping_du has started working on this problem. Please feel free to take it over. Thanks!
Hi zjshen, I think your work above reveals the root cause of this bug. So please feel free to go ahead and fix it. I will also help to review it. Thx!
Did more investigation on this issue:
2012-12-26 08:41:15,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Calling allocate on removed or non existant application appattempt_1356385141279_49525_000001
This log indicates that the ArrayIndexOutOfBoundsException happens because the application is not found. There are three possibilities for the application not being found:
1. The application hasn't been added to FifoScheduler#applications. If this were the case, FifoScheduler would not send the APP_ACCEPTED event to the corresponding RMAppAttemptImpl. Without the APP_ACCEPTED event, RMAppAttemptImpl will not enter the SCHEDULED state, and consequently will not go through AMContainerAllocatedTransition to ALLOCATED_SAVING. Therefore, this case is impossible.
2. The application has already been removed from FifoScheduler#applications. To trigger the removal, the corresponding RMAppAttemptImpl needs to go through BaseFinalTransition.
It is worth mentioning first that RMAppAttemptImpl's transitions are executed on the AsyncDispatcher thread, while YarnScheduler#handle is invoked on the SchedulerEventDispatcher thread. The two threads execute in parallel, so the processing of an RMAppAttemptEvent and that of a SchedulerEvent may interleave. However, the processing of two RMAppAttemptEvents, or of two SchedulerEvents, will not.
Therefore, AMContainerAllocatedTransition cannot start until RMAppAttemptImpl has finished BaseFinalTransition. But when RMAppAttemptImpl goes through BaseFinalTransition, it also enters a final state, such that AMContainerAllocatedTransition will not happen at all. In conclusion, this case is impossible as well.
3. The application is in FifoScheduler#applications, but RMAppAttemptImpl doesn't get it. First of all, FifoScheduler#applications is a TreeMap, which is not thread safe (FairScheduler#applications is a HashMap, while CapacityScheduler#applications is a ConcurrentHashMap). Second, the methods accessing the map are not consistently synchronized, so reads and writes on the same map can run simultaneously. RMAppAttemptImpl, on the AsyncDispatcher thread, will eventually call FifoScheduler#applications#get in AMContainerAllocatedTransition, while FifoScheduler, on the SchedulerEventDispatcher thread, will call FifoScheduler#applications#add|remove. Therefore, getting null when the application actually exists can happen under a large number of concurrent operations; a minimal sketch of this race follows.
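The sketch below demonstrates case 3 in isolation (Integer keys and String values stand in for ApplicationAttemptId and FiCaSchedulerApp; this is a demonstration, not code from the patch). The failure is timing dependent and may take several runs to show up:

import java.util.Map;
import java.util.TreeMap;

// One thread structurally modifies an unsynchronized TreeMap while
// another reads it. The reader may observe null for a key that was
// never removed (or hit an internal exception), because rebalancing
// restructures the tree while get() traverses it.
public class UnsafeMapRace {
  private static final Map<Integer, String> applications =
      new TreeMap<Integer, String>();

  public static void main(String[] args) throws InterruptedException {
    applications.put(0, "am_app"); // key 0 is never removed

    // Plays the SchedulerEventDispatcher role: adds and removes apps.
    Thread scheduler = new Thread(() -> {
      for (int i = 1; i < 1_000_000; i++) {
        applications.put(i, "app_" + i);
        applications.remove(i);
      }
    });

    // Plays the AsyncDispatcher role: looks up the AM's app.
    Thread attempt = new Thread(() -> {
      for (int i = 0; i < 1_000_000; i++) {
        if (applications.get(0) == null) {
          System.out.println("lost am_app at iteration " + i);
          return;
        }
      }
    });

    scheduler.start();
    attempt.start();
    scheduler.join();
    attempt.join();
  }
}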
Please feel free to correct me if you think there's something wrong or missing in the analysis. I'm going to work on a patch to fix the problem.
Created a patch to use ConcurrentHashMap for applications in FifoScheduler and FairScheduler, which will make accessing applications thread-safe.
Thanks for the patch, Zhijie! The patch looks good to me. However, I would suggest documenting why at least one container is expected in the allocation, or adding a non-empty check on getContainers().
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12597896/YARN-292.1.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1710//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1710//console
This message is automatically generated.
Also, I see you only address Fifo and Fair, but not CapacityScheduler (applicationsMap is in the LeafQueue class). Shall we apply the same change there?
Thanks for reviewing the patch, Junping!
However, I would suggest documenting why at least one container is expected in the allocation, or adding a non-empty check on getContainers().
In ScheduleTransition, it is already checked that the number of allocated containers is 0, which means newlyAllocatedContainers is still empty at that point. AMContainerAllocatedTransition comes after ScheduleTransition and is triggered by CONTAINER_ALLOCATED, and CONTAINER_ALLOCATED is emitted only after an RMContainer is created and put into newlyAllocatedContainers. Therefore, in AMContainerAllocatedTransition, at least 1 container is expected. I'll document this as a comment in AMContainerAllocatedTransition; see the sketch below.
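For illustration, here is the snippet quoted earlier, annotated with the kind of comment and assert being described (the exact wording in the patch may differ):

// Acquire the AM container from the scheduler.
Allocation amContainerAllocation = appAttempt.scheduler.allocate(
    appAttempt.applicationAttemptId, EMPTY_CONTAINER_REQUEST_LIST,
    EMPTY_CONTAINER_RELEASE_LIST, null, null);
// At least one container is expected here: this transition is
// triggered by CONTAINER_ALLOCATED, which is emitted only after an
// RMContainer has been created and put into newlyAllocatedContainers.
assert amContainerAllocation.getContainers().size() != 0;
// Set the masterContainer
appAttempt.setMasterContainer(amContainerAllocation.getContainers().get(0));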
but not CapacityScheduler (applicationsMap is in the LeafQueue class).
CapacityScheduler.applications is already a ConcurrentHashMap, and all the methods accessing LeafQueue.applicationsMap are synchronized. Therefore, I think we don't need to change it.
I'll document this as a comment in AMContainerAllocatedTransition.
Thanks.
CapacityScheduler.applications is already a ConcurrentHashMap, and all the methods accessing LeafQueue.applicationsMap are synchronized. Therefore, I think we don't need to change it.
That's true. Thx!
Updated the patch to add a comment and an assert in AMContainerAllocatedTransition, to justify that the number of allocated containers is not zero.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12597996/YARN-292.2.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1713//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1713//console
This message is automatically generated.
The FIFO scheduler uses a TreeMap to keep applications in FIFO order; ConcurrentHashMap will break this feature. Right?
The FIFO scheduler uses a TreeMap to keep applications in FIFO order; ConcurrentHashMap will break this feature. Right?
AFAIK, FIFO order is not controlled by FifoScheduler.applications, and the TreeMap cannot be relied on for FIFO ordering. Instead, FifoPolicy has a FifoComparator, which can be used to sort a collection of Schedulable objects. vinodkv, would you please confirm?
Sorry, just noticed that:
// Try to assign containers to applications in fifo order
for (Map.Entry<ApplicationAttemptId, FiCaSchedulerApp> e : applications
    .entrySet()) {
There is iteration over the map collection. Probably we can use ConcurrentSkipListMap, which is thread safe and preserves the order as TreeMap does; see the sketch below.
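For reference, a minimal standalone sketch of why ConcurrentSkipListMap fits here (Integer keys stand in for ApplicationAttemptId, which is Comparable):

import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class SkipListOrderDemo {
  public static void main(String[] args) {
    // Thread safe like ConcurrentHashMap, but sorted by key like
    // TreeMap, so the ordered iteration over applications is kept.
    Map<Integer, String> apps = new ConcurrentSkipListMap<Integer, String>();
    apps.put(3, "app_3");
    apps.put(1, "app_1");
    apps.put(2, "app_2");
    // Prints 1, 2, 3 regardless of insertion order; iteration is also
    // safe while other threads put/remove (weakly consistent view).
    for (Map.Entry<Integer, String> e : apps.entrySet()) {
      System.out.println(e.getKey() + " -> " + e.getValue());
    }
  }
}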
Thanks, nemon, for your hint. I've updated FifoScheduler to use ConcurrentSkipListMap instead.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12598381/YARN-292.3.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1731//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1731//console
This message is automatically generated.
Thanks, Zhijie Shen, for your update. Do you plan to add some test cases for it? I think the test part will be the most difficult one.
nemon, agreed. It's difficult to reliably reproduce a problem caused by a thread-unsafe map. Any suggestions?
I will try to post my test results after applying this patch when I have time. No idea about the test case part.
3. The application is in FifoScheduler#applications, but RMAppAttemptImpl doesn't get it. First of all, FifoScheduler#applications is a TreeMap, which is not thread safe (FairScheduler#applications is a HashMap, while CapacityScheduler#applications is a ConcurrentHashMap). Second, the methods accessing the map are not consistently synchronized, so reads and writes on the same map can run simultaneously. RMAppAttemptImpl, on the AsyncDispatcher thread, will eventually call FifoScheduler#applications#get in AMContainerAllocatedTransition, while FifoScheduler, on the SchedulerEventDispatcher thread, will call FifoScheduler#applications#add|remove. Therefore, getting null when the application actually exists can happen under a large number of concurrent operations.
This doesn't sound right to me. The thing is, the scheduler will be told to remove an app only by the RMAppAttempt. Now if the RMAppAttempt is going through AMContainerAllocatedTransition, it cannot be telling the scheduler to remove the app. While the theory of unsafe data structures seems right, I still can't see a case where the original exception can happen. Clearly the app was removed; then the RMAppAttempt would have gone into the KILLING state, right? If so, why is it now trying to get the AM container?
I will try to post my test results after applying this patch when I have time. No idea about the test case part.
Nemon, we are unable to come up with a scenario in which this happens. The next time you run into this, can you please capture the RM logs and upload them here? Tx.
Thanks for the logs, Nemon.
Looked at the logs. We were so focused on removals that we forgot the puts. As the logs clearly point out, another app was being added at (almost) the same point in time as the get, and since this is a TreeMap (or even a HashMap), there are structural changes even on a put.
The patch isn't applying anymore; can you please update?
Also, can you try to write a simple test, with one thread putting lots of apps and the other trying to allocate the AM? Not a very useful test, but it can give us a little confidence.
A new patch against the latest trunk is uploaded. I added the test cases in it. The test cases imitate one thread (RMAppAttempt) getting the app while the other thread (YarnScheduler) is adding and removing apps. Though the test cases cannot guarantee reproducing the bug, as Vinod said, they can give us a little confidence. I didn't make the test size too large, to avoid prolonging the unit test phase.
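For illustration, the shape such a test can take, as a minimal JUnit 4 sketch (the class and method names here are illustrative, and Integer/String stand in for the real key and app types; this is not the actual test from the patch):

import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import org.junit.Assert;
import org.junit.Test;

public class TestConcurrentAccess {
  @Test(timeout = 30000)
  public void testGetWhileAddingAndRemoving() throws Exception {
    final Map<Integer, String> applications =
        new ConcurrentSkipListMap<Integer, String>();
    applications.put(0, "am_app");

    // Imitates the scheduler thread adding and removing apps.
    Thread mutator = new Thread(() -> {
      for (int i = 1; i < 100_000; i++) {
        applications.put(i, "app_" + i);
        applications.remove(i);
      }
    });
    mutator.start();

    // Imitates RMAppAttempt fetching the app while the map is
    // mutated; with a thread-safe map this must never observe null
    // for key 0, which is never removed.
    for (int i = 0; i < 100_000; i++) {
      Assert.assertNotNull(applications.get(0));
    }
    mutator.join();
  }
}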
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12601576/YARN-292.4.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1842//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1842//console
This message is automatically generated.
Actually, I am able to reproduce failures with TestFifoScheduler consistently.
+1, the patch looks good. Checking this in.
Committed this to trunk, branch-2 and branch-2.1. Thanks Zhijie!
Tx to Nemon for all the help with the logs, and to Junping too!
SUCCESS: Integrated in Hadoop-trunk-Commit #4392 (See https://builds.apache.org/job/Hadoop-trunk-Commit/4392/)
YARN-292. Fixed FifoScheduler and FairScheduler to make their applications data structures thread safe to avoid RM crashing with ArrayIndexOutOfBoundsException. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1521328)
- /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
SUCCESS: Integrated in Hadoop-Yarn-trunk #328 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/328/)
YARN-292. Fixed FifoScheduler and FairScheduler to make their applications data structures thread safe to avoid RM crashing with ArrayIndexOutOfBoundsException. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1521328)
- /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
SUCCESS: Integrated in Hadoop-Hdfs-trunk #1518 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1518/)
YARN-292. Fixed FifoScheduler and FairScheduler to make their applications data structures thread safe to avoid RM crashing with ArrayIndexOutOfBoundsException. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1521328)
- /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
FAILURE: Integrated in Hadoop-Mapreduce-trunk #1544 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1544/)
YARN-292. Fixed FifoScheduler and FairScheduler to make their applications data structures thread safe to avoid RM crashing with ArrayIndexOutOfBoundsException. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1521328)
- /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
Seems that the applications map in the FIFO scheduler is not thread safe.
I also hit this issue while running 20,000 jobs (with 20 clients submitting jobs concurrently).