Hadoop YARN / YARN-676 [Umbrella] Daemons crashing because of invalid state transitions

YARN-292: ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.1-alpha
    • Fix Version/s: 2.1.1-beta
    • Component/s: resourcemanager
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      2012-12-26 08:41:15,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Calling allocate on removed or non existant application appattempt_1356385141279_49525_000001
      2012-12-26 08:41:15,031 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type CONTAINER_ALLOCATED for applicationAttempt application_1356385141279_49525
      java.lang.ArrayIndexOutOfBoundsException: 0
      	at java.util.Arrays$ArrayList.get(Arrays.java:3381)
      	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655)
      	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644)
      	at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
      	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
      	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
      	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
      	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
      	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
      	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
      	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
      	at java.lang.Thread.run(Thread.java:662)
       

      Attachments

        1. YARN-292.1.patch
          5 kB
          Zhijie Shen
        2. YARN-292.2.patch
          6 kB
          Zhijie Shen
        3. YARN-292.3.patch
          6 kB
          Zhijie Shen
        4. ArrayIndexOutOfBoundsException.log
          31 kB
          Nemon Lou
        5. YARN-292.4.patch
          19 kB
          Zhijie Shen

        Activity

          nemon Nemon Lou added a comment -

          It seems that the applications map in FifoScheduler is not thread-safe.

          I also hit this issue while running 20,000 jobs (with 20 clients submitting jobs concurrently).

          junping_du Junping Du added a comment -

          I think it is caused by the following code:

                // Acquire the AM container from the scheduler.
                Allocation amContainerAllocation = appAttempt.scheduler.allocate(
                    appAttempt.applicationAttemptId, EMPTY_CONTAINER_REQUEST_LIST,
                    EMPTY_CONTAINER_RELEASE_LIST, null, null);
          
                // Set the masterContainer
                appAttempt.setMasterContainer(
                    amContainerAllocation.getContainers().get(0));
                ...
          

          It is possible that amContainerAllocation did not get any containers if the cluster is quite busy. In that case, calling get(0) will throw an out-of-bounds exception. Will deliver a quick patch and UT to fix this later.
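
          For illustration, here is a minimal hedged sketch of such a non-empty check (hypothetical shape, reusing the names from the snippet above; not the actual patch):

                // Hypothetical guard: assume getContainers() may legitimately
                // return an empty list when the cluster is busy or the app is
                // missing from the scheduler.
                List<Container> allocated = amContainerAllocation.getContainers();
                if (allocated.isEmpty()) {
                  // Bail out instead of calling get(0) on an empty list, which
                  // throws ArrayIndexOutOfBoundsException.
                  return;
                }
                appAttempt.setMasterContainer(allocated.get(0));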

          zjshen Zhijie Shen added a comment -

          Will look into this problem

          zjshen Zhijie Shen added a comment - - edited
                // Acquire the AM container from the scheduler.
                Allocation amContainerAllocation = appAttempt.scheduler.allocate(
                    appAttempt.applicationAttemptId, EMPTY_CONTAINER_REQUEST_LIST,
                    EMPTY_CONTAINER_RELEASE_LIST, null, null);
          

          The above code will eventually pull the newly allocated containers in newlyAllocatedContainers.

          Logically, AMContainerAllocatedTransition happens after RMAppAttempt receives CONTAINER_ALLOCATED. CONTAINER_ALLOCATED is sent during ContainerStartedTransition, when the RMContainer is moving from NEW to ALLOCATED. Therefore, pulling newlyAllocatedContainers happens when the RMContainer is at ALLOCATED. In contrast, the RMContainer is added to newlyAllocatedContainers when it is still at NEW. In conclusion, at least one container is expected in the allocation in AMContainerAllocatedTransition.

          As hinted by nemon, the problem may happen at

              FiCaSchedulerApp application = getApplication(applicationAttemptId);
              if (application == null) {
                LOG.error("Calling allocate on removed " +
                    "or non existant application " + applicationAttemptId);
                return EMPTY_ALLOCATION;
              }
          

          EMPTY_ALLOCATION has 0 containers. Another observation is that the synchronization on accesses to the application map appears inconsistent.

          Just became aware that junping_du has started working on this problem. Please feel free to take it over. Thanks!

          junping_du Junping Du added a comment -

          Hi zjshen, I think your work above reveals the root cause of this bug. So please feel free to go ahead and fix it. I will also help review it. Thx!

          zjshen Zhijie Shen added a comment -

          Did more investigation on this issue:

          2012-12-26 08:41:15,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Calling allocate on removed or non existant application appattempt_1356385141279_49525_000001
          

          This log indicates that the ArrayIndexOutOfBoundsException happens because the application is not found. There are three possibilities for the application not being found:

          1. The application hasn't been added into FifoScheduler#applications. If that is the case, FifoScheduler will not send the APP_ACCEPTED event to the corresponding RMAppAttemptImpl. Without the APP_ACCEPTED event, RMAppAttemptImpl will not enter the SCHEDULED state, and consequently will not go through AMContainerAllocatedTransition to ALLOCATED_SAVING. Therefore, this case is impossible.

          2. The application has already been removed from FifoScheduler#applications. To trigger the removal operation, the corresponding RMAppAttemptImpl needs to go through BaseFinalTransition.

          It is worth mentioning first that RMAppAttemptImpl's transitions are executed on the AsyncDispatcher thread, while YarnScheduler#handle is invoked on the SchedulerEventDispatcher thread. The two threads execute in parallel, meaning that the processing of an RMAppAttemptEvent and that of a SchedulerEvent may interleave. However, the processing of two RMAppAttemptEvents, or of two SchedulerEvents, will not.

          Therefore, for this case to occur, AMContainerAllocatedTransition could not start until RMAppAttemptImpl had already finished BaseFinalTransition. But when RMAppAttemptImpl goes through BaseFinalTransition, it also enters a final state, so AMContainerAllocatedTransition would not happen at all. In conclusion, this case is impossible as well.

          3. The application is in FifoScheduler#applications, but RMAppAttemptImpl doesn't get it. First of all, FifoScheduler#applications is a TreeMap, which is not thread-safe (FairScheduler#applications is a HashMap, while CapacityScheduler#applications is a ConcurrentHashMap). Second, the methods accessing the map are not consistently synchronized; thus, reads and writes on the same map can run simultaneously. RMAppAttemptImpl, on the AsyncDispatcher thread, will eventually call FifoScheduler#applications#get in AMContainerAllocatedTransition, while FifoScheduler, on the SchedulerEventDispatcher thread, will call FifoScheduler#applications#put|remove. Therefore, getting null when the application actually exists can happen under a large number of concurrent operations.

          Please feel free to correct me if you think there's something wrong or missing with the analysis. I'm going to work on a patch to fix the problem.
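
          To make case 3 concrete, here is a small self-contained stress sketch (purely illustrative, not part of any patch) in which one thread keeps putting entries into a plain TreeMap while another repeatedly reads a key that is never removed. Because the failure is a data race, it is not guaranteed to reproduce on every run:

                import java.util.Map;
                import java.util.TreeMap;

                public class TreeMapRaceDemo {
                  public static void main(String[] args) throws InterruptedException {
                    // Unsynchronized map, standing in for FifoScheduler#applications.
                    final Map<Integer, String> applications = new TreeMap<>();
                    applications.put(-1, "app_always_present");

                    // Writer: keeps adding apps, forcing tree rebalancing.
                    Thread writer = new Thread(() -> {
                      for (int i = 0; i < 1_000_000; i++) {
                        applications.put(i, "app_" + i);
                      }
                    });

                    // Reader: looks up a key that is never removed; under the
                    // race it can still observe null mid-rebalance.
                    Thread reader = new Thread(() -> {
                      for (int i = 0; i < 1_000_000; i++) {
                        if (applications.get(-1) == null) {
                          System.err.println("Lookup of a present key returned null!");
                          return;
                        }
                      }
                    });

                    writer.start();
                    reader.start();
                    writer.join();
                    reader.join();
                  }
                }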

          zjshen Zhijie Shen added a comment -

          Created a patch to use ConcurrentHashMap for applications in FifoScheduler and FairScheduler, which will make accessing applications thread-safe.

          junping_du Junping Du added a comment -

          Thanks for the patch, Zhijie! The patch looks good to me. However, I would suggest documenting why at least one container is expected in the allocation, or adding a non-empty check on getContainers().

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12597896/YARN-292.1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1710//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1710//console

          This message is automatically generated.

          junping_du Junping Du added a comment -

          Also, I see you only address Fifo and Fair, but not CapacityScheduler (applicationsMap is in the LeafQueue class). Shall we apply the same change there?

          zjshen Zhijie Shen added a comment -

          Thanks for reviewing the patch, Junping!

          However, I would suggest documenting why at least one container is expected in the allocation, or adding a non-empty check on getContainers().

          In ScheduleTransition, it is already checked that the number of allocated containers is 0, which means newlyAllocatedContainers is still empty. AMContainerAllocatedTransition comes after ScheduleTransition and is triggered by CONTAINER_ALLOCATED. CONTAINER_ALLOCATED is emitted after an RMContainer has been created and put into newlyAllocatedContainers. Therefore, in AMContainerAllocatedTransition, at least 1 container is expected. I'll document this in a comment in AMContainerAllocatedTransition.

          but not CapacityScheduler (applicationsMap is in the LeafQueue class).

          CapacityScheduler.applications is already a ConcurrentHashMap, and all the methods that access LeafQueue.applicationsMap are synchronized. Therefore, I think we don't need to change it.

          junping_du Junping Du added a comment -

          I'll document this in a comment in AMContainerAllocatedTransition.

          Thanks.

          CapacityScheduler.applications is already a ConcurrentHashMap, and all the methods that access LeafQueue.applicationsMap are synchronized. Therefore, I think we don't need to change it.

          That's true. thx!

          zjshen Zhijie Shen added a comment -

          Updated the patch to add comments and an assert in AMContainerAllocatedTransition, to justify that the number of allocated containers is not zero.
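
          A hedged sketch of what such a comment-plus-assert might look like (hypothetical shape, reusing names from the earlier snippet; not the literal patch text):

                // The allocation cannot be empty here: CONTAINER_ALLOCATED is only
                // emitted after the RMContainer has been put into
                // newlyAllocatedContainers, so pulling the allocation in
                // AMContainerAllocatedTransition must yield at least one container.
                assert amContainerAllocation.getContainers().size() != 0;
                appAttempt.setMasterContainer(
                    amContainerAllocation.getContainers().get(0));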

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12597996/YARN-292.2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1713//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1713//console

          This message is automatically generated.

          nemon Nemon Lou added a comment -

          FifoScheduler uses a TreeMap to keep applications in FIFO order; ConcurrentHashMap will break this feature. Right?

          zjshen Zhijie Shen added a comment -

          FifoScheduler uses a TreeMap to keep applications in FIFO order; ConcurrentHashMap will break this feature. Right?

          AFAIK, FIFO order is not controlled by FifoScheduler.applications, and the TreeMap cannot be used for ordering. Instead, FifoPolicy has a FifoComparator, which can be used to sort a collection of Schedulable objects. vinodkv, would you please confirm?

          zjshen Zhijie Shen added a comment -

          Sorry, just noticed that

              // Try to assign containers to applications in fifo order
              for (Map.Entry<ApplicationAttemptId, FiCaSchedulerApp> e : applications
                  .entrySet()) {
          

          There's iteration over the map collection. Probably we can use ConcurrentSkipListMap, which is thread-safe and preserves the order as TreeMap does.
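
          A minimal self-contained sketch (illustrative only) of why ConcurrentSkipListMap fits here: it iterates in key order like TreeMap, yet tolerates concurrent access without external locking:

                import java.util.concurrent.ConcurrentSkipListMap;

                public class SkipListOrderDemo {
                  public static void main(String[] args) {
                    ConcurrentSkipListMap<Integer, String> applications =
                        new ConcurrentSkipListMap<>();
                    applications.put(3, "app_3");
                    applications.put(1, "app_1");
                    applications.put(2, "app_2");

                    // Iteration is in ascending key order: app_1, app_2, app_3,
                    // so FIFO-ordered assignment over entrySet() keeps working.
                    applications.forEach((id, app) -> System.out.println(app));
                  }
                }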

          zjshen Zhijie Shen added a comment -

          Thanks, nemon, for your hint. I've updated FifoScheduler to use ConcurrentSkipListMap instead.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12598381/YARN-292.3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1731//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1731//console

          This message is automatically generated.

          nemon Nemon Lou added a comment -

          Thanks, Zhijie Shen, for your update. Do you plan to add some test cases for it? I think the test part will be the most difficult one.

          zjshen Zhijie Shen added a comment -

          nemon, agreed. It's difficult to reliably reproduce the problem with the thread-unsafe map. Any suggestions?

          nemon Nemon Lou added a comment -

          I will try to post my test results after applying this patch when I have time. No idea about the test case part.


          vinodkv Vinod Kumar Vavilapalli added a comment -

          3. The application is in FifoScheduler#applications, but RMAppAttemptImpl doesn't get it. First of all, FifoScheduler#applications is a TreeMap, which is not thread-safe (FairScheduler#applications is a HashMap, while CapacityScheduler#applications is a ConcurrentHashMap). Second, the methods accessing the map are not consistently synchronized; thus, reads and writes on the same map can run simultaneously. RMAppAttemptImpl, on the AsyncDispatcher thread, will eventually call FifoScheduler#applications#get in AMContainerAllocatedTransition, while FifoScheduler, on the SchedulerEventDispatcher thread, will call FifoScheduler#applications#put|remove. Therefore, getting null when the application actually exists can happen under a large number of concurrent operations.

          This doesn't sound right to me. The thing is, the scheduler will be told to remove an app only by the RMAppAttempt. Now, if the RMAppAttempt is going through AMContainerAllocatedTransition, it cannot be telling the scheduler to remove the app. While the theory of unsafe data structures seems right, I still can't see the case in which the original exception can happen. If the app was clearly removed, then the RMAppAttempt would have gone into the KILLING state, right? If so, why is it now trying to get the AM container?


          vinodkv Vinod Kumar Vavilapalli added a comment -

          I will try to post my test results after applying this patch when I have time. No idea about the test case part.

          Nemon, we are unable to come up with a scenario in which this happens. The next time you run into this, can you please capture the RM logs and upload them here? Tx.

          nemon Nemon Lou added a comment -

          Finally found the log. Please check the tail part.


          vinodkv Vinod Kumar Vavilapalli added a comment -

          Thanks for the logs, Nemon.

          Looked at the logs. We were so focused on removals that we forgot the puts. And as the logs clearly point out, another app was getting added at (almost) the same point in time as the get, and since this is a TreeMap (or even a HashMap), there are structural changes even with a put.

          The patch isn't applying anymore; can you please update?

          Also, can you try to write a simple test, with one thread putting lots of apps and another trying to allocate the AM? Not a very useful test, but it can give us a little confidence.

          zjshen Zhijie Shen added a comment -

          A new patch against the latest trunk is uploaded, with test cases added. The test cases simulate one thread (RMAppAttempt) getting the app while another thread (YarnScheduler) adds and removes apps. Though the test cases cannot guarantee reproducing the bug, as Vinod said, they can give us a little confidence. I didn't make the test size too large, to avoid prolonging the unit test phase.

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12601576/YARN-292.4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1842//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1842//console

          This message is automatically generated.


          vinodkv Vinod Kumar Vavilapalli added a comment -

          Actually, I am able to reproduce failures with TestFifoScheduler consistently.

          +1, the patch looks good. Checking this in.


          vinodkv Vinod Kumar Vavilapalli added a comment -

          Committed this to trunk, branch-2 and branch-2.1. Thanks Zhijie!

          Tx to Nemon for all the help with the logs, and to Junping too!

          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #4392 (See https://builds.apache.org/job/Hadoop-trunk-Commit/4392/)
          YARN-292. Fixed FifoScheduler and FairScheduler to make their applications data structures thread safe to avoid RM crashing with ArrayIndexOutOfBoundsException. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1521328)

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #328 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/328/)
          YARN-292. Fixed FifoScheduler and FairScheduler to make their applications data structures thread safe to avoid RM crashing with ArrayIndexOutOfBoundsException. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1521328)

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk #1518 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1518/)
          YARN-292. Fixed FifoScheduler and FairScheduler to make their applications data structures thread safe to avoid RM crashing with ArrayIndexOutOfBoundsException. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1521328)

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #1544 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1544/)
          YARN-292. Fixed FifoScheduler and FairScheduler to make their applications data structures thread safe to avoid RM crashing with ArrayIndexOutOfBoundsException. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1521328)

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java

          People

            Assignee: zjshen Zhijie Shen
            Reporter: devaraj Devaraj Kavali
            Votes: 1
            Watchers: 12

            Dates

              Created:
              Updated:
              Resolved: