Uploaded image for project: 'Hadoop YARN'
  1. Hadoop YARN
  2. YARN-2910

FSLeafQueue can throw ConcurrentModificationException

    Details

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      The list that maintains the runnable and the non runnable apps are a standard ArrayList but there is no guarantee that it will only be manipulated by one thread in the system. This can lead to the following exception:

      2014-11-12 02:29:01,169 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN CONTACTING RM.
      java.util.ConcurrentModificationException: java.util.ConcurrentModificationException
      at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859)
      at java.util.ArrayList$Itr.next(ArrayList.java:831)
      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.getResourceUsage(FSLeafQueue.java:147)
      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.getHeadroom(FSAppAttempt.java:180)
      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:923)
      at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:516)
      

      Full stack trace in the attached file.

      We should guard against that by using a thread safe version from java.util.concurrent.CopyOnWriteArrayList

      1. FSLeafQueue_concurrent_exception.txt
        5 kB
        Wilfred Spiegelenburg
      2. YARN-2910.004.patch
        10 kB
        Ray Chiang
      3. YARN-2910.1.patch
        9 kB
        Wilfred Spiegelenburg
      4. YARN-2910.2.patch
        9 kB
        Karthik Kambatla
      5. YARN-2910.3.patch
        10 kB
        Akira Ajisaka
      6. YARN-2910.4.patch
        12 kB
        Wilfred Spiegelenburg
      7. YARN-2910.5.patch
        11 kB
        Wilfred Spiegelenburg
      8. YARN-2910.6.patch
        14 kB
        Wilfred Spiegelenburg
      9. YARN-2910.7.patch
        14 kB
        Wilfred Spiegelenburg
      10. YARN-2910.8.patch
        14 kB
        Wilfred Spiegelenburg
      11. YARN-2910.patch
        1 kB
        Wilfred Spiegelenburg

        Issue Links

          Activity

          Hide
          wilfreds Wilfred Spiegelenburg added a comment -

          Full exception stack trace and patch

          Show
          wilfreds Wilfred Spiegelenburg added a comment - Full exception stack trace and patch
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12683998/FSLeafQueue_concurrent_exception.txt
          against trunk revision c1f2bb2.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5949//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683998/FSLeafQueue_concurrent_exception.txt against trunk revision c1f2bb2. -1 patch . The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5949//console This message is automatically generated.
          Hide
          rohithsharma Rohith Sharma K S added a comment -

          Wilfred Spiegelenburg I was not known that you wanted to contribute. Since Jira was Unassigned, I assigned to myself.Feel free to assign yourself

          Show
          rohithsharma Rohith Sharma K S added a comment - Wilfred Spiegelenburg I was not known that you wanted to contribute. Since Jira was Unassigned, I assigned to myself.Feel free to assign yourself
          Hide
          wilfreds Wilfred Spiegelenburg added a comment -

          Rohith Sharma K S Not a problem, I tried assigning it to myself but I do not seem to have the right to do so. I don't know if you can assign it to me or that I need to get some extra access.

          Show
          wilfreds Wilfred Spiegelenburg added a comment - Rohith Sharma K S Not a problem, I tried assigning it to myself but I do not seem to have the right to do so. I don't know if you can assign it to me or that I need to get some extra access.
          Hide
          rohithsharma Rohith Sharma K S added a comment -

          Unassigning myself since reporter has interest in taking over.

          Show
          rohithsharma Rohith Sharma K S added a comment - Unassigning myself since reporter has interest in taking over.
          Hide
          kasha Karthik Kambatla added a comment -

          Thanks for reporting and working on this, Wilfred. I looked into the accessors of the lists in question and the number of reads and writes seem similar. Collections.synchronizedList might be a better than CopyOnWriteArrayList in this case. Also, will it be possible to add a unit test to avoid regressions in the future?

          Show
          kasha Karthik Kambatla added a comment - Thanks for reporting and working on this, Wilfred. I looked into the accessors of the lists in question and the number of reads and writes seem similar. Collections.synchronizedList might be a better than CopyOnWriteArrayList in this case. Also, will it be possible to add a unit test to avoid regressions in the future?
          Hide
          wilfreds Wilfred Spiegelenburg added a comment -

          I have the code change done with all the synchronisation around the for loops. All iterator access of the Collections.synchronizedList needs to be synchronised, based on the javadoc, which might impact the performance as much or worse than the copy on write.
          The junit test is in almost done and I will update the patch when that is finished.

          Show
          wilfreds Wilfred Spiegelenburg added a comment - I have the code change done with all the synchronisation around the for loops. All iterator access of the Collections.synchronizedList needs to be synchronised, based on the javadoc, which might impact the performance as much or worse than the copy on write. The junit test is in almost done and I will update the patch when that is finished.
          Hide
          ajisakaa Akira Ajisaka added a comment -

          As Wilfred Spiegelenburg mentioned, synchronization can cause RM slow down if there are many getQueueInfo requests from clients at a time, so I'm thinking CopyOnWriteArrayList might be better.
          By the way, we should fix CapacityScheduler also. (YARN-2922)

          Show
          ajisakaa Akira Ajisaka added a comment - As Wilfred Spiegelenburg mentioned, synchronization can cause RM slow down if there are many getQueueInfo requests from clients at a time, so I'm thinking CopyOnWriteArrayList might be better. By the way, we should fix CapacityScheduler also. ( YARN-2922 )
          Hide
          sandyr Sandy Ryza added a comment -

          Using a CopyOnWriteArrayList would make adding an application an O operation. On many clusters, this happens quite frequently. Acquiring a lock is cheap when there is no contention. if app submissions are frequent, I'd rather slow down requests for queue info than the submissions themselves. Otherwise, the former shouldn't have a large effect on the performance of the latter.

          Show
          sandyr Sandy Ryza added a comment - Using a CopyOnWriteArrayList would make adding an application an O operation. On many clusters, this happens quite frequently. Acquiring a lock is cheap when there is no contention. if app submissions are frequent, I'd rather slow down requests for queue info than the submissions themselves. Otherwise, the former shouldn't have a large effect on the performance of the latter.
          Hide
          ajisakaa Akira Ajisaka added a comment -

          if app submissions are frequent, I'd rather slow down requests for queue info than the submissions themselves.

          Make sense to me. +1 (non-binding) for using SynchronizedList. Thanks.

          Show
          ajisakaa Akira Ajisaka added a comment - if app submissions are frequent, I'd rather slow down requests for queue info than the submissions themselves. Make sense to me. +1 (non-binding) for using SynchronizedList . Thanks.
          Hide
          wilfreds Wilfred Spiegelenburg added a comment -

          Updated patch with the changes as discussed and a junit test

          Show
          wilfreds Wilfred Spiegelenburg added a comment - Updated patch with the changes as discussed and a junit test
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12685332/YARN-2910.1.patch
          against trunk revision 0653918.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs
          org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs
          org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication
          org.apache.hadoop.yarn.server.resourcemanager.TestRM
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue
          org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService

          The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens
          org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
          org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6004//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6004//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12685332/YARN-2910.1.patch against trunk revision 0653918. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6004//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6004//console This message is automatically generated.
          Hide
          rchiang Ray Chiang added a comment -

          I get the same error for TestFSLeafQueue in my tree:

          Running org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue
          Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.185 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue
          testConcurrentAccess(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue) Time elapsed: 0.116 sec <<< ERROR!
          java.lang.NullPointerException: null
          at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.setMinShare(FSQueueMetrics.java:75)
          at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.<init>(FSQueue.java:64)
          at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.<init>(FSLeafQueue.java:67)
          at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue.testConcurrentAccess(TestFSLeafQueue.java:243)

          Show
          rchiang Ray Chiang added a comment - I get the same error for TestFSLeafQueue in my tree: Running org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.185 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue testConcurrentAccess(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue) Time elapsed: 0.116 sec <<< ERROR! java.lang.NullPointerException: null at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.setMinShare(FSQueueMetrics.java:75) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.<init>(FSQueue.java:64) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.<init>(FSLeafQueue.java:67) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue.testConcurrentAccess(TestFSLeafQueue.java:243)
          Hide
          kasha Karthik Kambatla added a comment - - edited

          I had to modify the test slightly.

          Show
          kasha Karthik Kambatla added a comment - - edited I had to modify the test slightly.
          Hide
          kasha Karthik Kambatla added a comment -

          Other test failures either seemed unrelated or temporary. I am obviously +1, pending a clean Jenkins run.

          Show
          kasha Karthik Kambatla added a comment - Other test failures either seemed unrelated or temporary. I am obviously +1, pending a clean Jenkins run.
          Hide
          ozawa Tsuyoshi Ozawa added a comment -

          The fix itself looks good to me. One concern is that the test passes without the fix on my local.

          Show
          ozawa Tsuyoshi Ozawa added a comment - The fix itself looks good to me. One concern is that the test passes without the fix on my local.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12685427/YARN-2910.2.patch
          against trunk revision 4b13082.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs
          org.apache.hadoop.yarn.server.resourcemanager.TestRM
          org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs
          org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService
          org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens
          org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService

          The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
          org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens
          org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
          org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6008//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6008//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12685427/YARN-2910.2.patch against trunk revision 4b13082. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6008//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6008//console This message is automatically generated.
          Hide
          rchiang Ray Chiang added a comment -

          +1 (non-binding) for me as well. Test now passes in my tree.

          Show
          rchiang Ray Chiang added a comment - +1 (non-binding) for me as well. Test now passes in my tree.
          Hide
          ozawa Tsuyoshi Ozawa added a comment -

          Karthik Kambatla Ray Chiang, could you confirm that the additional test introduced in the patch fails without the fix against FSLeafQueue? On my local, I cannot reproduce the problem still with large number of threads/appAttempt entries.

          Show
          ozawa Tsuyoshi Ozawa added a comment - Karthik Kambatla Ray Chiang , could you confirm that the additional test introduced in the patch fails without the fix against FSLeafQueue? On my local, I cannot reproduce the problem still with large number of threads/appAttempt entries.
          Hide
          ajisakaa Akira Ajisaka added a comment -

          the additional test introduced in the patch fails without the fix against FSLeafQueue?

          I don't think the test fails without the fix.

              when(schedulable.getResourceUsage()).thenReturn(smallResource);
          

          The test should run the actual getResourceUsage() method to produce ConcurrentModificationException.

          Show
          ajisakaa Akira Ajisaka added a comment - the additional test introduced in the patch fails without the fix against FSLeafQueue? I don't think the test fails without the fix. when(schedulable.getResourceUsage()).thenReturn(smallResource); The test should run the actual getResourceUsage() method to produce ConcurrentModificationException.
          Hide
          ozawa Tsuyoshi Ozawa added a comment -

          Ah, that's the reason... Ray Chiang, do you mind updating the test?

          Show
          ozawa Tsuyoshi Ozawa added a comment - Ah, that's the reason... Ray Chiang , do you mind updating the test?
          Hide
          ajisakaa Akira Ajisaka added a comment -

          Updated the test to use the actual method, and fixed some indents. I reproduced ConcurrentModificationException without the fix.

          Show
          ajisakaa Akira Ajisaka added a comment - Updated the test to use the actual method, and fixed some indents. I reproduced ConcurrentModificationException without the fix.
          Hide
          ajisakaa Akira Ajisaka added a comment -

          This is a sample patch for improving the test.

          Show
          ajisakaa Akira Ajisaka added a comment - This is a sample patch for improving the test.
          Hide
          ozawa Tsuyoshi Ozawa added a comment -

          Thanks for updating, Akira. But now Ray is the assignee of this JIRA and he is doing this issue, so let's wait for the update by him

          Ray Chiang, could you update following things based on v3 patch?

          +        for (int i=0; i <200; i++) {
          
          +        for (int i=0; i <200; i++) {
          

          On my local, the probability of failure is low with Akira's patch since the number of iteration is too small. How about making the iteration larger(e.g. 10000)?
          Additionally, this is minor nits, but please add space to after < like

          < 10000

          .

          Show
          ozawa Tsuyoshi Ozawa added a comment - Thanks for updating, Akira. But now Ray is the assignee of this JIRA and he is doing this issue, so let's wait for the update by him Ray Chiang , could you update following things based on v3 patch? + for ( int i=0; i <200; i++) { + for ( int i=0; i <200; i++) { On my local, the probability of failure is low with Akira's patch since the number of iteration is too small. How about making the iteration larger(e.g. 10000)? Additionally, this is minor nits, but please add space to after < like < 10000 .
          Hide
          rchiang Ray Chiang added a comment -

          I see Wilfred assigned this to me now.

          I took Akira's changes and updated with Tsuyoshi's suggestion. The new unit test fails 10 out of 10 with the old code and passes 10 out of 10 with the new code.

          Show
          rchiang Ray Chiang added a comment - I see Wilfred assigned this to me now. I took Akira's changes and updated with Tsuyoshi's suggestion. The new unit test fails 10 out of 10 with the old code and passes 10 out of 10 with the new code.
          Hide
          rchiang Ray Chiang added a comment -

          Submit for testing

          Show
          rchiang Ray Chiang added a comment - Submit for testing
          Hide
          ozawa Tsuyoshi Ozawa added a comment -

          Thanks for your updating, Ray! Looks good to me, pending Jenkins.

          Show
          ozawa Tsuyoshi Ozawa added a comment - Thanks for your updating, Ray! Looks good to me, pending Jenkins.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12685657/YARN-2910.3.patch
          against trunk revision 1b3bb9e.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs
          org.apache.hadoop.yarn.server.resourcemanager.TestRM
          org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs
          org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService
          org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens
          org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService

          The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
          org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens
          org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
          org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6027//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6027//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12685657/YARN-2910.3.patch against trunk revision 1b3bb9e. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6027//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6027//console This message is automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12685666/YARN-2910.004.patch
          against trunk revision 1b3bb9e.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs
          org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs
          org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage
          org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication
          org.apache.hadoop.yarn.server.resourcemanager.TestRM
          org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService

          The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens
          org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
          org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6028//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6028//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12685666/YARN-2910.004.patch against trunk revision 1b3bb9e. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6028//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6028//console This message is automatically generated.
          Hide
          wilfreds Wilfred Spiegelenburg added a comment -

          I did not change the assignment

          yes, the when(schedulable.getResourceUsage()).thenReturn(smallResource); should not have been in the patch, my mistake. Not sure how that ended up in the patch I used it during development but not in the last tests.

          On my machine the test failed with just adding applications. The issue seems to be in the initialisation of the application attempt. When I added debug into the test run I can see the initialisation of the app attempt in the mock taking up a lot of time which meant that the getResourceUsage almost always ran over an empty list unless the number of iterations was raised above 1000. As soon as I moved the creation out of the thread the failure occurs within 5 iterations of the getResourceUsage call in the second thread after adding less than 15 or so app instances.

          I have attached an updated patch which passes with the new code and has a 100% failure rate with the old code. This version of the test runs faster and is more reliable than the previous ones.

          Show
          wilfreds Wilfred Spiegelenburg added a comment - I did not change the assignment yes, the when(schedulable.getResourceUsage()).thenReturn(smallResource); should not have been in the patch, my mistake. Not sure how that ended up in the patch I used it during development but not in the last tests. On my machine the test failed with just adding applications. The issue seems to be in the initialisation of the application attempt. When I added debug into the test run I can see the initialisation of the app attempt in the mock taking up a lot of time which meant that the getResourceUsage almost always ran over an empty list unless the number of iterations was raised above 1000. As soon as I moved the creation out of the thread the failure occurs within 5 iterations of the getResourceUsage call in the second thread after adding less than 15 or so app instances. I have attached an updated patch which passes with the new code and has a 100% failure rate with the old code. This version of the test runs faster and is more reliable than the previous ones.
          Hide
          wilfreds Wilfred Spiegelenburg added a comment -

          The fix causes the org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler to fail.
          There is a deadlock that is created by the synchronised read access in the leaf queue for the runnableApps. If an app has two containers at different stages in the allocation it can happen that the appAttempt is locked by one and the runnableApps by the second causing the hang.

          This is what I was afraid of when I mentioned the slow down, I did not anticipate it this bad but the number of reads far outnumber the writes.
          The earlier proposed CopyOnWriteArrayList will also not work due to the sort that is called (and I overlooked) which is not supported.

          Show
          wilfreds Wilfred Spiegelenburg added a comment - The fix causes the org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler to fail. There is a deadlock that is created by the synchronised read access in the leaf queue for the runnableApps . If an app has two containers at different stages in the allocation it can happen that the appAttempt is locked by one and the runnableApps by the second causing the hang. This is what I was afraid of when I mentioned the slow down, I did not anticipate it this bad but the number of reads far outnumber the writes. The earlier proposed CopyOnWriteArrayList will also not work due to the sort that is called (and I overlooked) which is not supported.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12685755/YARN-2910.4.patch
          against trunk revision 8963515.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs
          org.apache.hadoop.yarn.server.resourcemanager.TestRM
          org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs
          org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService
          org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens
          org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService

          The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
          org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens
          org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
          org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6033//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6033//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12685755/YARN-2910.4.patch against trunk revision 8963515. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6033//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6033//console This message is automatically generated.
          Hide
          rchiang Ray Chiang added a comment -

          Never mind, I misread the earlier JIRA history. I must have accidentally clicked on "Assign to me" while scrolling around.

          With the newest unit test and without the code fix (i.e. expecting failures), I'm seeing a failure rate around 70%. I think it would still be a good idea to increase the modifications to get the failure rate higher (as Tsuyoshi suggested earlier). I can get 10/10 failures with a value of 400 in the modify for loop.

          Show
          rchiang Ray Chiang added a comment - Never mind, I misread the earlier JIRA history. I must have accidentally clicked on "Assign to me" while scrolling around. With the newest unit test and without the code fix (i.e. expecting failures), I'm seeing a failure rate around 70%. I think it would still be a good idea to increase the modifications to get the failure rate higher (as Tsuyoshi suggested earlier). I can get 10/10 failures with a value of 400 in the modify for loop.
          Hide
          kasha Karthik Kambatla added a comment -

          Here is the deadlock Wilfred was mentioning:

          "FairSchedulerContinuousScheduling":
                  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:553)
                  - waiting to lock <0x00000007f6bc8f58> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt)
                  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:769)
                  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:228)
                  - locked <0x00000007f6b5ec00> (a java.util.Collections$SynchronizedRandomAccessList)
                  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
                  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1072)
                  - locked <0x00000007f68f25e8> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
                  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1005)
                  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:280)
          "Thread-434":
                  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.getResourceUsage(FSLeafQueue.java:152)
                  - waiting to lock <0x00000007f6b5ec00> (a java.util.Collections$SynchronizedRandomAccessList)
                  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.getHeadroom(FSAppAttempt.java:180)
                  - locked <0x00000007f6bc8f58> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt)
                  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:939)
                  - locked <0x00000007f6bc8f58> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt)
                  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3509)
                  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
                  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                  at java.lang.reflect.Method.invoke(Method.java:606)
                  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
                  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
                  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
                  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
                  at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
          
          Show
          kasha Karthik Kambatla added a comment - Here is the deadlock Wilfred was mentioning: "FairSchedulerContinuousScheduling": at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:553) - waiting to lock <0x00000007f6bc8f58> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:769) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:228) - locked <0x00000007f6b5ec00> (a java.util.Collections$SynchronizedRandomAccessList) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1072) - locked <0x00000007f68f25e8> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1005) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:280) "Thread-434": at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.getResourceUsage(FSLeafQueue.java:152) - waiting to lock <0x00000007f6b5ec00> (a java.util.Collections$SynchronizedRandomAccessList) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.getHeadroom(FSAppAttempt.java:180) - locked <0x00000007f6bc8f58> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:939) - locked <0x00000007f6bc8f58> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3509) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
          Hide
          kasha Karthik Kambatla added a comment -

          Looking around, we don't need the synchronization for FSAppAttempt#getHeadroom. That and changing the locking to use read-write locks should get us a long way towards avoiding this situation. Also, if we are locking on each access, we should be able to drop the use of Collections.synchronizedList.

          Show
          kasha Karthik Kambatla added a comment - Looking around, we don't need the synchronization for FSAppAttempt#getHeadroom. That and changing the locking to use read-write locks should get us a long way towards avoiding this situation. Also, if we are locking on each access, we should be able to drop the use of Collections.synchronizedList.
          Hide
          wilfreds Wilfred Spiegelenburg added a comment -

          OK, a complete new approach. The other approaches did not work or did not fix it so back to a simple lock and unlock around the read and write actions.

          The locking is setup with a fair distribution which is almost a fifo setup. This is not the default option and chosen to make sure we do not cause a thread to be starved from the lock.
          Multiple reads are allowed at the same time and only one writer with no readers at the same time.

          All junit tests pass in my local environment also other failures.
          As an extra change the synchronized has been removed from FSAppAttempt#getHeadRoom as discussed with Karthik Kambatla.

          Show
          wilfreds Wilfred Spiegelenburg added a comment - OK, a complete new approach. The other approaches did not work or did not fix it so back to a simple lock and unlock around the read and write actions. The locking is setup with a fair distribution which is almost a fifo setup. This is not the default option and chosen to make sure we do not cause a thread to be starved from the lock. Multiple reads are allowed at the same time and only one writer with no readers at the same time. All junit tests pass in my local environment also other failures. As an extra change the synchronized has been removed from FSAppAttempt#getHeadRoom as discussed with Karthik Kambatla .
          Hide
          kasha Karthik Kambatla added a comment -

          For lock-unlock, we follow the following convention. Can we update the patch to do so.

          lock();
          try{
            // enjoy the power
          }
          finally {
            unlock();
          }
          
          Show
          kasha Karthik Kambatla added a comment - For lock-unlock, we follow the following convention. Can we update the patch to do so. lock(); try { // enjoy the power } finally { unlock(); }
          Hide
          wilfreds Wilfred Spiegelenburg added a comment -

          updated patch with try finally clauses

          Show
          wilfreds Wilfred Spiegelenburg added a comment - updated patch with try finally clauses
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12685934/YARN-2910.5.patch
          against trunk revision ddffcd8.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 9 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6052//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6052//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6052//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12685934/YARN-2910.5.patch against trunk revision ddffcd8. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 9 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6052//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6052//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6052//console This message is automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12685939/YARN-2910.6.patch
          against trunk revision ddffcd8.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          -1 eclipse:eclipse. The patch failed to build with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6053//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6053//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12685939/YARN-2910.6.patch against trunk revision ddffcd8. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. -1 eclipse:eclipse . The patch failed to build with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in . +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6053//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6053//console This message is automatically generated.
          Hide
          kasha Karthik Kambatla added a comment -

          mvn eclipse:eclipse passes for me locally with the patch applied.

          The patch looks mostly there. One minor comment: in FSLeafQueue#removeApp, I would avoid doing the following while holding the write lock:

                  // Update AM resource usage
                  if (app.isAmRunning() && app.getAMResource() != null) {
                    Resources.subtractFrom(amResourceUsage, app.getAMResource());
                  }
          
          Show
          kasha Karthik Kambatla added a comment - mvn eclipse:eclipse passes for me locally with the patch applied. The patch looks mostly there. One minor comment: in FSLeafQueue#removeApp, I would avoid doing the following while holding the write lock: // Update AM resource usage if (app.isAmRunning() && app.getAMResource() != null) { Resources.subtractFrom(amResourceUsage, app.getAMResource()); }
          Hide
          wilfreds Wilfred Spiegelenburg added a comment -

          One final update to shorten the time we keep the lock and make sure we do the least amount of work while holding a write lock

          Show
          wilfreds Wilfred Spiegelenburg added a comment - One final update to shorten the time we keep the lock and make sure we do the least amount of work while holding a write lock
          Hide
          wilfreds Wilfred Spiegelenburg added a comment -

          cleanup of spurious includes which should not be there

          Show
          wilfreds Wilfred Spiegelenburg added a comment - cleanup of spurious includes which should not be there
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12685948/YARN-2910.7.patch
          against trunk revision db73cc9.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6054//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6054//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6054//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12685948/YARN-2910.7.patch against trunk revision db73cc9. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6054//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6054//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6054//console This message is automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12685950/YARN-2910.8.patch
          against trunk revision db73cc9.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6055//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6055//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6055//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12685950/YARN-2910.8.patch against trunk revision db73cc9. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6055//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6055//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6055//console This message is automatically generated.
          Hide
          rchiang Ray Chiang added a comment -

          RE: Unit test failure. TestRMRestart passed 10 out of 10 runs for me with the patch.

          Patch looks good to me. +1 (nonbinding)

          Show
          rchiang Ray Chiang added a comment - RE: Unit test failure. TestRMRestart passed 10 out of 10 runs for me with the patch. Patch looks good to me. +1 (nonbinding)
          Hide
          kasha Karthik Kambatla added a comment -

          Ran findbugs on trunk with and without the patch locally, and didn't find any. The test failures are also flaky.

          +1. Checking this in.

          Show
          kasha Karthik Kambatla added a comment - Ran findbugs on trunk with and without the patch locally, and didn't find any. The test failures are also flaky. +1. Checking this in.
          Hide
          kasha Karthik Kambatla added a comment -

          Thanks for this critical fix, Wilfred. Just committed to trunk and branch-2.

          Show
          kasha Karthik Kambatla added a comment - Thanks for this critical fix, Wilfred. Just committed to trunk and branch-2.
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #6683 (See https://builds.apache.org/job/Hadoop-trunk-Commit/6683/)
          YARN-2910. FSLeafQueue can throw ConcurrentModificationException. (Wilfred Spiegelenburg via kasha) (kasha: rev a2e07a54561a57a83b943628ebbc53ed5ba52718)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSLeafQueue.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #6683 (See https://builds.apache.org/job/Hadoop-trunk-Commit/6683/ ) YARN-2910 . FSLeafQueue can throw ConcurrentModificationException. (Wilfred Spiegelenburg via kasha) (kasha: rev a2e07a54561a57a83b943628ebbc53ed5ba52718) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSLeafQueue.java
          Hide
          ozawa Tsuyoshi Ozawa added a comment -

          Wilfred Spiegelenburg, Ray Chiang, Karthik Kambatla Sorry for the delay and misreading assignee's history log. I have one question about the fix:

          +    writeLock.lock();
          +    try {
          +      Collections.sort(runnableApps, comparator);
          +    } finally {
          +      writeLock.unlock();
          +    }
          +    readLock.lock();
          +    try {
          +      for (FSAppAttempt sched : runnableApps) {
          +        if (SchedulerAppUtils.isBlacklisted(sched, node, LOG)) {
          +          continue;
          +        }
          +
          +        assigned = sched.assignContainer(node);
          +        if (!assigned.equals(Resources.none())) {
          +          break;
          +        }
                 }
          +    } finally {
          +      readLock.unlock();
          

          Can we really WriteLock.unlock before reading the value? The order of entries of runnableApps can be inconsistent between sort() and iteration of runnableApps.
          This can cause the breaking of the fair scheduling. I think following code flow is correct one. What do you think?

              writeLock.lock();
              try {
                Collections.sort(runnableApps, comparator);
                for (FSAppAttempt sched : runnableApps) {
                  if (SchedulerAppUtils.isBlacklisted(sched, node, LOG)) {
                    continue;
                  }
          
                  assigned = sched.assignContainer(node);
                  if (!assigned.equals(Resources.none())) {
                    break;
                  }
                }
              } finally {
                writeLock.unlock();
              }
          
          Show
          ozawa Tsuyoshi Ozawa added a comment - Wilfred Spiegelenburg , Ray Chiang , Karthik Kambatla Sorry for the delay and misreading assignee's history log. I have one question about the fix: + writeLock.lock(); + try { + Collections.sort(runnableApps, comparator); + } finally { + writeLock.unlock(); + } + readLock.lock(); + try { + for (FSAppAttempt sched : runnableApps) { + if (SchedulerAppUtils.isBlacklisted(sched, node, LOG)) { + continue ; + } + + assigned = sched.assignContainer(node); + if (!assigned.equals(Resources.none())) { + break ; + } } + } finally { + readLock.unlock(); Can we really WriteLock.unlock before reading the value? The order of entries of runnableApps can be inconsistent between sort() and iteration of runnableApps. This can cause the breaking of the fair scheduling. I think following code flow is correct one. What do you think? writeLock.lock(); try { Collections.sort(runnableApps, comparator); for (FSAppAttempt sched : runnableApps) { if (SchedulerAppUtils.isBlacklisted(sched, node, LOG)) { continue ; } assigned = sched.assignContainer(node); if (!assigned.equals(Resources.none())) { break ; } } } finally { writeLock.unlock(); }
          Hide
          ozawa Tsuyoshi Ozawa added a comment -

          s/the fair scheduling/the semantics of fair scheduling/

          Show
          ozawa Tsuyoshi Ozawa added a comment - s/the fair scheduling/the semantics of fair scheduling/
          Hide
          ozawa Tsuyoshi Ozawa added a comment -

          Created YARN-2945 for discussing the issue.

          Anyway, I think the ConcurrentModificationException and the deadlock is fixed in the patch. Thanks for your contribution, Wilfred Spiegelenburg!

          Show
          ozawa Tsuyoshi Ozawa added a comment - Created YARN-2945 for discussing the issue. Anyway, I think the ConcurrentModificationException and the deadlock is fixed in the patch. Thanks for your contribution, Wilfred Spiegelenburg !
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #36 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/36/)
          YARN-2910. FSLeafQueue can throw ConcurrentModificationException. (Wilfred Spiegelenburg via kasha) (kasha: rev a2e07a54561a57a83b943628ebbc53ed5ba52718)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSLeafQueue.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #36 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/36/ ) YARN-2910 . FSLeafQueue can throw ConcurrentModificationException. (Wilfred Spiegelenburg via kasha) (kasha: rev a2e07a54561a57a83b943628ebbc53ed5ba52718) hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSLeafQueue.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #771 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/771/)
          YARN-2910. FSLeafQueue can throw ConcurrentModificationException. (Wilfred Spiegelenburg via kasha) (kasha: rev a2e07a54561a57a83b943628ebbc53ed5ba52718)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSLeafQueue.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk #771 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/771/ ) YARN-2910 . FSLeafQueue can throw ConcurrentModificationException. (Wilfred Spiegelenburg via kasha) (kasha: rev a2e07a54561a57a83b943628ebbc53ed5ba52718) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSLeafQueue.java hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #1968 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1968/)
          YARN-2910. FSLeafQueue can throw ConcurrentModificationException. (Wilfred Spiegelenburg via kasha) (kasha: rev a2e07a54561a57a83b943628ebbc53ed5ba52718)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSLeafQueue.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk #1968 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1968/ ) YARN-2910 . FSLeafQueue can throw ConcurrentModificationException. (Wilfred Spiegelenburg via kasha) (kasha: rev a2e07a54561a57a83b943628ebbc53ed5ba52718) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSLeafQueue.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #34 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/34/)
          YARN-2910. FSLeafQueue can throw ConcurrentModificationException. (Wilfred Spiegelenburg via kasha) (kasha: rev a2e07a54561a57a83b943628ebbc53ed5ba52718)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSLeafQueue.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #34 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/34/ ) YARN-2910 . FSLeafQueue can throw ConcurrentModificationException. (Wilfred Spiegelenburg via kasha) (kasha: rev a2e07a54561a57a83b943628ebbc53ed5ba52718) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSLeafQueue.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #38 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/38/)
          YARN-2910. FSLeafQueue can throw ConcurrentModificationException. (Wilfred Spiegelenburg via kasha) (kasha: rev a2e07a54561a57a83b943628ebbc53ed5ba52718)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSLeafQueue.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #38 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/38/ ) YARN-2910 . FSLeafQueue can throw ConcurrentModificationException. (Wilfred Spiegelenburg via kasha) (kasha: rev a2e07a54561a57a83b943628ebbc53ed5ba52718) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSLeafQueue.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #1988 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1988/)
          YARN-2910. FSLeafQueue can throw ConcurrentModificationException. (Wilfred Spiegelenburg via kasha) (kasha: rev a2e07a54561a57a83b943628ebbc53ed5ba52718)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSLeafQueue.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1988 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1988/ ) YARN-2910 . FSLeafQueue can throw ConcurrentModificationException. (Wilfred Spiegelenburg via kasha) (kasha: rev a2e07a54561a57a83b943628ebbc53ed5ba52718) hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSLeafQueue.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
          Hide
          vinodkv Vinod Kumar Vavilapalli added a comment -

          Pulled this into 2.6.1. Ran compilation and TestFSLeafQueue before the push. Patch applied cleanly.

          Show
          vinodkv Vinod Kumar Vavilapalli added a comment - Pulled this into 2.6.1. Ran compilation and TestFSLeafQueue before the push. Patch applied cleanly.
          Hide
          zxu zhihai xu added a comment -

          linked YARN-2975 to this issue, It looks like we need both YARN-2910 and YARN-2975 to fix this issue completely.

          Show
          zxu zhihai xu added a comment - linked YARN-2975 to this issue, It looks like we need both YARN-2910 and YARN-2975 to fix this issue completely.

            People

            • Assignee:
              wilfreds Wilfred Spiegelenburg
              Reporter:
              wilfreds Wilfred Spiegelenburg
            • Votes:
              0 Vote for this issue
              Watchers:
              14 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development