Uploaded image for project: 'Hadoop YARN'
  1. Hadoop YARN
  2. YARN-4478 [Umbrella] : Track all the Test failures in YARN
  3. YARN-4393

TestResourceLocalizationService#testFailedDirsResourceRelease fails intermittently

    Details

    • Hadoop Flags:
      Reviewed

      Description

      Tsuyoshi Ozawa pointed out this failure on YARN-4380.
      Check https://issues.apache.org/jira/browse/YARN-4380?focusedCommentId=15023773&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15023773

      sts run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.518 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
      testFailedDirsResourceRelease(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) Time elapsed: 0.093 sec <<< FAILURE!
      org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
      Argument(s) are different! Wanted:
      eventHandler.handle(
      <custom argument matcher>
      );
      -> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632)
      Actual invocation has different arguments:
      eventHandler.handle(
      EventType: APPLICATION_INITED
      );
      -> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632)
      

        Issue Links

          Activity

          Hide
          jlowe Jason Lowe added a comment -

          Thanks Varun Saxena! I committed this also to branch-2.8, branch-2.7 and branch-2.6 since this was originally broken when YARN-90 went into branch-2.6.

          Show
          jlowe Jason Lowe added a comment - Thanks Varun Saxena ! I committed this also to branch-2.8, branch-2.7 and branch-2.6 since this was originally broken when YARN-90 went into branch-2.6.
          Hide
          varun_saxena Varun Saxena added a comment -

          Thanks Rohith Sharma K S for the review and commit.
          Thanks Tsuyoshi Ozawa for the review and originally reporting this issue(while reviewing YARN-4380)

          Show
          varun_saxena Varun Saxena added a comment - Thanks Rohith Sharma K S for the review and commit. Thanks Tsuyoshi Ozawa for the review and originally reporting this issue(while reviewing YARN-4380 )
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #9062 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9062/)
          YARN-4393. Fix intermittent test failure for (rohithsharmaks: rev 791c1639ae0b351e0bf0b2ecec854dc72ab07935)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #9062 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9062/ ) YARN-4393 . Fix intermittent test failure for (rohithsharmaks: rev 791c1639ae0b351e0bf0b2ecec854dc72ab07935) hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
          Hide
          rohithsharma Rohith Sharma K S added a comment -

          Committed the patch trunk/branch-2.. thanks Varun for contributions and thanks Tsuyoshi for review. I have committed as it was failing build most of the time.

          Show
          rohithsharma Rohith Sharma K S added a comment - Committed the patch trunk/branch-2.. thanks Varun for contributions and thanks Tsuyoshi for review. I have committed as it was failing build most of the time.
          Hide
          rohithsharma Rohith Sharma K S added a comment -

          Committing shortly

          Show
          rohithsharma Rohith Sharma K S added a comment - Committing shortly
          Hide
          rohithsharma Rohith Sharma K S added a comment -

          Tsuyoshi Ozawa would you have look at this please, do you have any comments?

          Show
          rohithsharma Rohith Sharma K S added a comment - Tsuyoshi Ozawa would you have look at this please, do you have any comments?
          Hide
          rohithsharma Rohith Sharma K S added a comment -

          I think Varun's analysis make sense to me that need not add everywhere dispatcher.await. We can use dispatcher.await only when asserting for event type OR asserting any functionality after processing that event from dispatcher may be asserting for the values.

          Show
          rohithsharma Rohith Sharma K S added a comment - I think Varun's analysis make sense to me that need not add everywhere dispatcher.await. We can use dispatcher.await only when asserting for event type OR asserting any functionality after processing that event from dispatcher may be asserting for the values.
          Hide
          varun_saxena Varun Saxena added a comment -

          Tsuyoshi Ozawa, kindly look at the analysis above. And whether you agree with it.

          Show
          varun_saxena Varun Saxena added a comment - Tsuyoshi Ozawa , kindly look at the analysis above. And whether you agree with it.
          Hide
          varun_saxena Varun Saxena added a comment -
          Show
          varun_saxena Varun Saxena added a comment - Tsuyoshi Ozawa / Rohith Sharma K S , kindly review
          Hide
          varun_saxena Varun Saxena added a comment -

          Tsuyoshi Ozawa, are these tests failing ? Maybe there is some other problem.
          Because dispatcher.await() would only make sense if we are verifying an event which we expect to be sent to dispatcher or verifying something which happens upon processing of such an event.
          In the instances pointed above, I think it should be fine even without dispatcher.await()

          1. In testRecovery, the statements pointed out above, we are first checking for whether localization has started or not for a resource and then posting ResourceLocallizedEvent to localizer tracker. We are not verifying what happens due to handling of that event. If you notice we do have a dispatcher.await() right after these set of statements. And after that only we check the state of resource. So this single dispatcher.await() should be enough here.

          2. In testResourceRelease also, the statement being verified is that whether cleanupPrivLocalizers has been called. This method is called when we call ResourceLocalizationService#handle with ContainerLocalizationCleanupEvent directly. Because handle() will call handleCleanupContainerResources() where this method is called. We do not need to wait here as we will be able to verify this method invocation as soon as call to handle() finishes. And we do have dispatcher.await() for later verifications.

           spyService.handle(new ContainerLocalizationCleanupEvent(c, req)); // <-- here!
                verify(mockLocallilzerTracker)
                  .cleanupPrivLocalizers("container_314159265358979_0003_01_000042");
          

          3. For testFailedDirsResourceRelease too, same explanation holds true. Even the delete statements being verified below the code snippet pointed out by you will be called in handleCleanupContainerResources(). So we do not need to wait. And we have dispatcher.await() after these verifications.

          Thoughts ?

          Show
          varun_saxena Varun Saxena added a comment - Tsuyoshi Ozawa , are these tests failing ? Maybe there is some other problem. Because dispatcher.await() would only make sense if we are verifying an event which we expect to be sent to dispatcher or verifying something which happens upon processing of such an event. In the instances pointed above, I think it should be fine even without dispatcher.await() 1. In testRecovery, the statements pointed out above, we are first checking for whether localization has started or not for a resource and then posting ResourceLocallizedEvent to localizer tracker. We are not verifying what happens due to handling of that event. If you notice we do have a dispatcher.await() right after these set of statements. And after that only we check the state of resource. So this single dispatcher.await() should be enough here. 2. In testResourceRelease also, the statement being verified is that whether cleanupPrivLocalizers has been called. This method is called when we call ResourceLocalizationService#handle with ContainerLocalizationCleanupEvent directly. Because handle() will call handleCleanupContainerResources() where this method is called. We do not need to wait here as we will be able to verify this method invocation as soon as call to handle() finishes. And we do have dispatcher.await() for later verifications. spyService.handle( new ContainerLocalizationCleanupEvent(c, req)); // <-- here! verify(mockLocallilzerTracker) .cleanupPrivLocalizers( "container_314159265358979_0003_01_000042" ); 3. For testFailedDirsResourceRelease too, same explanation holds true. Even the delete statements being verified below the code snippet pointed out by you will be called in handleCleanupContainerResources(). So we do not need to wait. And we have dispatcher.await() after these verifications. Thoughts ?
          Hide
          ozawa Tsuyoshi Ozawa added a comment -

          Varun Saxena, before committing, I found that there are some missing dispatcher.await():

          testResourceRelease:

                //Send Cleanup Event
                spyService.handle(new ContainerLocalizationCleanupEvent(c, req)); // <-- here!
                verify(mockLocallilzerTracker)
                  .cleanupPrivLocalizers("container_314159265358979_0003_01_000042");
                req2.remove(LocalResourceVisibility.PRIVATE);
                spyService.handle(new ContainerLocalizationCleanupEvent(c, req2));
                dispatcher.await();
          

          testFailedDirsResourceRelease:

                // Send Cleanup Event
                spyService.handle(new ContainerLocalizationCleanupEvent(c, req)); // <- here!
                verify(mockLocallilzerTracker).cleanupPrivLocalizers(
                  "container_314159265358979_0003_01_000042");
          

          testRecovery:

                assertNotNull("Localization not started", privLr1.getLocalPath());
                privTracker1.handle(new ResourceLocalizedEvent(privReq1,
                    privLr1.getLocalPath(), privLr1.getSize() + 5));
                assertNotNull("Localization not started", privLr2.getLocalPath());
                privTracker1.handle(new ResourceLocalizedEvent(privReq2,
                    privLr2.getLocalPath(), privLr2.getSize() + 10));
                assertNotNull("Localization not started", appLr1.getLocalPath());
                appTracker1.handle(new ResourceLocalizedEvent(appReq1,
                    appLr1.getLocalPath(), appLr1.getSize()));
                assertNotNull("Localization not started", appLr3.getLocalPath());
                appTracker2.handle(new ResourceLocalizedEvent(appReq3,
                    appLr3.getLocalPath(), appLr3.getSize() + 7));
                assertNotNull("Localization not started", pubLr1.getLocalPath());
                pubTracker.handle(new ResourceLocalizedEvent(pubReq1,
                    pubLr1.getLocalPath(), pubLr1.getSize() + 1000));
                assertNotNull("Localization not started", pubLr2.getLocalPath());
                pubTracker.handle(new ResourceLocalizedEvent(pubReq2,
                    pubLr2.getLocalPath(), pubLr2.getSize() + 99999));
          

          Could you update them?

          Show
          ozawa Tsuyoshi Ozawa added a comment - Varun Saxena , before committing, I found that there are some missing dispatcher.await(): testResourceRelease: //Send Cleanup Event spyService.handle( new ContainerLocalizationCleanupEvent(c, req)); // <-- here! verify(mockLocallilzerTracker) .cleanupPrivLocalizers( "container_314159265358979_0003_01_000042" ); req2.remove(LocalResourceVisibility.PRIVATE); spyService.handle( new ContainerLocalizationCleanupEvent(c, req2)); dispatcher.await(); testFailedDirsResourceRelease: // Send Cleanup Event spyService.handle( new ContainerLocalizationCleanupEvent(c, req)); // <- here! verify(mockLocallilzerTracker).cleanupPrivLocalizers( "container_314159265358979_0003_01_000042" ); testRecovery: assertNotNull( "Localization not started" , privLr1.getLocalPath()); privTracker1.handle( new ResourceLocalizedEvent(privReq1, privLr1.getLocalPath(), privLr1.getSize() + 5)); assertNotNull( "Localization not started" , privLr2.getLocalPath()); privTracker1.handle( new ResourceLocalizedEvent(privReq2, privLr2.getLocalPath(), privLr2.getSize() + 10)); assertNotNull( "Localization not started" , appLr1.getLocalPath()); appTracker1.handle( new ResourceLocalizedEvent(appReq1, appLr1.getLocalPath(), appLr1.getSize())); assertNotNull( "Localization not started" , appLr3.getLocalPath()); appTracker2.handle( new ResourceLocalizedEvent(appReq3, appLr3.getLocalPath(), appLr3.getSize() + 7)); assertNotNull( "Localization not started" , pubLr1.getLocalPath()); pubTracker.handle( new ResourceLocalizedEvent(pubReq1, pubLr1.getLocalPath(), pubLr1.getSize() + 1000)); assertNotNull( "Localization not started" , pubLr2.getLocalPath()); pubTracker.handle( new ResourceLocalizedEvent(pubReq2, pubLr2.getLocalPath(), pubLr2.getSize() + 99999)); Could you update them?
          Hide
          ozawa Tsuyoshi Ozawa added a comment -

          +1, checking this in.

          Show
          ozawa Tsuyoshi Ozawa added a comment - +1, checking this in.
          Hide
          ozawa Tsuyoshi Ozawa added a comment -

          oops, commentted on wrong jira. Reopening.

          Show
          ozawa Tsuyoshi Ozawa added a comment - oops, commentted on wrong jira. Reopening.
          Hide
          ozawa Tsuyoshi Ozawa added a comment -

          Committed this to trunk, branch-2, and branch-2.7. Thanks Varun Saxena for your contribution.

          Show
          ozawa Tsuyoshi Ozawa added a comment - Committed this to trunk, branch-2, and branch-2.7. Thanks Varun Saxena for your contribution.
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 7m 48s trunk passed
          +1 compile 0m 27s trunk passed with JDK v1.8.0_66
          +1 compile 0m 27s trunk passed with JDK v1.7.0_85
          +1 checkstyle 0m 11s trunk passed
          +1 mvnsite 0m 29s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 0m 59s trunk passed
          +1 javadoc 0m 17s trunk passed with JDK v1.8.0_66
          +1 javadoc 0m 21s trunk passed with JDK v1.7.0_85
          +1 mvninstall 0m 27s the patch passed
          +1 compile 0m 23s the patch passed with JDK v1.8.0_66
          +1 javac 0m 23s the patch passed
          +1 compile 0m 27s the patch passed with JDK v1.7.0_85
          +1 javac 0m 27s the patch passed
          +1 checkstyle 0m 11s the patch passed
          +1 mvnsite 0m 29s the patch passed
          +1 mvneclipse 0m 13s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 1m 2s the patch passed
          +1 javadoc 0m 18s the patch passed with JDK v1.8.0_66
          +1 javadoc 0m 21s the patch passed with JDK v1.7.0_85
          +1 unit 8m 29s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66.
          +1 unit 9m 2s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_85.
          +1 asflicense 0m 26s Patch does not generate ASF License warnings.
          34m 7s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12774282/YARN-4393.01.patch
          JIRA Issue YARN-4393
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux e1d35d909a40 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / b4c6b51
          findbugs v3.0.0
          JDK v1.7.0_85 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/9791/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
          Max memory used 75MB
          Powered by Apache Yetus http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/9791/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 7m 48s trunk passed +1 compile 0m 27s trunk passed with JDK v1.8.0_66 +1 compile 0m 27s trunk passed with JDK v1.7.0_85 +1 checkstyle 0m 11s trunk passed +1 mvnsite 0m 29s trunk passed +1 mvneclipse 0m 13s trunk passed +1 findbugs 0m 59s trunk passed +1 javadoc 0m 17s trunk passed with JDK v1.8.0_66 +1 javadoc 0m 21s trunk passed with JDK v1.7.0_85 +1 mvninstall 0m 27s the patch passed +1 compile 0m 23s the patch passed with JDK v1.8.0_66 +1 javac 0m 23s the patch passed +1 compile 0m 27s the patch passed with JDK v1.7.0_85 +1 javac 0m 27s the patch passed +1 checkstyle 0m 11s the patch passed +1 mvnsite 0m 29s the patch passed +1 mvneclipse 0m 13s the patch passed +1 whitespace 0m 0s Patch has no whitespace issues. +1 findbugs 1m 2s the patch passed +1 javadoc 0m 18s the patch passed with JDK v1.8.0_66 +1 javadoc 0m 21s the patch passed with JDK v1.7.0_85 +1 unit 8m 29s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66. +1 unit 9m 2s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_85. +1 asflicense 0m 26s Patch does not generate ASF License warnings. 34m 7s Subsystem Report/Notes Docker Image:yetus/hadoop:0ca8df7 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12774282/YARN-4393.01.patch JIRA Issue YARN-4393 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux e1d35d909a40 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / b4c6b51 findbugs v3.0.0 JDK v1.7.0_85 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/9791/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager Max memory used 75MB Powered by Apache Yetus http://yetus.apache.org Console output https://builds.apache.org/job/PreCommit-YARN-Build/9791/console This message was automatically generated.

            People

            • Assignee:
              varun_saxena Varun Saxena
              Reporter:
              varun_saxena Varun Saxena
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development