YarnPreConfiguredMasterHaServicesTest fails sometimes

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.3.0
    • Component/s: YARN
    • Labels:

      Description

      This is the relevant part from the log:

      -------------------------------------------------------
       T E S T S
      -------------------------------------------------------
      Running org.apache.flink.yarn.highavailability.YarnPreConfiguredMasterHaServicesTest
      Formatting using clusterid: testClusterID
      Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.407 sec - in org.apache.flink.yarn.highavailability.YarnPreConfiguredMasterHaServicesTest
      Running org.apache.flink.yarn.highavailability.YarnIntraNonHaMasterServicesTest
      Formatting using clusterid: testClusterID
      Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.479 sec <<< FAILURE! - in org.apache.flink.yarn.highavailability.YarnIntraNonHaMasterServicesTest
      testClosingReportsToLeader(org.apache.flink.yarn.highavailability.YarnIntraNonHaMasterServicesTest)  Time elapsed: 0.836 sec  <<< FAILURE!
      org.mockito.exceptions.verification.WantedButNotInvoked: 
      Wanted but not invoked:
      leaderContender.handleError(<any>);
      -> at org.apache.flink.yarn.highavailability.YarnIntraNonHaMasterServicesTest.testClosingReportsToLeader(YarnIntraNonHaMasterServicesTest.java:120)
      Actually, there were zero interactions with this mock.
      
      	at org.apache.flink.yarn.highavailability.YarnIntraNonHaMasterServicesTest.testClosingReportsToLeader(YarnIntraNonHaMasterServicesTest.java:120)
      
      Running org.apache.flink.yarn.YarnFlinkResourceManagerTest
      Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.82 sec - in org.apache.flink.yarn.YarnFlinkResourceManagerTest
      Running org.apache.flink.yarn.YarnClusterDescriptorTest
      java.lang.RuntimeException: Couldn't deploy Yarn cluster
      	at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:425)
      	at org.apache.flink.yarn.YarnClusterDescriptorTest.testConfigOverwrite(YarnClusterDescriptorTest.java:90)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:483)
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
      	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
      	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
      	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
      	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
      	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
      	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
      	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
      	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
      	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
      	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
      	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
      	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
      	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
      	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
      	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
      	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
      	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
      	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
      Caused by: org.apache.flink.configuration.IllegalConfigurationException: The number of virtual cores per node were configured with 2147483647 but Yarn only has 8 virtual cores available. Please note that the number of virtual cores is set to the number of task slots by default unless configured in the Flink config with 'yarn.containers.vcores.'
      	at org.apache.flink.yarn.AbstractYarnClusterDescriptor.isReadyForDeployment(AbstractYarnClusterDescriptor.java:320)
      	at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:434)
      	at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:423)
      	... 28 more
      Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.338 sec - in org.apache.flink.yarn.YarnClusterDescriptorTest
      
      Results :
      
      Failed tests: 
        YarnIntraNonHaMasterServicesTest.testClosingReportsToLeader:120 
      Wanted but not invoked:
      leaderContender.handleError(<any>);
      -> at org.apache.flink.yarn.highavailability.YarnIntraNonHaMasterServicesTest.testClosingReportsToLeader(YarnIntraNonHaMasterServicesTest.java:120)
      Actually, there were zero interactions with this mock.
      
      
      Tests run: 10, Failures: 1, Errors: 0, Skipped: 0
      

      https://s3.amazonaws.com/archive.travis-ci.org/jobs/194432647/log.txt

          Activity

          till.rohrmann Till Rohrmann added a comment -

          I could not really reproduce this problem. My guess would be that the timeout for the verify clause was too aggressive. I've removed it and set a unit test timeout of 5s. Maybe this fixes the problem.

          githubbot ASF GitHub Bot added a comment -

          GitHub user tillrohrmann opened a pull request:

          https://github.com/apache/flink/pull/3327

          FLINK-5616 [tests] Harden YarnIntraNonHaMasterServicesTest.testClosingReportsToLeader

          Remove the verify timeout and instead set an increased unit test timeout of 5 seconds.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/tillrohrmann/flink FLINK-5616

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3327.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3327


          commit 87342d549b7181013d20873453468e5fbd900cbf
          Author: Till Rohrmann <trohrmann@apache.org>
          Date: 2017-02-15T17:15:50Z

          FLINK-5616 [tests] Harden YarnIntraNonHaMasterServicesTest.testClosingReportsToLeader

          Remove the verify timeout and instead set an increased unit test timeout of 5 seconds.


          till.rohrmann Till Rohrmann added a comment -

          Forget what I've written. It simply is a race condition between the grant leadership call (executed asynchronously) and the check. I'll fix this.
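
The race described in this comment can be illustrated with a minimal, JDK-only sketch. It is not the Flink test or the actual fix: `AsyncGrantRaceSketch`, `RecordingContender`, and `grantAsyncAndObserve` are hypothetical names, and the executor thread merely stands in for the asynchronous grant leadership call. A check made right after the call is submitted may still see zero interactions, while a check made after waiting on a latch cannot miss the call.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these names come from Flink.
class AsyncGrantRaceSketch {

    /** Stand-in for a leader contender that records grant calls. */
    static final class RecordingContender {
        private final AtomicInteger grants = new AtomicInteger();
        private final CountDownLatch firstGrant = new CountDownLatch(1);

        void grantLeadership() {
            grants.incrementAndGet();
            firstGrant.countDown(); // signal that the call has happened
        }

        int grantCount() {
            return grants.get();
        }

        boolean awaitFirstGrant(long timeoutMillis) {
            try {
                return firstGrant.await(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    /**
     * Grants leadership on another thread and returns
     * {count seen immediately, count seen after waiting}.
     */
    static int[] grantAsyncAndObserve(long grantDelayMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        RecordingContender contender = new RecordingContender();
        try {
            executor.execute(() -> {
                try {
                    Thread.sleep(grantDelayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                contender.grantLeadership();
            });

            // Racy: may run before the executor thread has made the call.
            int seenImmediately = contender.grantCount();

            // Deterministic: block (with an upper bound) until the call is signalled.
            if (!contender.awaitFirstGrant(5_000)) {
                throw new IllegalStateException("grant was not observed in time");
            }
            return new int[] {seenImmediately, contender.grantCount()};
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        int[] seen = grantAsyncAndObserve(50);
        System.out.println("immediately: " + seen[0] + ", after waiting: " + seen[1]);
    }
}
```

The first value depends on thread scheduling and is usually 0 with a non-zero delay; the second is always 1 because the latch is counted down only after the call has been recorded.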

          githubbot ASF GitHub Bot added a comment -

          Github user StephanEwen commented on the issue:

          https://github.com/apache/flink/pull/3327

          I am wondering if this is the same as the original timeout meaning.

          I thought that using `verify` with a timeout will wait for a while until the call must have happened. Without the timeout, the verification will fail if the call had not happened by the time the verification runs.

          Adding the timeout to the test as a whole means that the test may not take longer than that timeout.
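
The distinction drawn in this comment can be sketched without Mockito or JUnit. The helper below is a plain-JDK approximation, not Mockito's implementation: `VerifyTimeoutSketch`, `checkNow`, `checkWithin`, and `demo` are hypothetical names. `checkNow` behaves like a verification without a timeout (it fails if the call has not happened yet), while `checkWithin` polls until the call is observed or a deadline passes, which is roughly what a verification with a timeout does. A timeout on the test as a whole does neither: it only bounds how long the test may run.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical sketch; these helpers only approximate the semantics discussed above.
class VerifyTimeoutSketch {

    /** Checks once, right now: false if the call has not happened yet. */
    static boolean checkNow(BooleanSupplier condition) {
        return condition.getAsBoolean();
    }

    /** Polls until the condition holds or timeoutMillis has elapsed. */
    static boolean checkWithin(BooleanSupplier condition, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // deadline passed without observing the call
            }
            try {
                Thread.sleep(10); // short pause between polls
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return condition.getAsBoolean();
            }
        }
        return true;
    }

    /**
     * Makes a "call" on another thread after callDelayMillis and returns
     * {result of the immediate check, result of the waiting check}.
     */
    static boolean[] demo(long callDelayMillis, long waitMillis) {
        AtomicBoolean called = new AtomicBoolean(false);
        Thread caller = new Thread(() -> {
            try {
                Thread.sleep(callDelayMillis);
            } catch (InterruptedException e) {
                return;
            }
            called.set(true);
        });
        caller.start();

        boolean immediate = checkNow(called::get);
        boolean waited = checkWithin(called::get, waitMillis);
        return new boolean[] {immediate, waited};
    }

    public static void main(String[] args) {
        boolean[] results = demo(100, 2_000);
        System.out.println("immediate check: " + results[0] + ", waiting check: " + results[1]);
    }
}
```

The immediate check depends on scheduling and will usually report false here; the waiting check succeeds as long as the call arrives before the deadline.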

          githubbot ASF GitHub Bot added a comment -

          Github user tillrohrmann commented on the issue:

          https://github.com/apache/flink/pull/3327

          True, what I've done here does not make much sense. Somehow I was under the assumption that `verify` would block until the call happened when I wrote it. Will correct it. Thanks for the review.

          githubbot ASF GitHub Bot added a comment -

          Github user tillrohrmann commented on the issue:

          https://github.com/apache/flink/pull/3327

          Travis now passed. Merging this PR.

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3327

          till.rohrmann Till Rohrmann added a comment -

          Fixed via cc9334a4694b06abde2723548f9576256ae063db


            People

            • Assignee:
              till.rohrmann Till Rohrmann
              Reporter:
              aljoscha Aljoscha Krettek
            • Votes:
              0
              Watchers:
              3
