Hadoop YARN / YARN-7548

TestCapacityOverTimePolicy.testAllocation is flaky


Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0-beta1
    • Fix Version/s: None
    • Component/s: reservation system
    • Labels: None

    Description

      Reported at: 15/Nov/18 20:32

      The test failed in the Jenkins jobs for both YARN-7337 and YARN-6921.

      org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy.testAllocation[Duration 90,000,000, height 0.25, numSubmission 1, periodic 86400000)]

      Stacktrace

      junit.framework.AssertionFailedError: null
       at junit.framework.Assert.fail(Assert.java:55)
       at junit.framework.Assert.fail(Assert.java:64)
       at junit.framework.TestCase.fail(TestCase.java:235)
       at org.apache.hadoop.yarn.server.resourcemanager.reservation.BaseSharingPolicyTest.runTest(BaseSharingPolicyTest.java:146)
       at org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy.testAllocation(TestCapacityOverTimePolicy.java:136)

      Standard Output

      2017-11-20 23:57:03,759 INFO [main] recovery.RMStateStore (RMStateStore.java:transition(538)) - Storing reservation allocation.reservation_-9026698577416205920_6337917439559340517
       2017-11-20 23:57:03,759 INFO [main] recovery.RMStateStore (MemoryRMStateStore.java:storeReservationState(247)) - Storing reservationallocation for reservation_-9026698577416205920_6337917439559340517 for plan dedicated
       2017-11-20 23:57:03,760 INFO [main] reservation.InMemoryPlan (InMemoryPlan.java:addReservation(373)) - Successfully added reservation: reservation_-9026698577416205920_6337917439559340517 to plan.
       In-memory Plan: Parent Queue: dedicatedTotal Capacity: <memory:1024000, vCores:1000>Step: 1000reservation_-9026698577416205920_6337917439559340517 user:u1 startTime: 0 endTime: 86400000 Periodiciy: 86400000 alloc:
       [Period: 86400000
       0: <memory:256000, vCores:250>
       3423748: <memory:0, vCores:0>
       86223748: <memory:256000, vCores:250>
       86400000: <memory:0, vCores:0>
       9223372036854775807: null
       ]
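
      As a reading aid for the plan dump above, the following is a minimal sketch of how an average-over-time figure can be computed from a step function like the one printed. It is not the CapacityOverTimePolicy implementation: the class IntegralQuotaSketch, the helper averageOver, and the use of a TreeMap in place of RLESparseResourceAllocation are all assumptions made for illustration. Only the memory dimension is modelled, the window is taken to be exactly one period (86400000 ms), and the terminal "9223372036854775807: null" entry is left out.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of Hadoop.
final class IntegralQuotaSketch {

  /**
   * Time-weighted average of a step function over [start, end).
   * Each map entry (t, v) means "the value is v from time t until the next key".
   */
  public static double averageOver(TreeMap<Long, Long> steps, long start, long end) {
    Map.Entry<Long, Long> first = steps.floorEntry(start);
    long value = (first == null) ? 0L : first.getValue();
    long t = start;
    double area = 0.0;
    for (Map.Entry<Long, Long> next : steps.tailMap(start, false).entrySet()) {
      if (next.getKey() >= end) {
        break;
      }
      area += (double) value * (next.getKey() - t);
      t = next.getKey();
      value = next.getValue();
    }
    area += (double) value * (end - t); // last segment up to the window end
    return area / (end - start);
  }

  /** Fraction of the plan's memory used on average, using the values from the dump. */
  public static double usedFraction() {
    TreeMap<Long, Long> memory = new TreeMap<>();
    memory.put(0L, 256000L);
    memory.put(3423748L, 0L);
    memory.put(86223748L, 256000L);
    memory.put(86400000L, 0L);
    long totalMemory = 1024000L; // "Total Capacity: <memory:1024000, ...>"
    return averageOver(memory, 0L, 86400000L) / totalMemory;
  }

  public static void main(String[] args) {
    System.out.println(usedFraction());
  }
}
```

      Under these assumptions the two non-zero segments cover 3423748 + 176252 = 3600000 ms at a quarter of the plan's memory, so the average comes to roughly 0.0104 (1/96) of total capacity over one period.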
      
      

      Reported at: 21/Feb/24

      Ran the TestCapacityOverTimePolicy test case locally 100 times in a row; it failed 5 times with the error below:

      [INFO] Running org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy
      [ERROR] Tests run: 30, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.503 s <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy
      [ERROR] testAllocation[Duration 60,000, height 0.25, numSubmission 3, periodic 7200000)](org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy)  Time elapsed: 0.009 s  <<< ERROR!
      org.apache.hadoop.yarn.server.resourcemanager.reservation.exceptions.PlanningQuotaException: Integral (avg over time) quota capacity 0.25 over a window of 86400 seconds,  would be exceeded by accepting reservation: reservation_-7619846766601560789_3793931544284185119
              at org.apache.hadoop.yarn.server.resourcemanager.reservation.CapacityOverTimePolicy.validate(CapacityOverTimePolicy.java:206)
              at org.apache.hadoop.yarn.server.resourcemanager.reservation.InMemoryPlan.addReservation(InMemoryPlan.java:348)
              at org.apache.hadoop.yarn.server.resourcemanager.reservation.BaseSharingPolicyTest.runTest(BaseSharingPolicyTest.java:141)
              at org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy.testAllocation(TestCapacityOverTimePolicy.java:136)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:498)
              at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
              at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
              at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
              at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
              at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
              at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
              at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
              at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
              at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
              at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
              at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
              at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
              at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
              at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
              at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
              at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
              at org.junit.runners.Suite.runChild(Suite.java:128)
              at org.junit.runners.Suite.runChild(Suite.java:27)
              at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
              at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
              at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
              at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
              at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
              at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
              at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
              at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
              at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
              at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
              at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
              at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
              at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
              at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
              at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
      Caused by: org.apache.hadoop.yarn.server.resourcemanager.reservation.exceptions.PlanningException: RLESparseResourceAllocation: merge failed as the resulting RLESparseResourceAllocation would be negative, when testing: (-9223372036768375809=<memory:545778, vCores:533>) > (-172800000=<memory:256000, vCores:250>)
              at org.apache.hadoop.yarn.server.resourcemanager.reservation.RLESparseResourceAllocation.combineValue(RLESparseResourceAllocation.java:462)
              at org.apache.hadoop.yarn.server.resourcemanager.reservation.RLESparseResourceAllocation.merge(RLESparseResourceAllocation.java:353)
              at org.apache.hadoop.yarn.server.resourcemanager.reservation.RLESparseResourceAllocation.merge(RLESparseResourceAllocation.java:312)
              at org.apache.hadoop.yarn.server.resourcemanager.reservation.CapacityOverTimePolicy.validate(CapacityOverTimePolicy.java:197)
              ... 40 more
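
      One arithmetic observation about the "Caused by" message above, offered as a hypothesis and not as a confirmed root cause: the key -9223372036768375809 is exactly what Java produces when 86400000 (the 86400-second window, in milliseconds) is added to Long.MAX_VALUE (9223372036854775807, the open-ended key shown in the first report's plan dump) and the sum wraps around. The sketch below only demonstrates that arithmetic. The class and method names are hypothetical, and whether CapacityOverTimePolicy or RLESparseResourceAllocation actually performs this addition is an assumption that would need to be checked against the source.

```java
// Hypothetical illustration only; not part of Hadoop.
final class WindowOverflowSketch {

  // Open-ended key as printed in the plan dump ("9223372036854775807: null").
  static final long OPEN_END = Long.MAX_VALUE;

  // Validation window from the exception message: 86400 seconds, in milliseconds.
  static final long WINDOW_MS = 86_400_000L;

  /** Plain long addition silently wraps around on overflow. */
  public static long wrappedSum() {
    return OPEN_END + WINDOW_MS;
  }

  /** Math.addExact reports the same overflow as an ArithmeticException. */
  public static boolean addExactThrows() {
    try {
      Math.addExact(OPEN_END, WINDOW_MS);
      return false;
    } catch (ArithmeticException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(wrappedSum());      // prints -9223372036768375809
    System.out.println(addExactThrows());  // prints true
  }
}
```

      If an unguarded addition of this kind turns out to be involved, a saturating or checked addition (for example via Math.addExact) would be one way to surface it. That is a suggestion to investigate, not a verified fix.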

            People

              susheel_7 Susheel Gupta
              haibochen Haibo Chen