Project: REEF (Retired)
Parent: REEF-291 Sporadic job failures in REEF-Tests
Issue: REEF-312

Fail_Alarm fails with assumed state RUNNING but reported state FAILED


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12
    • Component/s: REEF-Tests
    • Labels: None
    • Environment: Windows Jenkins CI

    Description

      Observed at https://builds.apache.org/job/Reef-pull-request-windows/305/consoleFull

      I'll add subtasks as new failures are encountered.

      May 07, 2015 2:17:17 AM org.apache.reef.tests.TestEnvironmentFactory getNewTestEnvironment
      INFO: Running tests on Local
      May 07, 2015 2:17:17 AM org.apache.reef.wake.remote.address.HostnameBasedLocalAddressProvider <init>
      INFO: Instantiating HostnameBasedLocalAddressProvider
      May 07, 2015 2:17:17 AM org.apache.reef.util.REEFVersion logVersion
      INFO: REEF Version: 0.11.0-incubating
      May 07, 2015 2:17:19 AM org.apache.reef.client.DriverLauncher$RunningJobHandler onNext
      INFO: The Job Fail_Alarm is running.
      May 07, 2015 2:17:22 AM org.apache.reef.tests.TestDriverLauncher$SilentFailedTestJobHandler onNext
      INFO: Received an error for job Fail_Alarm: Optional:{java.lang.RuntimeException: org.apache.reef.exception.EvaluatorException: Evaluator [Node-1-1430990239203] is assumed to be in state [RUNNING]. But the resource manager reports it to be in state [FAILED]. This means that the Evaluator failed but wasn't able to send an error message back to the driver. Task [FailTask_FailContext_Node-1-1430990239203] was running when the Evaluator crashed.}
      May 07, 2015 2:17:22 AM org.apache.reef.tests.TestUtils assertLauncherFailure
      WARNING: Unexpected Error: FAILED(java.lang.RuntimeException: org.apache.reef.exception.EvaluatorException: Evaluator [Node-1-1430990239203] is assumed to be in state [RUNNING]. But the resource manager reports it to be in state [FAILED]. This means that the Evaluator failed but wasn't able to send an error message back to the driver. Task [FailTask_FailContext_Node-1-1430990239203] was running when the Evaluator crashed.)
      java.lang.RuntimeException: org.apache.reef.exception.EvaluatorException: Evaluator [Node-1-1430990239203] is assumed to be in state [RUNNING]. But the resource manager reports it to be in state [FAILED]. This means that the Evaluator failed but wasn't able to send an error message back to the driver. Task [FailTask_FailContext_Node-1-1430990239203] was running when the Evaluator crashed.
      	at org.apache.reef.tests.fail.driver.FailDriver$FailedEvaluatorHandler.onNext(FailDriver.java:206)
      	at org.apache.reef.tests.fail.driver.FailDriver$FailedEvaluatorHandler.onNext(FailDriver.java:201)
      	at org.apache.reef.runtime.common.utils.BroadCastEventHandler.onNext(BroadCastEventHandler.java:37)
      	at org.apache.reef.util.ExceptionHandlingEventHandler.onNext(ExceptionHandlingEventHandler.java:45)
      	at org.apache.reef.runtime.common.utils.DispatchingEStage$1.onNext(DispatchingEStage.java:67)
      	at org.apache.reef.runtime.common.utils.DispatchingEStage$1.onNext(DispatchingEStage.java:64)
      	at org.apache.reef.wake.impl.ThreadPoolStage$1.run(ThreadPoolStage.java:180)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
      	at java.lang.Thread.run(Thread.java:722)
      Caused by: org.apache.reef.exception.EvaluatorException: Evaluator [Node-1-1430990239203] is assumed to be in state [RUNNING]. But the resource manager reports it to be in state [FAILED]. This means that the Evaluator failed but wasn't able to send an error message back to the driver. Task [FailTask_FailContext_Node-1-1430990239203] was running when the Evaluator crashed.
      	at org.apache.reef.runtime.common.driver.evaluator.EvaluatorManager.onResourceStatusMessage(EvaluatorManager.java:515)
      	at org.apache.reef.runtime.common.driver.resourcemanager.ResourceStatusHandler.onNext(ResourceStatusHandler.java:57)
      	at org.apache.reef.runtime.common.driver.resourcemanager.ResourceStatusHandler.onNext(ResourceStatusHandler.java:34)
      	at org.apache.reef.runtime.local.process.ReefRunnableProcessObserver.onResourceStatus(ReefRunnableProcessObserver.java:123)
      	at org.apache.reef.runtime.local.process.ReefRunnableProcessObserver.onUncleanExit(ReefRunnableProcessObserver.java:103)
      	at org.apache.reef.runtime.local.process.ReefRunnableProcessObserver.onProcessExit(ReefRunnableProcessObserver.java:76)
      	at org.apache.reef.runtime.local.process.RunnableProcess.run(RunnableProcess.java:183)
      	... 1 more
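The "Caused by" part of the trace shows the local runtime noticing that the Evaluator process exited uncleanly (RunnableProcess.run, then ReefRunnableProcessObserver.onUncleanExit) and forwarding a FAILED resource status to EvaluatorManager.onResourceStatusMessage. That method throws because the driver still assumed the Evaluator was RUNNING, meaning the Evaluator's own error report never arrived. FailDriver's FailedEvaluatorHandler then rethrows this as a RuntimeException, which TestUtils.assertLauncherFailure flags as an unexpected error. The sketch below illustrates only that kind of assumed-vs-reported state check. The class, the enum values, and the reconcile method are hypothetical simplifications, not REEF's actual EvaluatorManager code.

```java
import java.util.Optional;

/**
 * Hypothetical sketch (not REEF's EvaluatorManager): compare the state the
 * driver assumes an Evaluator is in with the state the resource manager reports.
 */
public final class EvaluatorStateCheck {

  /** Simplified, assumed set of Evaluator states for illustration only. */
  enum State { ALLOCATED, SUBMITTED, RUNNING, DONE, FAILED, KILLED }

  /**
   * Returns a mismatch message when the resource manager reports FAILED while
   * the driver still believes the Evaluator is RUNNING, i.e. the process died
   * before its own error message reached the driver. Otherwise returns empty.
   */
  static Optional<String> reconcile(String evaluatorId, State assumed, State reported) {
    if (assumed == State.RUNNING && reported == State.FAILED) {
      return Optional.of("Evaluator [" + evaluatorId + "] is assumed to be in state ["
          + assumed + "]. But the resource manager reports it to be in state ["
          + reported + "].");
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // In this sketch: the Evaluator's error message arrived first, so the driver
    // already marked it FAILED; the resource manager agrees and nothing is flagged.
    System.out.println(reconcile("Node-1", State.FAILED, State.FAILED).isPresent()); // prints false

    // The process exited uncleanly before reporting its error; the driver still
    // assumes RUNNING, so the mismatch seen in the log above is produced.
    System.out.println(reconcile("Node-1", State.RUNNING, State.FAILED).orElse("no mismatch"));
  }
}
```

In the failing run the second case occurred. Whether the Evaluator's error message or the process-exit notification reaches the driver first is a timing race, which is consistent with the sporadic nature of the failure tracked in REEF-291.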
      

      Attachments

        1. windows2-90-Fail_Alarm.zip (220 kB, attached by Brian Cho)


          People

            Assignee: gwsshs22 Geon-Woo Kim
            Reporter: chobrian Brian Cho
            Votes: 0
            Watchers: 2
