REEF-308: Out of order state transition messages from the .NET Evaluator

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Later
    • Affects Version/s: 0.11
    • Fix Version/s: 0.12
    • Component/s: REEF.NET Evaluator
    • Labels: None
    • Environment: REEF .NET on HDInsight

    Description

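      Race conditions and clock skew cause Task status messages from the .NET Evaluator to reach the JobDriver out of order. In the log below, the Driver receives a RUNNING message for a Task from which it has not yet received INIT, and throws an exception.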

      Stacktrace:

      May 03, 2015 5:16:54 PM org.apache.reef.runtime.common.launch.REEFErrorHandler onNext
      SEVERE: Uncaught exception.
      java.lang.RuntimeException: Received an message of state RUNNING, not INIT or FAILED for Task 894e954e-a1a1-412a-85c6-d515956d6140_635662701583621790 which we haven't heard from before.
      	at org.apache.reef.runtime.common.driver.evaluator.EvaluatorManager.onTaskStatusMessage(EvaluatorManager.java:459)
      	at org.apache.reef.runtime.common.driver.evaluator.EvaluatorManager.onEvaluatorHeartbeatMessage(EvaluatorManager.java:341)
      	at org.apache.reef.runtime.common.driver.evaluator.EvaluatorHeartbeatHandler.onNext(EvaluatorHeartbeatHandler.java:61)
      	at org.apache.reef.runtime.common.driver.evaluator.EvaluatorHeartbeatHandler.onNext(EvaluatorHeartbeatHandler.java:36)
      	at org.apache.reef.wake.remote.impl.HandlerContainer.onNext(HandlerContainer.java:145)
      	at org.apache.reef.wake.remote.impl.HandlerContainer.onNext(HandlerContainer.java:37)
      	at org.apache.reef.wake.remote.impl.OrderedPullEventHandler.onNext(OrderedRemoteReceiverStage.java:164)
      	at org.apache.reef.wake.remote.impl.OrderedPullEventHandler.onNext(OrderedRemoteReceiverStage.java:146)
      	at org.apache.reef.wake.impl.ThreadPoolStage$1.run(ThreadPoolStage.java:180)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      
      May 03, 2015 5:16:54 PM org.apache.reef.runtime.common.launch.REEFErrorHandler onNext
      SEVERE: Caught an exception from Wake we cannot send upstream because there is no upstream
      May 03, 2015 5:16:58 PM org.apache.reef.runtime.common.launch.REEFErrorHandler onNext
      SEVERE: Uncaught exception.
      java.lang.RuntimeException: Received an message of state RUNNING, not INIT or FAILED for Task 894e954e-a1a1-412a-85c6-d515956d6140_635662701583621790 which we haven't heard from before.
      	at org.apache.reef.runtime.common.driver.evaluator.EvaluatorManager.onTaskStatusMessage(EvaluatorManager.java:459)
      	at org.apache.reef.runtime.common.driver.evaluator.EvaluatorManager.onEvaluatorHeartbeatMessage(EvaluatorManager.java:341)
      	at org.apache.reef.runtime.common.driver.evaluator.EvaluatorHeartbeatHandler.onNext(EvaluatorHeartbeatHandler.java:61)
      	at org.apache.reef.runtime.common.driver.evaluator.EvaluatorHeartbeatHandler.onNext(EvaluatorHeartbeatHandler.java:36)
      	at org.apache.reef.wake.remote.impl.HandlerContainer.onNext(HandlerContainer.java:145)
      	at org.apache.reef.wake.remote.impl.HandlerContainer.onNext(HandlerContainer.java:37)
      	at org.apache.reef.wake.remote.impl.OrderedPullEventHandler.onNext(OrderedRemoteReceiverStage.java:164)
      	at org.apache.reef.wake.remote.impl.OrderedPullEventHandler.onNext(OrderedRemoteReceiverStage.java:146)
      	at org.apache.reef.wake.impl.ThreadPoolStage$1.run(ThreadPoolStage.java:180)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
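
The exception message suggests that the Driver expects the first status message for any Task to be INIT or FAILED. The sketch below illustrates that kind of check and shows how a RUNNING message that overtakes INIT would trip it. It is a minimal illustration under stated assumptions, not the actual `EvaluatorManager` code: the class `TaskStatusTracker`, its `State` enum, the method `onTaskStatus`, and the use of `IllegalStateException` are all hypothetical names chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a first-message check; not REEF's implementation. */
public class TaskStatusTracker {

  /** Only the task states named in the log message above. */
  public enum State { INIT, RUNNING, FAILED }

  // Last state seen for each task ID.
  private final Map<String, State> lastSeen = new HashMap<>();

  /**
   * Records a status message. The first message for an unknown task
   * must be INIT or FAILED; any other state is rejected.
   */
  public void onTaskStatus(final String taskId, final State state) {
    if (!lastSeen.containsKey(taskId)
        && state != State.INIT && state != State.FAILED) {
      throw new IllegalStateException("Received a message of state " + state
          + ", not INIT or FAILED, for Task " + taskId
          + " which we haven't heard from before.");
    }
    lastSeen.put(taskId, state);
  }

  public static void main(final String[] args) {
    // In-order delivery: INIT first, then RUNNING. Both are accepted.
    final TaskStatusTracker inOrder = new TaskStatusTracker();
    inOrder.onTaskStatus("task-1", State.INIT);
    inOrder.onTaskStatus("task-1", State.RUNNING);

    // Out-of-order delivery: RUNNING arrives before INIT and is rejected.
    final TaskStatusTracker outOfOrder = new TaskStatusTracker();
    try {
      outOfOrder.onTaskStatus("task-2", State.RUNNING);
    } catch (final IllegalStateException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

With a check of this shape, any reordering between the Evaluator and the Driver that lets RUNNING arrive ahead of INIT surfaces as the exception seen in the log, even though the Task itself went through its states in the right order.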
      

            People

              markus.weimer Markus Weimer
              afchung90 Andrew Chung