Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Later
- Affects Version/s: 0.11
- Fix Version/s: None
- Component/s: REEF .NET on HDInsight
Description
Race condition and clock skew issues in Tasks reporting back to JobDriver.
Stacktrace:
May 03, 2015 5:16:54 PM org.apache.reef.runtime.common.launch.REEFErrorHandler onNext
SEVERE: Uncaught exception.
java.lang.RuntimeException: Received an message of state RUNNING, not INIT or FAILED for Task 894e954e-a1a1-412a-85c6-d515956d6140_635662701583621790 which we haven't heard from before.
    at org.apache.reef.runtime.common.driver.evaluator.EvaluatorManager.onTaskStatusMessage(EvaluatorManager.java:459)
    at org.apache.reef.runtime.common.driver.evaluator.EvaluatorManager.onEvaluatorHeartbeatMessage(EvaluatorManager.java:341)
    at org.apache.reef.runtime.common.driver.evaluator.EvaluatorHeartbeatHandler.onNext(EvaluatorHeartbeatHandler.java:61)
    at org.apache.reef.runtime.common.driver.evaluator.EvaluatorHeartbeatHandler.onNext(EvaluatorHeartbeatHandler.java:36)
    at org.apache.reef.wake.remote.impl.HandlerContainer.onNext(HandlerContainer.java:145)
    at org.apache.reef.wake.remote.impl.HandlerContainer.onNext(HandlerContainer.java:37)
    at org.apache.reef.wake.remote.impl.OrderedPullEventHandler.onNext(OrderedRemoteReceiverStage.java:164)
    at org.apache.reef.wake.remote.impl.OrderedPullEventHandler.onNext(OrderedRemoteReceiverStage.java:146)
    at org.apache.reef.wake.impl.ThreadPoolStage$1.run(ThreadPoolStage.java:180)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
May 03, 2015 5:16:54 PM org.apache.reef.runtime.common.launch.REEFErrorHandler onNext
SEVERE: Caught an exception from Wake we cannot send upstream because there is no upstream
May 03, 2015 5:16:58 PM org.apache.reef.runtime.common.launch.REEFErrorHandler onNext
SEVERE: Uncaught exception.
java.lang.RuntimeException: Received an message of state RUNNING, not INIT or FAILED for Task 894e954e-a1a1-412a-85c6-d515956d6140_635662701583621790 which we haven't heard from before.
    at org.apache.reef.runtime.common.driver.evaluator.EvaluatorManager.onTaskStatusMessage(EvaluatorManager.java:459)
    at org.apache.reef.runtime.common.driver.evaluator.EvaluatorManager.onEvaluatorHeartbeatMessage(EvaluatorManager.java:341)
    at org.apache.reef.runtime.common.driver.evaluator.EvaluatorHeartbeatHandler.onNext(EvaluatorHeartbeatHandler.java:61)
    at org.apache.reef.runtime.common.driver.evaluator.EvaluatorHeartbeatHandler.onNext(EvaluatorHeartbeatHandler.java:36)
    at org.apache.reef.wake.remote.impl.HandlerContainer.onNext(HandlerContainer.java:145)
    at org.apache.reef.wake.remote.impl.HandlerContainer.onNext(HandlerContainer.java:37)
    at org.apache.reef.wake.remote.impl.OrderedPullEventHandler.onNext(OrderedRemoteReceiverStage.java:164)
    at org.apache.reef.wake.remote.impl.OrderedPullEventHandler.onNext(OrderedRemoteReceiverStage.java:146)
    at org.apache.reef.wake.impl.ThreadPoolStage$1.run(ThreadPoolStage.java:180)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
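For readers unfamiliar with the failure mode, the following is a minimal sketch of the kind of check the exception message implies: the driver rejects a first status message for a task unless its state is INIT or FAILED. This is not REEF's actual code. The class `TaskStatusTracker`, its `State` enum, and its methods are hypothetical names invented for illustration, and the sketch models only message ordering, not clock skew or the heartbeat transport.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of driver-side task tracking.
 * Illustrative only; it does not reproduce EvaluatorManager.
 */
public class TaskStatusTracker {
    public enum State { INIT, RUNNING, DONE, FAILED }

    // Last state seen for each task the driver has heard from.
    private final Map<String, State> knownTasks = new HashMap<>();

    /** Rejects a first message for a task unless it is INIT or FAILED. */
    public void onTaskStatus(String taskId, State state) {
        boolean firstMessage = !knownTasks.containsKey(taskId);
        if (firstMessage && state != State.INIT && state != State.FAILED) {
            throw new RuntimeException("Received a message of state " + state
                + ", not INIT or FAILED for Task " + taskId
                + " which we haven't heard from before.");
        }
        knownTasks.put(taskId, state);
    }

    /** Returns the last recorded state, or null for an unknown task. */
    public State stateOf(String taskId) {
        return knownTasks.get(taskId);
    }

    public static void main(String[] args) {
        // In-order delivery: INIT arrives before RUNNING and is accepted.
        TaskStatusTracker inOrder = new TaskStatusTracker();
        inOrder.onTaskStatus("task-1", State.INIT);
        inOrder.onTaskStatus("task-1", State.RUNNING);
        System.out.println("in order: " + inOrder.stateOf("task-1"));

        // Reordered or lost INIT: RUNNING is the first message seen.
        TaskStatusTracker reordered = new TaskStatusTracker();
        try {
            reordered.onTaskStatus("task-2", State.RUNNING);
        } catch (RuntimeException e) {
            System.out.println("reordered: " + e.getMessage());
        }
    }
}
```

Under this model, any path by which the driver sees RUNNING before INIT for a new task, such as a race between the task starting and its first heartbeat being sent, would surface as the exception shown in the stack trace.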
Issue Links
- is superseded by
  - REEF-289 Rewrite .NET Evaluator with Tang (Resolved)