Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Description
This stems from three issues:
1. The queryUri used to find the driver HTTP endpoint is missing a '/' after the application ID.
2. YARN currently redirects the proxy call to the primary RM via a meta-refresh, which our recovery mechanism does not handle because it assumes there are no redirects. See YARN-2605.
3. The driver does not recognize the evaluator trying to contact it and raises the following exception:
java.lang.RuntimeException: Contact from unknown Evaluator with identifier 'container_e02_1438245500443_0040_01_000004' with state 'RUNNING'
    at org.apache.reef.runtime.common.driver.evaluator.EvaluatorHeartbeatHandler.onNext(EvaluatorHeartbeatHandler.java:72)
    at org.apache.reef.runtime.common.driver.evaluator.EvaluatorHeartbeatHandler.onNext(EvaluatorHeartbeatHandler.java:36)
    at org.apache.reef.wake.remote.impl.HandlerContainer.onNext(HandlerContainer.java:146)
    at org.apache.reef.wake.remote.impl.HandlerContainer.onNext(HandlerContainer.java:37)
    at org.apache.reef.wake.remote.impl.OrderedPullEventHandler.onNext(OrderedRemoteReceiverStage.java:171)
    at org.apache.reef.wake.remote.impl.OrderedPullEventHandler.onNext(OrderedRemoteReceiverStage.java:152)
    at org.apache.reef.wake.impl.ThreadPoolStage$1.run(ThreadPoolStage.java:181)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Item 3 will be addressed as part of REEF-560 rather than in this issue.
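To make items 1 and 2 concrete, here is a minimal Java sketch of the two behaviors described: building the proxy query URI with a trailing '/' after the application ID, and detecting a meta-refresh page so the caller can follow it to the primary RM. This is an illustration only, not the actual REEF fix. The class name `ProxyEndpointSketch`, both method names, the regular expression, and the host names in the usage note are hypothetical, and the exact markup YARN emits for the meta-refresh may differ from what the pattern assumes.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper illustrating items 1 and 2; not part of REEF. */
public final class ProxyEndpointSketch {

  // Assumed shape of the redirect page:
  //   <meta http-equiv="refresh" content="N; url=TARGET">
  // Matching is case-insensitive; the capture group holds TARGET.
  private static final Pattern META_REFRESH = Pattern.compile(
      "<meta\\s+http-equiv\\s*=\\s*[\"']?refresh[\"']?\\s+content\\s*=\\s*[\"']"
          + "\\s*\\d+\\s*;\\s*url\\s*=\\s*([^\"'>\\s]+)",
      Pattern.CASE_INSENSITIVE);

  private ProxyEndpointSketch() {
  }

  /** Item 1: build the proxy URI so that it always ends with '/' after the application ID. */
  public static URI driverQueryUri(final String proxyBase, final String applicationId) {
    final String base = proxyBase.endsWith("/") ? proxyBase : proxyBase + "/";
    return URI.create(base + applicationId + "/");
  }

  /** Item 2: return the redirect target if the response body is a meta-refresh page. */
  public static Optional<URI> metaRefreshTarget(final String htmlBody) {
    final Matcher matcher = META_REFRESH.matcher(htmlBody);
    return matcher.find() ? Optional.of(URI.create(matcher.group(1))) : Optional.empty();
  }
}
```

A caller could request `driverQueryUri("http://rm1:8088/proxy", applicationId)`, pass the response body to `metaRefreshTarget`, and re-issue the request against the returned URI when one is present. A real implementation would also want to cap the number of redirects it follows.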
Issue Links
- is blocked by: REEF-559 Tighten previous evaluator ID checks by using entire set of evaluator IDs (Resolved)