REEF / REEF-1949

Closing ThreadPoolStage before tasks are finished

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.17
    • Fix Version/s: None
    • Component/s: REEF Driver
    • Labels: None

      Description

      EvaluatorManager.onEvaluatorDone() contains the following:

      // This relies on the dispatcher to call the CompletedEvaluator handler.
      this.messageDispatcher.onEvaluatorCompleted(new CompletedEvaluatorImpl(this.evaluatorId));
      // This closes the dispatcher, which in turn shuts down the executor in ThreadPoolStage.
      this.close();
      

      Because onEvaluatorCompleted() submits the message-sending task to an executor, there is no guarantee that the CompletedEvaluator message is sent before the this.close() call terminates that executor. When close() wins this race, the CompletedEvaluator handler is never triggered, so the driver believes some evaluators are still alive and hangs.
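
      The race can be reproduced outside REEF with a plain java.util.concurrent executor. The sketch below is a simplified stand-in, not the actual ThreadPoolStage code: the class name CloseRaceSketch and the method completedMessageDelivered are hypothetical, and the abrupt path collapses the stage's shutdown timeout to zero by calling shutdownNow() immediately. It shows that a task still sitting in the queue is silently discarded on an abrupt close, whereas draining the queue before closing lets it run. Draining is only one possible direction for a fix.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical illustration of the close-before-dispatch race; not REEF code. */
final class CloseRaceSketch {

  /**
   * Queues a stand-in for the CompletedEvaluator dispatch behind a slow handler,
   * closes the executor, and reports whether the queued task ever ran.
   */
  static boolean completedMessageDelivered(boolean drainBeforeClose) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    CountDownLatch started = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);
    AtomicBoolean delivered = new AtomicBoolean(false);

    try {
      // A slow handler occupies the only worker thread.
      executor.execute(() -> {
        started.countDown();
        try {
          release.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      started.await(); // make sure the slow handler is running

      // Stand-in for the CompletedEvaluator dispatch: queued, not yet running.
      executor.execute(() -> delivered.set(true));

      if (drainBeforeClose) {
        executor.shutdown();    // reject new tasks but keep the queued ones
        release.countDown();    // let the slow handler finish
      } else {
        executor.shutdownNow(); // interrupt the worker and discard the queue
      }
      executor.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while closing", e);
    }
    return delivered.get();
  }

  public static void main(String[] args) {
    System.out.println("abrupt close delivered:  " + completedMessageDelivered(false));
    System.out.println("drained close delivered: " + completedMessageDelivered(true));
  }
}
```

      With the abrupt close the queued task never runs, which matches the symptom described above: the handler is not invoked and nothing signals the loss to the caller. With the drained close the same task completes before the executor terminates.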

      Relevant logs:

      Nov 01, 2017 11:05:57 PM org.apache.reef.wake.impl.ThreadPoolStage close
      SEVERE: Closing ThreadPoolStage EvaluatorMessageDispatcher:container_1508975419755_0006_01_000004: Executor did not terminate in 1,000 ms. Dropping 2 tasks
      Nov 01, 2017 11:05:57 PM org.apache.reef.wake.impl.ThreadPoolStage close
      SEVERE: Closing ThreadPoolStage EvaluatorMessageDispatcher:container_1508975419755_0006_01_000004: Executor failed to terminate.
      End of LogType:driver.stderr
      

        Attachments

        1. ReefDriverDebug.zip
          377 kB
          Pei Jiang

            People

            • Assignee: Unassigned
            • Reporter: Pei Jiang (pejian)