REEF-1343

Fix events received in case of evaluator failure

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.15
    • Component/s: REEF.NET

    Description

      Investigation of REEF-1325 shows an unexpected sequence of events on the local runtime:

      • The evaluator crashes with an unhandled exception (shown in the evaluator.stderr and .stdout files).
      • The driver receives an IFailedEvaluator event that has no associated FailedTask object.
      • The task continues running and completes successfully.
      • The driver receives an ICompletedTask event.

      By design, a failed evaluator should not allow a task to complete successfully.
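The expected ordering can be stated as a small check over the events a driver observes. This is only an illustrative sketch in Python, not REEF code: the event labels "FailedEvaluator" and "CompletedTask" and the evaluator id "eval-1" are hypothetical stand-ins for the IFailedEvaluator and ICompletedTask events named above.

```python
def first_violation(events):
    """Return the index of the first task completion that is reported
    after its evaluator has already failed, or None if there is none.

    `events` is a list of (kind, evaluator_id) pairs in arrival order.
    The kind strings are hypothetical labels, not REEF types.
    """
    failed_evaluators = set()
    for index, (kind, evaluator_id) in enumerate(events):
        if kind == "FailedEvaluator":
            failed_evaluators.add(evaluator_id)
        elif kind == "CompletedTask" and evaluator_id in failed_evaluators:
            return index
    return None


# The sequence described in this report: failure first, completion afterwards.
observed = [("FailedEvaluator", "eval-1"), ("CompletedTask", "eval-1")]
print(first_violation(observed))
```

A check of this kind flags the second event in the reported sequence, which is the behaviour this issue treats as a bug.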

      This can be reproduced using the TestPoisonedEvaluatorStartHanlder test.

      Update:
      The root cause is that the Evaluator fails to close itself properly and to propagate the Exception upwards, so the RuntimeStopHandler is never invoked. Because the user's ITask is spun off as a fire-and-forget System.Threading.Task, its execution is independent of the main Evaluator thread. When the ITask finishes, it therefore sends a Heartbeat to the Driver reporting that it completed, even though the Evaluator has already failed. The fix catches the Evaluator failure and propagates the Exception to the RuntimeStopHandler, and it closes the ContextManager and HeartbeatManager once the Exception surfaces.
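The mechanism can be illustrated with a minimal, language-neutral sketch in Python. It is not the REEF.NET implementation: the HeartbeatManager class here is a hypothetical stand-in that only records messages, the message strings are invented, and the sketch assumes for simplicity that the failure report travels over the same channel as the task's completion message. A background thread plays the role of the fire-and-forget task.

```python
import threading


class HeartbeatManager:
    """Hypothetical stand-in that records messages bound for the driver."""

    def __init__(self):
        self._lock = threading.Lock()
        self._closed = False
        self.sent = []

    def send(self, message):
        with self._lock:
            if self._closed:
                return False  # channel closed: the message is dropped
            self.sent.append(message)
            return True

    def close(self):
        with self._lock:
            self._closed = True


def run_evaluator(close_on_failure):
    """Simulate an evaluator failure while a detached task is still running."""
    heartbeats = HeartbeatManager()
    failure_handled = threading.Event()

    def user_task():
        # The task outlives the failure; waiting here makes the order deterministic.
        failure_handled.wait()
        heartbeats.send("TASK_COMPLETED")

    worker = threading.Thread(target=user_task)
    worker.start()

    try:
        raise RuntimeError("evaluator start handler failed")
    except RuntimeError:
        heartbeats.send("EVALUATOR_FAILED")
        if close_on_failure:
            # Behaviour after the fix (as sketched): shut the channel down
            # so the detached task can no longer report completion.
            heartbeats.close()

    failure_handled.set()
    worker.join()
    return heartbeats.sent


print(run_evaluator(close_on_failure=False))
print(run_evaluator(close_on_failure=True))
```

With close_on_failure=False the detached task still reports completion after the failure, which mirrors the sequence in this report. With close_on_failure=True the late message is dropped because the channel was closed when the failure surfaced.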


            People

              afchung90 Andrew Chung
              MariiaMykhailova Mariia Mykhailova
              Votes: 0
              Watchers: 2
