REEF-1343

Fix events received in case of evaluator failure

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.15
    • Component/s: REEF.NET
    • Labels:

      Description

      Investigation of REEF-1325 shows an unexpected sequence of events on the local runtime:

      • The evaluator crashes with an unhandled exception (shown in the evaluator.stderr and .stdout files).
      • The driver receives an IFailedEvaluator event that has no associated FailedTask object.
      • The task continues running and completes successfully.
      • The driver receives an ICompletedTask event.

      By design, a failed evaluator shouldn't allow a task to complete successfully.

      This can be reproduced using the TestPoisonedEvaluatorStartHanlder test.

      Update:
      The root cause is that the Evaluator does not close itself properly and instead lets the Exception propagate upwards. As a result, the RuntimeStopHandler is never invoked. Because the user's ITask is spun off as a fire-and-forget System.Threading.Task, its execution is independent of the main Evaluator thread. When the ITask finishes, it therefore still sends a Heartbeat to the Driver reporting that it completed, even though the Evaluator has already failed. The fix catches the Evaluator failure, propagates the Exception to the RuntimeStopHandler, and properly closes the ContextManager and HeartbeatManager once the Exception surfaces.
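
      The race described in the update can be illustrated with a small, language-neutral simulation. This is only a sketch and not REEF code: the HeartbeatManager class, the run_evaluator function, and the message strings are hypothetical stand-ins, and Python threads substitute for the fire-and-forget System.Threading.Task. For determinism the simulated failure is reported in both modes; the only difference is whether the heartbeat channel is closed when the failure surfaces.

```python
import threading


class HeartbeatManager:
    """Hypothetical stand-in for the evaluator-to-driver heartbeat channel."""

    def __init__(self):
        self.sent = []
        self.closed = False
        self._lock = threading.Lock()

    def send(self, message):
        with self._lock:
            if self.closed:
                return  # channel already closed: the message is dropped
            self.sent.append(message)

    def close(self):
        with self._lock:
            self.closed = True


def run_evaluator(apply_fix):
    """Simulate an evaluator failure while a detached user task is running.

    Returns the list of messages the (simulated) driver receives.
    """
    heartbeats = HeartbeatManager()
    evaluator_failed = threading.Event()

    def user_task():
        # The task outlives the evaluator failure, then reports completion.
        evaluator_failed.wait()
        heartbeats.send("CompletedTask")

    # Fire-and-forget: the task runs independently of the evaluator thread.
    worker = threading.Thread(target=user_task)
    worker.start()

    try:
        raise RuntimeError("poisoned start handler")  # simulated crash
    except RuntimeError:
        heartbeats.send("FailedEvaluator")
        if apply_fix:
            # Sketch of the fix: close the channel once the failure surfaces,
            # so late task heartbeats can no longer reach the driver.
            heartbeats.close()
    finally:
        evaluator_failed.set()

    worker.join()
    return heartbeats.sent


if __name__ == "__main__":
    print("without fix:", run_evaluator(apply_fix=False))
    print("with fix:   ", run_evaluator(apply_fix=True))
```

      Without the close step, the simulated driver sees a task completion after the evaluator failure, which mirrors the reported sequence; with it, the late completion message is dropped.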

              People

              • Assignee: Andrew Chung (afchung90)
              • Reporter: Mariia Mykhailova (MariiaMykhailova)
