  1. REEF (Retired)
  2. REEF-1251

IMRU Driver handlers for Fault Tolerance

Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 0.16

    Description

      Handles communication between the driver and the evaluators so that evaluators and tasks can be recovered when some evaluators fail. The following example describes the flow.
      Steps a-d show the control flow in the normal scenario; steps e-k show recovery after an evaluator fails:
      a. All task, context and task status information is maintained in the Task Manager when tasks are first created
      b. Task1, task2 and task3 are queued in the Task Starter
      c. When all tasks in a group are ready, the tasks are submitted
      d. When the tasks start running, their status is updated in the Task Manager
      e. Evaluator 3 fails
      f. The driver receives the failed evaluator event and reports it to the Evaluator Manager
      g. The Task Manager updates the task status to mark task3 as failed
      h. The driver sends a message to task1 and task2 to stop them and updates their status in the Task Manager
      i. The driver requests a new evaluator3’ to replace the failed evaluator, submits a new context3’ to it and adds a new task3’ to the queue
      j. The driver recreates task1’ and task2’ on the existing context1 and context2 and adds them to the queue
      k. When all the new tasks in the communication group are ready, the tasks are started as in step c
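
The steps above can be sketched as a small state machine. This is only an illustrative model in Python, not the REEF API: the names `Driver`, `TaskRecord`, `TaskState`, `add_task` and `on_evaluator_failed` are hypothetical, the Task Manager and Task Starter are reduced to a dictionary and a list, and requesting a replacement evaluator is simulated synchronously, whereas in a real driver it would complete through a later event.

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    STOPPED = "stopped"
    FAILED = "failed"


@dataclass
class TaskRecord:
    # Hypothetical stand-in for what the Task Manager tracks per task.
    context: str
    evaluator: str
    state: TaskState = TaskState.QUEUED


@dataclass
class Driver:
    group_size: int
    tasks: dict = field(default_factory=dict)  # "Task Manager"
    queue: list = field(default_factory=list)  # "Task Starter"

    def add_task(self, task_id, context, evaluator):
        # Steps a-b: record the task and queue it.
        self.tasks[task_id] = TaskRecord(context, evaluator)
        self.queue.append(task_id)
        self._submit_if_group_ready()

    def _submit_if_group_ready(self):
        # Steps c-d (and k): submit only when the whole group is queued.
        if len(self.queue) == self.group_size:
            for task_id in self.queue:
                self.tasks[task_id].state = TaskState.RUNNING
            self.queue.clear()

    def on_evaluator_failed(self, evaluator):
        # Steps f-j, with the evaluator request simulated synchronously.
        running = [(t, r) for t, r in self.tasks.items()
                   if r.state is TaskState.RUNNING]
        failed = [(t, r) for t, r in running if r.evaluator == evaluator]
        survivors = [(t, r) for t, r in running if r.evaluator != evaluator]
        for _, rec in failed:      # step g: mark tasks on the lost evaluator
            rec.state = TaskState.FAILED
        for _, rec in survivors:   # step h: stop the remaining tasks
            rec.state = TaskState.STOPPED
        for task_id, rec in failed:     # step i: new evaluator, context, task
            self.add_task(task_id + "'", rec.context + "'", rec.evaluator + "'")
        for task_id, rec in survivors:  # step j: reuse the existing context
            self.add_task(task_id + "'", rec.context, rec.evaluator)


driver = Driver(group_size=3)
for i in (1, 2, 3):
    driver.add_task(f"task{i}", f"context{i}", f"evaluator{i}")
driver.on_evaluator_failed("evaluator3")
```

In this sketch the surviving tasks are restarted rather than left running because, as in step k, tasks are only started once every task in the communication group is ready again.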

            People

              juliaw Julia Wang
              juliaw Julia Wang