REEF-1335

Create State Machine for IMRU fault tolerance

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: None
    • Fix Version/s: 0.15
    • Component/s: IMRU, REEF.NET
    • Labels:

      Description

      To support fault tolerance, we would like to use a state machine to control the system's state transitions.

      After the driver is created, it starts in the state where it requests evaluators and submits contexts. Once all the contexts are ready, it moves to the submitting-tasks state; once all the tasks have started running, it moves to the tasks-running state; and once all the tasks have completed, it moves to the tasks-completed state. If a task or an evaluator fails, it moves to the shutting-down state, and so on.

      Here are the proposed system states:

      • WaitingForEvaluator,
      • SubmitingTasks,
      • TasksRunning,
      • TasksCompleted,
      • ShutingDown,
      • Fail

      Here are the events that may trigger a state change:

      • AllContextsAreReady,
      • AllTasksAreRunning,
      • AllTasksAreCompleted,
      • FailedTask,
      • FailedEvaluator,
      • NotRecoverable,
      • Recover
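
      The states and events above can be sketched as a transition table. This is only an illustrative sketch in Python, not the REEF.NET implementation: the class names SystemStateMachine and SystemStateEvent, the method move_next, and the TRANSITIONS table are hypothetical. The description only specifies the happy path and that a failed task or evaluator leads to shutting down; which states accept the failure events, and where Recover and NotRecoverable lead, are assumptions and are marked as such in the comments. State names are spelled exactly as proposed in this issue.

```python
from enum import Enum, auto


class SystemState(Enum):
    # Names copied verbatim from the proposal in this issue.
    WaitingForEvaluator = auto()
    SubmitingTasks = auto()
    TasksRunning = auto()
    TasksCompleted = auto()
    ShutingDown = auto()
    Fail = auto()


class SystemStateEvent(Enum):
    AllContextsAreReady = auto()
    AllTasksAreRunning = auto()
    AllTasksAreCompleted = auto()
    FailedTask = auto()
    FailedEvaluator = auto()
    NotRecoverable = auto()
    Recover = auto()


S, E = SystemState, SystemStateEvent

# (current state, event) -> next state
TRANSITIONS = {
    # Happy path, as described in the issue.
    (S.WaitingForEvaluator, E.AllContextsAreReady): S.SubmitingTasks,
    (S.SubmitingTasks, E.AllTasksAreRunning): S.TasksRunning,
    (S.TasksRunning, E.AllTasksAreCompleted): S.TasksCompleted,
    # ASSUMPTION: the issue does not say where these two events lead.
    (S.ShutingDown, E.Recover): S.WaitingForEvaluator,
    (S.ShutingDown, E.NotRecoverable): S.Fail,
}

# The issue says a failed task or evaluator moves the system to shutting down.
# ASSUMPTION: this applies from each of the three active states.
for _state in (S.WaitingForEvaluator, S.SubmitingTasks, S.TasksRunning):
    for _event in (E.FailedTask, E.FailedEvaluator):
        TRANSITIONS[(_state, _event)] = S.ShutingDown


class SystemStateMachine:
    """Hypothetical driver-side state holder; starts in WaitingForEvaluator."""

    def __init__(self):
        self.current = S.WaitingForEvaluator

    def move_next(self, event):
        """Apply an event, or raise ValueError if it is not valid right now."""
        key = (self.current, event)
        if key not in TRANSITIONS:
            raise ValueError(
                f"no transition from {self.current.name} on {event.name}"
            )
        self.current = TRANSITIONS[key]
        return self.current
```

      Keeping the transitions in a single table makes invalid moves fail loudly (for example, AllTasksAreCompleted while still waiting for evaluators) instead of silently corrupting the driver state.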

              People

              • Assignee: Julia Wang (juliaw)
              • Reporter: Julia Wang (juliaw)