  1. REEF (Retired)
  2. REEF-1335

Create State Machine for IMRU fault tolerance


Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: None
    • Fix Version/s: 0.15
    • Component/s: IMRU, REEF.NET

    Description

      To support fault tolerance, we would like to use a state machine to control the system's state transitions.

      After the driver is created, it starts in the state where it requests evaluators and submits contexts. Once all the contexts are ready, it moves to the submitting-tasks state; once all the tasks have started running, it moves to the tasks-running state; and once all the tasks have completed, it moves to the tasks-completed state. If a task or an evaluator fails, it moves to the shutting-down state, and so on.

      Here are the proposed system states:

      • WaitingForEvaluator,
      • SubmitingTasks,
      • TasksRunning,
      • TasksCompleted,
      • ShutingDown,
      • Fail

      Here are the events that may trigger a state change:

      • AllContextsAreReady,
      • AllTasksAreRunning,
      • AllTasksAreCompleted,
      • FailedTask,
      • FailedEvaluator,
      • NotRecoverable,
      • Recover
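
      The states and events above can be sketched as a small table-driven state machine. This is only an illustration in Python, not the REEF.NET implementation. The class names SystemState, SystemStateEvent and SystemStateMachine, the TRANSITIONS table and the move_next method are hypothetical. State and event names are spelled as in the lists above. The happy-path transitions and the failure transitions follow the description; which states accept FailedTask and FailedEvaluator, and the targets of Recover and NotRecoverable, are assumptions, since the description leaves them open.

```python
from enum import Enum, auto


class SystemState(Enum):
    WaitingForEvaluator = auto()
    SubmitingTasks = auto()
    TasksRunning = auto()
    TasksCompleted = auto()
    ShutingDown = auto()
    Fail = auto()


class SystemStateEvent(Enum):
    AllContextsAreReady = auto()
    AllTasksAreRunning = auto()
    AllTasksAreCompleted = auto()
    FailedTask = auto()
    FailedEvaluator = auto()
    NotRecoverable = auto()
    Recover = auto()


S, E = SystemState, SystemStateEvent

# (current state, event) -> next state
TRANSITIONS = {
    # Happy path, as described in the issue.
    (S.WaitingForEvaluator, E.AllContextsAreReady): S.SubmitingTasks,
    (S.SubmitingTasks, E.AllTasksAreRunning): S.TasksRunning,
    (S.TasksRunning, E.AllTasksAreCompleted): S.TasksCompleted,
    # Failures lead to shutdown (the set of source states is an assumption).
    (S.SubmitingTasks, E.FailedTask): S.ShutingDown,
    (S.SubmitingTasks, E.FailedEvaluator): S.ShutingDown,
    (S.TasksRunning, E.FailedTask): S.ShutingDown,
    (S.TasksRunning, E.FailedEvaluator): S.ShutingDown,
    # Assumption: after shutdown the system either retries or gives up.
    (S.ShutingDown, E.Recover): S.WaitingForEvaluator,
    (S.ShutingDown, E.NotRecoverable): S.Fail,
}


class SystemStateMachine:
    """Tracks the current state and rejects undefined transitions."""

    def __init__(self):
        self.state = SystemState.WaitingForEvaluator

    def move_next(self, event):
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(
                f"No transition from {self.state.name} on {event.name}"
            )
        self.state = TRANSITIONS[key]
        return self.state
```

      Keeping the transitions in one table makes every legal move visible in one place, and any event that arrives in a state with no entry is rejected with an error instead of being silently ignored.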

            People

              juliaw Julia Wang