Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: JobManager
    • Labels:
      None

      Description

      When the JobManagerRunner is granted leadership, it should check whether the current job is already running. If the job is running, the JobManager should reconcile itself (enter the RECONCILING state) and wait for the TaskManagers to report their task status. Otherwise, the JobManager can schedule the ExecutionGraph in the usual way.

      The RunningJobsRegistry can provide a way to check whether a job is running, but we should extend the current interface and fix the related process to support this function.

      1. The RunningJobsRegistry sets the job status to RUNNING the first time the JobManagerRunner is granted leadership.

      2. When the job finishes, the RunningJobsRegistry sets the job status to FINISHED, and the status is deleted before exit.

      3. If the mini cluster starts multiple JobManagerRunners and the leader JobManagerRunner has already finished the job and set the job status to FINISHED, the other JobManagerRunners will exit when they are granted leadership.

      4. If the JobManager fails, the job status remains RUNNING. When a JobManagerRunner (the previous one or a new one) is granted leadership again, it checks the job status and enters the RECONCILING state.
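
      The four cases above can be sketched as a small decision routine. This is only an illustrative sketch, not the actual Flink implementation: the class `LeadershipReconcileSketch`, the in-memory registry, the `JobStatus` and `Action` enums, and the method `onLeadershipGranted` are all hypothetical names chosen for this example, and a real registry would need to be backed by highly available storage rather than a local map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LeadershipReconcileSketch {

    /** Hypothetical job status tracked by the registry; PENDING means "no entry yet". */
    enum JobStatus { PENDING, RUNNING, FINISHED }

    /** Hypothetical action a runner takes after being granted leadership. */
    enum Action { SCHEDULE, RECONCILE, EXIT }

    /** In-memory stand-in for a RunningJobsRegistry (assumption: a real one is shared and persistent). */
    static final class InMemoryRunningJobsRegistry {
        private final Map<String, JobStatus> statuses = new ConcurrentHashMap<>();

        JobStatus getStatus(String jobId) {
            return statuses.getOrDefault(jobId, JobStatus.PENDING);
        }

        void setRunning(String jobId) {
            statuses.put(jobId, JobStatus.RUNNING);
        }

        void setFinished(String jobId) {
            statuses.put(jobId, JobStatus.FINISHED);
        }

        /** Deletes the status entry, as in case 2 ("deleted before exit"). */
        void clear(String jobId) {
            statuses.remove(jobId);
        }
    }

    /** Decides what a runner should do when it is granted leadership for the given job. */
    static Action onLeadershipGranted(InMemoryRunningJobsRegistry registry, String jobId) {
        switch (registry.getStatus(jobId)) {
            case FINISHED:
                // Case 3: another runner already finished the job, so this one exits.
                return Action.EXIT;
            case RUNNING:
                // Case 4: a previous leader failed while the job was running, so reconcile.
                return Action.RECONCILE;
            default:
                // Case 1: first grant of leadership; mark the job RUNNING and schedule it.
                registry.setRunning(jobId);
                return Action.SCHEDULE;
        }
    }

    public static void main(String[] args) {
        InMemoryRunningJobsRegistry registry = new InMemoryRunningJobsRegistry();
        String jobId = "job-1"; // hypothetical job identifier

        System.out.println(onLeadershipGranted(registry, jobId)); // first grant
        System.out.println(onLeadershipGranted(registry, jobId)); // grant after a JobManager failure

        registry.setFinished(jobId); // case 2: the job finishes
        System.out.println(onLeadershipGranted(registry, jobId)); // grant to a standby runner
    }
}
```

      In this sketch the check and the `setRunning` call are not atomic. That is acceptable only under the assumption that leader election guarantees a single leader at a time; without that guarantee the registry would need a compare-and-set style operation.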

              People

              • Assignee:
                tiemsn shuai.xu
                Reporter:
                zjwang zhijiang
              • Votes:
                0
                Watchers:
                5