FLINK-14434

Dispatcher#createJobManagerRunner should not start JobManagerRunner

    Details

      Description

      Consider an edge case in which

      1) the job finishes almost immediately, and
      2) the thread executing Dispatcher#startJobManagerRunner is suspended after jobManagerRunner.start(); but before return jobManagerRunner;

      This can go wrong because

      1) the future stored in jobManagerRunnerFutures completes only once #startJobManagerRunner has finished, and
      2) the JobManagerRunner is not created in the main thread.

      A possible execution order is

      1) the JobManagerRunner is created in an akka-dispatcher thread
      2) the same thread then applies Dispatcher#startJobManagerRunner
      3) it runs up to and including jobManagerRunner.start(); but has not yet reached return jobManagerRunner;
      4) this thread is suspended
      5) the job finishes and its callback is executed on the main thread
      6) jobManagerRunnerFutures.get(jobID).getNow(null) returns null, because the akka-dispatcher thread has not yet returned jobManagerRunner
      7) the Dispatcher reports "There is a newer JobManagerRunner for the job", although there is none.
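
      The interleaving above can be reproduced deterministically outside Flink. The following is a minimal sketch, not Dispatcher code: Runner, RunnerRaceSketch, the "job-1" key and the two latches are hypothetical stand-ins, and the latches only force the pause described in steps 3 and 4. It assumes nothing beyond java.util.concurrent.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class RunnerRaceSketch {
    /** Hypothetical stand-in for JobManagerRunner; not Flink's class. */
    static final class Runner {
        volatile boolean started;
        void start() { started = true; }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the "job finished" callback finds no runner in the map. */
    static boolean callbackSeesNull() {
        Map<String, CompletableFuture<Runner>> jobManagerRunnerFutures = new ConcurrentHashMap<>();
        CountDownLatch startedButNotReturned = new CountDownLatch(1);
        CountDownLatch callbackDone = new CountDownLatch(1);

        // Steps 1-4: create and start the runner off the calling thread,
        // then pause after start() but before the runner is returned.
        CompletableFuture<Runner> future = CompletableFuture
            .supplyAsync(Runner::new)
            .thenApplyAsync(runner -> {
                runner.start();
                startedButNotReturned.countDown();
                awaitQuietly(callbackDone); // simulated suspension
                return runner;
            });
        jobManagerRunnerFutures.put("job-1", future);

        // Steps 5-6: the runner is already started, but the future is not
        // complete yet, so getNow(null) falls back to null.
        awaitQuietly(startedButNotReturned);
        Runner seen = jobManagerRunnerFutures.get("job-1").getNow(null);

        // Let the paused thread continue so the future completes.
        callbackDone.countDown();
        future.join();
        return seen == null;
    }

    public static void main(String[] args) {
        System.out.println("callback saw null: " + callbackSeesNull());
    }
}
```

      In this sketch the lookup happens while the runner is started but its future is still incomplete, which is the state that leads to the misleading message in step 7.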

      *Solution*

      There are two options, and we can even apply both.

      1. Return jobManagerRunnerFuture from #createJobManagerRunner and make #startJobManagerRunner a separate action.
      2. Once the JobManagerRunner is created, execute #startJobManagerRunner in the main thread.
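
      A rough sketch of both options combined, under assumptions: this is not the attached patch, Runner and StartOnMainThreadSketch are hypothetical names, and a single-threaded executor stands in for the Dispatcher's main thread. The stored future completes as soon as the runner is created, and starting it is a separate action on the main-thread executor.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class StartOnMainThreadSketch {
    /** Hypothetical stand-in for JobManagerRunner; not Flink's class. */
    static final class Runner {
        volatile boolean started;
        void start() { started = true; }
    }

    /** Returns true if a callback running after start finds the started runner. */
    static boolean finishedCallbackSeesRunner() {
        // Assumption: a single-threaded executor models the main thread.
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        Map<String, CompletableFuture<Runner>> jobManagerRunnerFutures = new ConcurrentHashMap<>();
        try {
            // Option 1: the stored future only covers creation.
            CompletableFuture<Runner> created = CompletableFuture.supplyAsync(Runner::new);
            jobManagerRunnerFutures.put("job-1", created);

            // Option 2: starting is an action executed on the main thread.
            CompletableFuture<Void> started = created.thenAcceptAsync(Runner::start, mainThread);

            // Simulated "job finished" callback, also on the main thread.
            Runner seen = started
                .thenApplyAsync(
                    ignored -> jobManagerRunnerFutures.get("job-1").getNow(null),
                    mainThread)
                .join();
            return seen != null && seen.started;
        } finally {
            mainThread.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("callback saw runner: " + finishedCallbackSeesRunner());
    }
}
```

      Because the start action and the callback are serialized on the same executor, and the stored future is already complete before start() runs, the lookup in the callback can no longer observe a started runner as null.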

      CC Till Rohrmann

        Attachments

        1. patch.diff
          2 kB
          Zili Chen

              People

              • Assignee: Zili Chen (tison)
              • Reporter: Zili Chen (tison)

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m