  1. Flink
  2. FLINK-30596

Multiple POST /jars/:jarid/run requests with the same jobId run duplicate jobs


Details

    Description

      Analysis from trohrmann:

      The problem is the following: when a job is submitted, the Dispatcher waits for the termination of a previous JobMaster with the same jobId so that the job's resources can be cleaned up properly. On an initial submission there is no previous JobMaster with that jobId. Flink nevertheless schedules the persistAndRunJob action, which runs the newly submitted job, as an asynchronous task. This ensures that the action runs on the Dispatcher's main thread, since the termination future can complete on a different thread. As a result, other tasks enqueued in the Dispatcher's work queue can execute before persistAndRunJob. One such task could be another job submission, which would not see that a job with the same jobId has already been submitted, because that is only recorded in runJob, which is called by persistAndRunJob. This is why the second submission does not fail with a duplicate job submission exception. Even worse, this eventually leads to an invalid state and fails the whole cluster entrypoint.
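      The interleaving described in the analysis can be reproduced in a small stand-alone model. This is only a sketch under stated assumptions, not Flink code: the class DuplicateSubmissionRace, its submit/runJob methods, the registeredJobs set and the log strings are all hypothetical, and the Dispatcher's main thread is modeled as a plain FIFO queue that is drained by hand.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Hypothetical model of the race; none of these names are Flink APIs.
class DuplicateSubmissionRace {
    // Stand-in for the Dispatcher's work queue: tasks run in FIFO order.
    private final Queue<Runnable> workQueue = new ArrayDeque<>();
    // "Main thread executor": executing a task just appends it to the queue.
    private final Executor mainThread = workQueue::add;
    // Jobs become visible here only once runJob has executed.
    private final Set<String> registeredJobs = new HashSet<>();
    private final List<String> log = new ArrayList<>();

    void submit(String jobId) {
        if (registeredJobs.contains(jobId)) {
            log.add("rejected duplicate " + jobId);
            return;
        }
        // Initial submission: nothing to wait for, the future is already done.
        CompletableFuture<Void> terminationFuture = CompletableFuture.completedFuture(null);
        // The run action is still enqueued behind whatever is already queued.
        terminationFuture.thenAcceptAsync(ignored -> runJob(jobId), mainThread);
    }

    void runJob(String jobId) {
        if (registeredJobs.add(jobId)) {
            log.add("started " + jobId);
        } else {
            log.add("invalid state: " + jobId + " already registered");
        }
    }

    static List<String> simulate() {
        DuplicateSubmissionRace d = new DuplicateSubmissionRace();
        // Two submissions with the same id are queued back to back.
        d.workQueue.add(() -> d.submit("job-1"));
        d.workQueue.add(() -> d.submit("job-1"));
        while (!d.workQueue.isEmpty()) {
            d.workQueue.poll().run();
        }
        return d.log;
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

      In this model the second submit runs before the first runJob, so it passes the duplicate check and the log ends with the invalid-state entry instead of a rejection.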

      The following change to the Dispatcher seems to fix the issue, but before submitting a PR, I wanted to post it here for possible follow-up discussion:

      private CompletableFuture<Void> waitForTerminatingJob(
              JobID jobId, JobGraph jobGraph, ThrowingConsumer<JobGraph, ?> action) {
          ...
          // Run the action right away if the termination future is already done;
          // only otherwise hand it to the main thread executor.
          return FutureUtils.thenAcceptAsyncIfNotDone(
                  jobManagerTerminationFuture,
                  getMainThreadExecutor(),
                  FunctionUtils.uncheckedConsumer(
                          (ignored) -> {
                              jobManagerRunnerTerminationFutures.remove(jobId);
                              action.accept(jobGraph);
                          }));
      }
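      To illustrate why running the action inline helps when the termination future is already done, here is a minimal stand-alone sketch. It is not Flink code: the class InlineIfDoneSketch, the helper acceptInlineIfDone (a hypothetical stand-in written for this example, assumed to behave like the "if not done" helper used in the change), the registeredJobs set and the log strings are all assumptions, and the main thread is again modeled as a hand-drained FIFO queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

// Hypothetical model of the proposed behaviour; none of these names are Flink APIs.
class InlineIfDoneSketch {
    private final Queue<Runnable> workQueue = new ArrayDeque<>();
    private final Executor mainThread = workQueue::add;
    private final Set<String> registeredJobs = new HashSet<>();
    private final List<String> log = new ArrayList<>();

    // Stand-in helper: run inline when the future is already complete,
    // otherwise defer to the executor once it completes.
    static <T> CompletableFuture<Void> acceptInlineIfDone(
            CompletableFuture<T> future, Executor executor, Consumer<? super T> consumer) {
        return future.isDone()
                ? future.thenAccept(consumer)
                : future.thenAcceptAsync(consumer, executor);
    }

    void submit(String jobId) {
        if (registeredJobs.contains(jobId)) {
            log.add("rejected duplicate " + jobId);
            return;
        }
        // Initial submission: the termination future is already done,
        // so runJob executes before submit returns.
        CompletableFuture<Void> terminationFuture = CompletableFuture.completedFuture(null);
        acceptInlineIfDone(terminationFuture, mainThread, ignored -> runJob(jobId));
    }

    void runJob(String jobId) {
        if (registeredJobs.add(jobId)) {
            log.add("started " + jobId);
        } else {
            log.add("invalid state: " + jobId + " already registered");
        }
    }

    static List<String> simulate() {
        InlineIfDoneSketch d = new InlineIfDoneSketch();
        d.workQueue.add(() -> d.submit("job-1"));
        d.workQueue.add(() -> d.submit("job-1"));
        while (!d.workQueue.isEmpty()) {
            d.workQueue.poll().run();
        }
        return d.log;
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

      Here the first submission registers the job before the second one is processed, so the second is rejected as a duplicate rather than reaching runJob.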
      

          People

            morezaei00 Mohsen Rezaei
            morezaei00 Mohsen Rezaei