Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-32548 Make watermark alignment ready for production use
  3. FLINK-32316

Duplicated announceCombinedWatermark task maybe scheduled if jobmanager failover

    XMLWordPrintableJSON

Details

    Description

      When we try SourceAlignment feature, we found there will be a duplicated announceCombinedWatermark task will be scheduled after JobManager failover and auto recover job from checkpoint.

      The reason i think is  we should schedule announceCombinedWatermark task during SourceCoordinator::start function not in SourceCoordinator construct function (see  https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/source/coordinator/SourceCoordinator.java#L149 ), because when jobManager encounter failover and auto recover job, it will create SourceCoordinator twice:

      • The first one is  when JobMaster is created it will create the DefaultExecutionGraph, this will init the first sourceCoordinator but will not start it.
      • The Second one is JobMaster call restoreLatestCheckpointedStateInternal method, which will be reset old sourceCoordinator and initialize a new one, but because the first sourceCoordinator is not started(SourceCoordinator will be started before SchedulerBase::startScheduling), so the first SourceCoordinator will not be fully closed.

       

      And we also found another problem see jira: https://issues.apache.org/jira/browse/FLINK-32362

      Attachments

        Issue Links

          Activity

            People

              cailiuyang Cai Liuyang
              cailiuyang Cai Liuyang
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: