FLINK-32883

Support for standby task managers

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: kubernetes-operator-1.6.0
    • Fix Version/s: None
    • Component/s: Kubernetes Operator
    • Labels: None

    Description

      https://docs.ververica.com/user_guide/application_operations/deployments/scaling.html#run-with-standby-taskmanager
      I would like the operator to support standby task managers, because on K8s pods are often evicted or deleted due to node failure or autoscaling.

      With the current implementation, it is not possible to set up a standby task manager, and jobs cannot run until all task managers are up and running. If standby task managers were supported, a job could keep running on the standby without downtime, even if one of its task managers is unexpectedly deleted.

      https://github.com/apache/flink-kubernetes-operator/blob/release-1.6.0/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/FlinkConfigBuilder.java#L370-L380
      If the task manager's number of replicas is set, the job's parallelism setting is ignored. It should be possible to support standby task managers by automatically setting parallelism to replicas * task slots only when the job's parallelism is not set (i.e. 0), and using the configured parallelism when it is set.
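
      A minimal sketch of the proposed rule, for illustration only. The class and method names (StandbyParallelismSketch, resolveParallelism, standbyTaskManagers) are hypothetical and are not taken from FlinkConfigBuilder. The standby count also assumes that a job needs ceil(parallelism / slots per task manager) task managers, which is a simplification.

```java
public final class StandbyParallelismSketch {

    private StandbyParallelismSketch() {}

    /**
     * Hypothetical helper: picks the effective job parallelism.
     * A parallelism of 0 (or less) is treated as "not set", in which case
     * the value is derived from replicas * slots. Otherwise the configured
     * parallelism is kept, even when replicas is set.
     */
    public static int resolveParallelism(
            int jobParallelism, int taskManagerReplicas, int slotsPerTaskManager) {
        if (taskManagerReplicas <= 0 || slotsPerTaskManager <= 0) {
            throw new IllegalArgumentException("replicas and slots must be positive");
        }
        if (jobParallelism <= 0) {
            // Not set: fall back to the total number of slots.
            return Math.multiplyExact(taskManagerReplicas, slotsPerTaskManager);
        }
        // Set explicitly: use it as-is.
        return jobParallelism;
    }

    /**
     * Hypothetical helper: number of task managers left over as standby.
     * Assumes the job occupies ceil(parallelism / slots) task managers.
     */
    public static int standbyTaskManagers(
            int parallelism, int taskManagerReplicas, int slotsPerTaskManager) {
        int required = (parallelism + slotsPerTaskManager - 1) / slotsPerTaskManager;
        return Math.max(0, taskManagerReplicas - required);
    }

    public static void main(String[] args) {
        // Parallelism not set: derived from 3 replicas * 2 slots.
        System.out.println(resolveParallelism(0, 3, 2)); // prints 6
        // Parallelism set to 4: kept, so one of the 3 task managers is standby.
        int p = resolveParallelism(4, 3, 2);
        System.out.println(p); // prints 4
        System.out.println(standbyTaskManagers(p, 3, 2)); // prints 1
    }
}
```

      With 3 replicas and 2 slots each, an explicit parallelism of 4 would occupy two task managers and leave the third as a standby, while an unset parallelism would use all 6 slots and leave none.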

      If this change looks good, I will send a PR on GitHub.

            People

              Assignee: Unassigned
              Reporter: Tomoyuki NAKAMURA (laughingman7743)
              Votes: 0
              Watchers: 4