Details
- Improvement
- Status: Closed
- Major
- Resolution: Fixed
- 1.10.0
Description
So far, Flink has made efforts toward native Kubernetes integration. However, it is always essential to evaluate the existing design and consider alternatives that are better designed and easier to maintain in the long run. We have run into several problems while developing new features based on the current code. Here are some of them:
- We don’t have a unified monadic-step based orchestrator architecture to construct all the Kubernetes resources.
- There are inconsistencies between the orchestrator architecture that the client uses to create the Kubernetes resources and the one that the master uses to create Pods. For one thing, this confuses new contributors, who must understand two architectural philosophies instead of one; for another, maintenance and new feature development become quite challenging.
- Pod construction is done in one step. With the introduction of new features for the Pod, the construction process could become far more complicated, and the functionality of a single class could explode, which hurts code readability, writability, and testability. At the moment, we have encountered such challenges and realized that it is not an easy thing to develop new features related to the Pod.
- The implementation of a specific feature is usually scattered across multiple decoration classes. For example, the current design uses a decoration class chain that contains five Decorator classes to mount a configuration file to the Pod. If people would like to introduce support for other configuration files, such as Hadoop configuration or Keytab files, they have no choice but to repeat the same tedious and scattered process.
- We don’t have dedicated objects or tools for centrally parsing, verifying, and managing the Kubernetes parameters, which has raised some maintenance and inconsistency issues.
  - There is a lot of duplicated parsing and validation code, including the settings of Image, ImagePullPolicy, ClusterID, ConfDir, Labels, etc. This not only harms readability and testability but is also prone to mistakes. Refer to issue FLINK-16025 for inconsistent parsing of the same parameter.
  - The parameters are scattered, so some method signatures have to declare many unnecessary input parameters, such as FlinkMasterDeploymentDecorator#createJobManagerContainer.
To solve these issues, we propose to:
- Introduce a unified monadic-step based orchestrator architecture that provides a better, cleaner, and more consistent abstraction for the Kubernetes resources construction process.
- Add some dedicated tools for centrally parsing, verifying, and managing the Kubernetes parameters.
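The step-based idea in the first proposal can be illustrated with a minimal sketch. It is not the design from the design doc: `PodSketch`, `StepDecoratorSketch`, `LabelStep`, `MountFileStep`, and `PodFactorySketch` are hypothetical names invented here, and the Pod is reduced to labels and mount paths rather than a real Kubernetes resource. The point is only that each feature lives in one self-contained step and that the client and the master could share the same orchestrator.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a Pod spec; not a Flink or fabric8 class.
final class PodSketch {
    final Map<String, String> labels;
    final List<String> volumeMounts;

    PodSketch(Map<String, String> labels, List<String> volumeMounts) {
        // Defensive copies keep each intermediate Pod immutable.
        this.labels = Collections.unmodifiableMap(new LinkedHashMap<>(labels));
        this.volumeMounts = Collections.unmodifiableList(new ArrayList<>(volumeMounts));
    }

    static PodSketch empty() {
        return new PodSketch(
                Collections.<String, String>emptyMap(), Collections.<String>emptyList());
    }
}

// One construction step: receives the Pod built so far and returns an extended copy.
interface StepDecoratorSketch {
    PodSketch decorate(PodSketch pod);
}

// Step that adds labels; everything about labels lives in this one class.
final class LabelStep implements StepDecoratorSketch {
    private final Map<String, String> extraLabels;

    LabelStep(Map<String, String> extraLabels) {
        this.extraLabels = extraLabels;
    }

    @Override
    public PodSketch decorate(PodSketch pod) {
        Map<String, String> merged = new LinkedHashMap<>(pod.labels);
        merged.putAll(extraLabels);
        return new PodSketch(merged, pod.volumeMounts);
    }
}

// Step that mounts one file; another kind of file would be another step of this shape.
final class MountFileStep implements StepDecoratorSketch {
    private final String mountPath;

    MountFileStep(String mountPath) {
        this.mountPath = mountPath;
    }

    @Override
    public PodSketch decorate(PodSketch pod) {
        List<String> mounts = new ArrayList<>(pod.volumeMounts);
        mounts.add(mountPath);
        return new PodSketch(pod.labels, mounts);
    }
}

// Orchestrator: threads the Pod through the steps in order.
final class PodFactorySketch {
    static PodSketch build(List<StepDecoratorSketch> steps) {
        PodSketch pod = PodSketch.empty();
        for (StepDecoratorSketch step : steps) {
            pod = step.decorate(pod);
        }
        return pod;
    }
}
```

Under these assumptions, supporting a Hadoop configuration or Keytab file would mean appending one more step to the list passed to `PodFactorySketch.build` instead of touching a chain of existing decorators.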
Refer to the design doc for the details. Any feedback is welcome.
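The second proposal, centralized parameter handling, might look roughly like the sketch below. `KubernetesParametersSketch` is a hypothetical class, and the key strings and the default image are assumptions chosen for illustration, not taken from this issue. Each setting is parsed and validated in exactly one accessor, so the client and the master cannot drift apart in how they read the same parameter.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical central holder for Kubernetes-related settings; names are illustrative.
final class KubernetesParametersSketch {
    // Pull policies accepted by Kubernetes.
    private static final List<String> PULL_POLICIES =
            Arrays.asList("Always", "IfNotPresent", "Never");

    private final Map<String, String> config;

    KubernetesParametersSketch(Map<String, String> config) {
        this.config = new HashMap<>(config);
    }

    // Required setting: validated once here instead of at every call site.
    String getClusterId() {
        String id = config.get("kubernetes.cluster-id");
        if (id == null || id.trim().isEmpty()) {
            throw new IllegalArgumentException("kubernetes.cluster-id must be set");
        }
        return id.trim();
    }

    // Optional setting; the default value is an assumption made for this sketch.
    String getImage() {
        return config.getOrDefault("kubernetes.container.image", "flink:latest");
    }

    // Optional setting with a closed set of legal values.
    String getImagePullPolicy() {
        String policy =
                config.getOrDefault("kubernetes.container.image.pull-policy", "IfNotPresent");
        if (!PULL_POLICIES.contains(policy)) {
            throw new IllegalArgumentException("Unsupported image pull policy: " + policy);
        }
        return policy;
    }
}
```

A construction step could then take a single parameters object in its constructor, which avoids method signatures that declare many loosely related input parameters.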
Issue Links
- incorporates
  - FLINK-16025 Service could expose blob server port mismatched with JM Container (Closed)
  - FLINK-16238 Rename Fabric8ClientTest to Fabric8FlinkKubeClient (Closed)
  - FLINK-16239 Port KubernetesSessionCliTest to the right package (Closed)
  - FLINK-16240 Port KubernetesUtilsTest to the right package (Closed)
- links to