Details
- Type: Bug
- Status: In Progress
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 3.1.2, 3.3.0, 3.3.1
- Fix Version/s: None
- Component/s: None
Description
Description of the issue:
There is a problem when submitting Spark jobs to a K8s cluster: the library generates and reuses the same names for config maps (for drivers and executors). Ideally, two config maps should be created per job: one for the driver and one for the executor. However, the library creates only one driver config map for all jobs (in some cases it generates only one executor config map for all jobs in the same manner). So, if I run 5 jobs, only one driver config map is generated and used for every job. During those runs we experience issues when deleting pods from the cluster: executor pods are endlessly created and immediately terminated, overloading cluster resources.
The cause of the issue:
This happens because the KubernetesClientUtils class declares configMapNameExecutor and configMapNameDriver as constants, so the names are computed once per JVM rather than once per submission. This seems incorrect and should be fixed urgently. I've prepared some changes for review to fix the issue (tested in the cluster of our project); a simplified sketch of the problem follows.
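For illustration, here is a minimal sketch of the problematic shape of the code, paraphrased rather than copied from the actual Spark source (the real naming scheme and helper names may differ):

```scala
// Paraphrased sketch, not the exact Spark source: KubernetesClientUtils is a
// singleton object, so these vals are evaluated exactly once per JVM.
object KubernetesClientUtils {
  private def uniqueID(): String =
    java.util.UUID.randomUUID().toString.take(8)

  // BUG: evaluated once at object initialization, then shared by every
  // subsequent submission made from this JVM.
  val configMapNameDriver: String = s"spark-drv-${uniqueID()}-conf-map"
  val configMapNameExecutor: String = s"spark-exec-${uniqueID()}-conf-map"

  // Direction of the proposed fix: compute a fresh name per submission,
  // e.g. by turning the vals into defs (or deriving the name from a
  // per-job resource prefix):
  // def configMapNameDriver: String = s"spark-drv-${uniqueID()}-conf-map"
}
```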
Steps to reproduce the issue (a minimal sketch follows the list):
- Create a KubernetesClientApplication object.
- Submit at least 2 jobs (sequentially, or in parallel using threads).
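A minimal sketch of these steps, assuming Spark's bundled SparkPi example and placeholder cluster values. Note that KubernetesClientApplication is a Spark-internal (private[spark]) class, so a real reproduction has to live under the org.apache.spark package (or drive spark-submit instead), and the exact submission arguments may vary between versions:

```scala
package org.apache.spark.repro // internal API, so this must sit inside org.apache.spark

import org.apache.spark.SparkConf
import org.apache.spark.deploy.k8s.submit.KubernetesClientApplication

// Illustrative reproduction: submit the same job twice from one JVM.
// <k8s-api-server> and <spark-image> are placeholders for real values.
object ConfigMapRepro {
  def main(args: Array[String]): Unit = {
    val appArgs = Array(
      "--main-class", "org.apache.spark.examples.SparkPi",
      "--primary-java-resource",
      "local:///opt/spark/examples/jars/spark-examples.jar")

    def submit(appName: String): Unit = {
      val conf = new SparkConf()
        .set("spark.app.name", appName)
        .set("spark.master", "k8s://https://<k8s-api-server>:6443")
        .set("spark.submit.deployMode", "cluster")
        .set("spark.kubernetes.container.image", "<spark-image>")
      new KubernetesClientApplication().start(appArgs, conf)
    }

    // On 3.1.2 the second submission silently overwrites the first job's
    // config map; on 3.3.* it fails because the name already exists.
    submit("job-1")
    submit("job-2") // or run the two submits in parallel threads
  }
}
```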
My observations for these steps, per version:
- Spark 3.1.2: the same config map in K8s is overwritten, which means all jobs point to the same config map.
- Spark 3.3.*: a new config map is created for the first job; for subsequent jobs an exception is thrown, because the Fabric8 Kubernetes client does not allow creating a config map with an already existing name (see the sketch after this list).
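For context, a small sketch of the Fabric8 behavior behind the 3.3.* failure, with illustrative namespace and config-map names (the client usage here is an assumption based on the Fabric8 5.x API that Spark 3.3 ships with):

```scala
import io.fabric8.kubernetes.api.model.ConfigMapBuilder
import io.fabric8.kubernetes.client.{DefaultKubernetesClient, KubernetesClientException}

// Illustrative only: creating a ConfigMap whose name already exists makes the
// API server answer 409 Conflict, which Fabric8 surfaces as an exception.
object DuplicateConfigMapDemo {
  def main(args: Array[String]): Unit = {
    val client = new DefaultKubernetesClient() // picks up the local kubeconfig
    try {
      val cm = new ConfigMapBuilder()
        .withNewMetadata().withName("spark-drv-conf-map").endMetadata() // placeholder name
        .addToData("spark.properties", "...")
        .build()

      client.configMaps().inNamespace("default").create(cm) // first job: succeeds
      try {
        client.configMaps().inNamespace("default").create(cm) // second job: fails
      } catch {
        case e: KubernetesClientException =>
          println(s"Duplicate-name create rejected: ${e.getMessage}")
      }
    } finally {
      client.close()
    }
  }
}
```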