Spark / SPARK-48327

Concurrent Spark job execution on a K8s cluster intermittently throws 'configmaps already exists' error


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: Kubernetes
    • Labels: None

    Description

      We run multiple iterations, and in each iteration we submit 120 concurrent Spark jobs to a Kubernetes cluster (version 1.20). In one such iteration, 2 Spark jobs failed during SparkContext initialization with the error "Message: configmaps "spark-exec-2cf3698dc8c8226d-conf-map" already exists.":

       

      2024-02-20 23:09:43Z ERROR SparkContext - Error initializing SparkContext.
      io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://kubernetes.default.svc/api/v1/namespaces/default/configmaps. Message: configmaps "spark-exec-2cf3698dc8c8226d-conf-map" already exists. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, kind=configmaps, name=spark-exec-2cf3698dc8c8226d-conf-map, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=configmaps "spark-exec-2cf3698dc8c8226d-conf-map" already exists, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=AlreadyExists, status=Failure, additionalProperties={}).
          at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:682) ~[kubernetes-client-5.12.2.jar:?]
          at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:661) ~[kubernetes-client-5.12.2.jar:?]
          at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612) ~[kubernetes-client-5.12.2.jar:?]
          at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:555) ~[kubernetes-client-5.12.2.jar:?]
          at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:518) ~[kubernetes-client-5.12.2.jar:?]
          at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:305) ~[kubernetes-client-5.12.2.jar:?]
          at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:644) ~[kubernetes-client-5.12.2.jar:?]
          at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:83) ~[kubernetes-client-5.12.2.jar:?]
          at io.fabric8.kubernetes.client.dsl.base.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:61) ~[kubernetes-client-5.12.2.jar:?]
          at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.setUpExecutorConfigMap(KubernetesClusterSchedulerBackend.scala:110) ~[spark-kubernetes_2.12-3.3.1.jar:3.3.1]
          at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.start(KubernetesClusterSchedulerBackend.scala:139) ~[spark-kubernetes_2.12-3.3.1.jar:3.3.1]
          at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:222) ~[spark-core_2.12-3.3.1.jar:3.3.1]
          at org.apache.spark.SparkContext.<init>(SparkContext.scala:585) ~[spark-core_2.12-3.3.1.jar:3.3.1]
          at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704) ~[spark-core_2.12-3.3.1.jar:3.3.1]
          at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
          at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?]
          at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947) ~[spark-sql_2.12-3.3.1.jar:3.3.1] 

      I found two somewhat similar issues, SPARK-41006 and SPARK-39115, in which a similar error was reported for the Spark driver ConfigMap rather than the executor ConfigMap.
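
      A possible stopgap on the submitting side is to detect this specific failure and resubmit the job. The sketch below is only an illustration under assumptions, not part of Spark's or fabric8's API: ConflictRetry, isAlreadyExists, and retryOnConflict are hypothetical names, the detection relies solely on the message text shown in the log above, and it assumes that a fresh attempt (for example, a resubmission in a new driver process) ends up with a different ConfigMap name, which has not been verified here.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical helper: retries an action when it fails with the
 * 'configmaps ... already exists' (HTTP 409) error shown in the log above.
 */
final class ConflictRetry {
    private ConflictRetry() {}

    /** Walks the cause chain looking for the AlreadyExists message text. */
    static boolean isAlreadyExists(Throwable t) {
        // Bound the walk so a cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 32; t = t.getCause(), depth++) {
            String msg = t.getMessage();
            if (msg != null && msg.contains("configmaps") && msg.contains("already exists")) {
                return true;
            }
        }
        return false;
    }

    /** Runs the action up to maxAttempts times, retrying only on AlreadyExists failures. */
    static <T> T retryOnConflict(Callable<T> action, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                // Rethrow unrelated failures, or give up once attempts are exhausted.
                if (attempt >= maxAttempts || !isAlreadyExists(e)) {
                    throw e;
                }
                Thread.sleep(backoffMillis * attempt); // simple linear backoff
            }
        }
    }
}
```

      The action passed to retryOnConflict would wrap whatever launches the job (assumed here to be a resubmission). Any failure that does not carry the "already exists" message is rethrown immediately, so unrelated errors are not masked by the retry.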

       


          People

            Assignee: Unassigned
            Reporter: Praneet Sharma (praneetsharma)
            Votes: 0
            Watchers: 1
