Spark / SPARK-49079

Spark jobs failing with UnknownHostException on executors if driver readiness timeout elapsed


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.5.0
    • Fix Version/s: None
    • Component/s: k8s
    • Labels: None
    • Environment: Running Spark jobs inside the EMR on EKS offering from AWS, which uses Spark 3.5.0 under the hood

    Description

      We have seen cases where Spark jobs fail to run when ExecutorPodsAllocator times out waiting for the driver pod to reach the READY status. When that happens, we have observed two possible scenarios (listed below; a rough sketch of the failing lookup follows the list), both leading to the same result: executors fail with an UnknownHostException while trying to resolve the Kubernetes service for the Spark driver, and the job fails because the maximum number of executor failures is reached.

      • The Kubernetes service not getting created at all (confirmed via the Kubernetes service-created event/metric available in Grafana)
      • The Kubernetes service existing, but the hostname still not being resolvable from the executors (possibly the service only becomes resolvable once the driver pod is ready, and the executors tried to resolve the hostname before that)
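      For illustration, the failure on the executor side boils down to a DNS lookup of the driver's headless service that does not resolve. The snippet below is only a sketch: the hostname is a made-up placeholder of the <service-name>.<namespace>.svc form, not taken from a real job.

          import java.net.InetAddress

          // Placeholder hostname standing in for the headless service created for the
          // Spark driver pod; the real name depends on the application.
          val driverServiceHost = "my-app-driver-svc.my-namespace.svc"

          // If the service does not exist yet (or cluster DNS has not caught up), this
          // lookup throws java.net.UnknownHostException, which is what the executors
          // hit before the job gives up.
          InetAddress.getByName(driverServiceHost)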

      The particular part of the code in question is https://github.com/apache/spark/blob/v3.5.0/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala#L130:

          driverPod.foreach { pod =>
            // Wait until the driver pod is ready before starting executors, as the headless service won't
            // be resolvable by DNS until the driver pod is ready.
            Utils.tryLogNonFatalError {
              kubernetesClient
                .pods()
                .inNamespace(namespace)
                .withName(pod.getMetadata.getName)
                .waitUntilReady(driverPodReadinessTimeout, TimeUnit.SECONDS)
            }
          } 

      Interestingly enough, the comment says to wait until the driver pod is ready because otherwise the service will not be resolvable by the executors, yet when that wait times out we still let the run continue.

      Also worth mentioning is the documentation of the readiness timeout config (https://github.com/apache/spark/blob/v3.5.0/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala#L454):

        val KUBERNETES_ALLOCATION_DRIVER_READINESS_TIMEOUT =
          ConfigBuilder("spark.kubernetes.allocation.driver.readinessTimeout")
            .doc("Time to wait for driver pod to get ready before creating executor pods. This wait " +
              "only happens on application start. If timeout happens, executor pods will still be " +
              "created.")
            .version("3.1.3")
            .timeConf(TimeUnit.SECONDS)
            .checkValue(value => value > 0, "Allocation driver readiness timeout must be a positive "
              + "time value.")
            .createWithDefaultString("1s") 

      Please note the "If timeout happens, executor pods will still be created", which conflicts (at least in my head) with the code comment on the wait we have for the driver pod.

      The question is: is this intended behaviour? It looks like a bug; maybe we should check once more, right before creating the executors, whether the driver pod is ready and fail the job otherwise? A minimal sketch of that idea follows.
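      To make that concrete, here is a minimal sketch (not a patch) of what failing fast could look like. It reuses the fields already available in ExecutorPodsAllocator (kubernetesClient, namespace, driverPod, driverPodReadinessTimeout) and assumes waitUntilReady throws on timeout, which is what the existing Utils.tryLogNonFatalError wrapper suggests.

          import java.util.concurrent.TimeUnit

          import scala.util.control.NonFatal

          import org.apache.spark.SparkException

          driverPod.foreach { pod =>
            try {
              kubernetesClient
                .pods()
                .inNamespace(namespace)
                .withName(pod.getMetadata.getName)
                .waitUntilReady(driverPodReadinessTimeout, TimeUnit.SECONDS)
            } catch {
              case NonFatal(e) =>
                // Fail fast instead of creating executors that cannot resolve the
                // driver service hostname anyway.
                throw new SparkException(
                  s"Driver pod ${pod.getMetadata.getName} did not become ready within " +
                    s"$driverPodReadinessTimeout seconds", e)
            }
          }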

      For now we are mitigating this by increasing the readiness timeout value as a band-aid fix; see the example below.
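      For reference, this is the kind of setting we are using; the 120s value is an arbitrary example, not a recommendation (the same key can also be passed via --conf on spark-submit):

          import org.apache.spark.SparkConf

          // Raise the readiness timeout well above the 1s default so the allocator keeps
          // waiting for the driver pod before it starts creating executor pods.
          val conf = new SparkConf()
            .set("spark.kubernetes.allocation.driver.readinessTimeout", "120s")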


          People

            Assignee: Unassigned
            Reporter: Oscar Torreno
            Votes: 0
            Watchers: 1
