[SPARK-29950] Deleted excess executors can connect back to driver in K8S with dyn alloc on


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: Kubernetes, Spark Core
    • Labels: None

    Description

      ExecutorPodsAllocator currently has code to delete excess pods that the K8S server hasn't started yet and that aren't needed anymore due to downscaling.

      The problem is that there is a race between K8S starting the pod and the Spark code deleting it. This race may cause the executor in such a pod to connect back to the driver and do a lot of initialization, sometimes even being considered for task allocation, only to be killed almost immediately. A sketch of one way to avoid the race follows the description.

      This doesn't cause any problems that I could detect in my tests, but it wastes resources and causes the logs to contain misleading messages about the executor being killed. It would be nice to avoid that.
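      For illustration only, here is a minimal Scala sketch of one way the race could be closed. It assumes, hypothetically (these names are not Spark's actual API and this is not necessarily how the fix was implemented), that the allocator records the IDs of executors whose pending pods it deleted for downscaling, and that the driver consults that record before accepting a registration.

        import java.util.concurrent.ConcurrentHashMap

        // Hypothetical helper shared by the pod allocator and the scheduler backend.
        class DeletedPendingExecutors {
          // IDs of executors whose pods were deleted before they were known to be running.
          private val deleted = ConcurrentHashMap.newKeySet[String]()

          // Allocator side: called when an excess pending pod is deleted.
          def markDeleted(execId: String): Unit = deleted.add(execId)

          // Driver side: called when an executor tries to register. A true result means
          // the registration should be refused, because the pod was already scheduled
          // for deletion and will be killed anyway.
          def shouldReject(execId: String): Boolean = deleted.contains(execId)
        }

      With something along these lines, an executor that loses the race and connects back anyway can be turned away before it initializes or is offered tasks, instead of being killed right after registering.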


            People

              Assignee: vanzin (Marcelo Masiero Vanzin)
              Reporter: vanzin (Marcelo Masiero Vanzin)
              Votes: 0
              Watchers: 1

              Dates

                Created:
                Updated:
                Resolved: