SPARK-49475: Spark Driver process not terminating on Kubernetes


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: k8s, Kubernetes
    • Labels: None

    Description

      We are migrating from YARN to Kubernetes (K8s). The simple program below experiences a driver hang when running in K8s but completes successfully when running in YARN.

       

      import java.util.concurrent.Executors

      import org.apache.spark.sql.SparkSession

      object TestSparkJobHanging {
        def main(args: Array[String]): Unit = {
          val spark = SparkSession
            .builder()
            .getOrCreate()

          // The pool's worker threads are non-daemon and the pool is never shut down.
          val executorService = Executors.newFixedThreadPool(6)
          // Submit a single task to the pool and wait for it to finish.
          val runnableTask: Runnable = new Runnable {
            def run(): Unit = {
              spark.read.json(args(0)).write.parquet(args(1))
            }
          }
          val future = executorService.submit(runnableTask)
          future.get()
        }
      }

       

      The root cause lies in how the two resource managers run the driver. In YARN cluster mode, the ApplicationMaster runs the user's main method in a separate thread and calls System.exit() once that thread returns, so the driver JVM terminates even if non-daemon user threads are still alive. On Kubernetes, the driver JVM exits only after all non-daemon threads have finished, so the thread pool above, which is never shut down, keeps the driver process running.
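
      For reference, one application-side workaround is to make sure no non-daemon user threads outlive main(): shut the pool down (and stop the session) once the work is done. Below is a minimal sketch of that variant, assuming the same inputs as the program above (the object name TestSparkJobTerminating is only illustrative):

      import java.util.concurrent.Executors

      import org.apache.spark.sql.SparkSession

      object TestSparkJobTerminating {
        def main(args: Array[String]): Unit = {
          val spark = SparkSession.builder().getOrCreate()

          val executorService = Executors.newFixedThreadPool(6)
          try {
            val future = executorService.submit(new Runnable {
              def run(): Unit = spark.read.json(args(0)).write.parquet(args(1))
            })
            future.get()
          } finally {
            // Shut down the pool so its non-daemon worker threads exit; otherwise
            // they keep the driver JVM alive on K8s even after main() returns.
            executorService.shutdown()
            spark.stop()
          }
        }
      }

      Equivalently, building the pool with a daemon ThreadFactory would let the JVM exit without an explicit shutdown; either way the result matches what the YARN ApplicationMaster's System.exit() achieves implicitly.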

      Questions:

      1. Has this issue been addressed in the latest Spark version?
      2. If not, is this the expected behavior? Are there plans to make the behavior consistent between YARN and Kubernetes?

          People

            Assignee: Unassigned
            Reporter: Lu Niu (qqibrow)