Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 3.2.0
- Fix Version/s: None
- Component/s: None
Description
We are migrating from YARN to Kubernetes (K8s). The simple program below experiences a driver hang when running in K8s but completes successfully when running in YARN.
import java.util.concurrent.Executors

import org.apache.spark.sql.SparkSession

object TestSparkJobHanging {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .getOrCreate()

    // The pool's worker threads are non-daemon by default.
    val executorService = Executors.newFixedThreadPool(6)

    // Submit one task to the pool.
    val runnableTask: Runnable = new Runnable {
      def run(): Unit = {
        spark.read.json(args(0)).write.parquet(args(1))
      }
    }
    val future = executorService.submit(runnableTask)
    future.get()
    // Note: executorService is never shut down, so its non-daemon
    // threads are still alive when main returns.
  }
}
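For comparison, here is a minimal variant (our own sketch, not part of the original job) that marks the pool's threads as daemon. Since daemon threads do not prevent JVM shutdown, this version should exit normally on K8s once main returns:

import java.util.concurrent.{Executors, ThreadFactory}

import org.apache.spark.sql.SparkSession

object TestSparkJobDaemonPool {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .getOrCreate()

    // Mark pool threads as daemon so they cannot keep the JVM alive.
    val daemonFactory = new ThreadFactory {
      def newThread(r: Runnable): Thread = {
        val t = new Thread(r)
        t.setDaemon(true)
        t
      }
    }
    val executorService = Executors.newFixedThreadPool(6, daemonFactory)

    val future = executorService.submit(new Runnable {
      def run(): Unit = spark.read.json(args(0)).write.parquet(args(1))
    })
    // Block until the write finishes; after main returns, only daemon
    // threads remain, so the driver JVM can shut down.
    future.get()
  }
}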
The root cause of this issue lies in an architectural difference between the YARN and Kubernetes deployments. In YARN cluster mode, the ApplicationMaster calls System.exit() once the user's main method returns, so leftover non-daemon threads cannot keep the driver JVM alive. On K8s, nothing calls System.exit() on the driver's behalf, so the JVM follows normal shutdown semantics and waits for all non-daemon threads to exit. Because the thread pool above is never shut down, its worker threads survive main and the driver hangs.
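Until the behavior is aligned, explicitly shutting the pool down also avoids the hang. A minimal sketch of the repro with that change (the try/finally placement is our suggestion, not from the original program):

import java.util.concurrent.Executors

import org.apache.spark.sql.SparkSession

object TestSparkJobWithShutdown {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .getOrCreate()

    val executorService = Executors.newFixedThreadPool(6)
    try {
      val future = executorService.submit(new Runnable {
        def run(): Unit = spark.read.json(args(0)).write.parquet(args(1))
      })
      future.get()
    } finally {
      // Shut the pool down so no non-daemon threads survive main(),
      // allowing the driver JVM to exit on K8s as it does on YARN.
      executorService.shutdown()
    }
  }
}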
Questions:
- Has this issue been addressed in the latest Spark version?
- If not, is this the expected behavior? Are there plans to make the behavior consistent between YARN and K8s?