Details

- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 4.0.0
- Fix Version/s: None
- Component/s: None
Description
Hi,
I found that a Spark Connect session is not released unless `spark.stop()` is called explicitly.
Repro:
I have a Python file, test.py, with the code below:

from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost").getOrCreate()
spark.range(10).show()
After executing it with

python test.py

the corresponding Connect session is still alive in the Spark Web UI. See the entry for session id 96260131-a22c-4342-92df-8dc7ace5d1de in the table below.
But if `spark.stop()` is called explicitly in the Python file:

from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost").getOrCreate()
spark.range(10).show()
spark.stop()

then the Connect session is released. See the entry for session id 4e8ffb4e-7684-4fa7-b750-814f9a23f2d0.
| User | Session ID | Start Time ▾ | Finish Time | Duration | Total Execute |
|---|---|---|---|---|---|
| xxx | 4e8ffb4e-7684-4fa7-b750-814f9a23f2d0 | 2024/10/30 11:05:25 | 2024/10/30 11:05:25 | 78 ms | 1 |
| xxx | 96260131-a22c-4342-92df-8dc7ace5d1de | 2024/10/30 11:04:41 | | 13 minutes 55 seconds | 0 |
So I'm wondering: is this by design, or a potential bug? If the Connect session is not released, the Connect server keeps holding its caches, which are never freed. Over time that could blow up the Connect server/driver memory.
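In the meantime, a minimal sketch of a workaround is to guarantee the stop call even when the script fails partway, e.g. with try/finally (same placeholder server URL as above):

from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost").getOrCreate()
try:
    spark.range(10).show()
finally:
    # Always release the server-side Connect session, even if the job above raises.
    spark.stop()

This at least keeps sessions from piling up on the server for scripted jobs, though it obviously doesn't help for processes that are killed before the finally block runs.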