SPARK-50168

Connect session is not released if not calling spark.stop() explicitly


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 4.0.0
    • Fix Version/s: None
    • Component/s: Connect
    • Labels: None

    Description

      Hi,

      I found that the Spark Connect session is not released if spark.stop() is not called explicitly.

      Repro:

      I have a Python file with the following code:

      test.py

      from pyspark.sql import SparkSession
      spark = SparkSession.builder.remote("sc://localhost").getOrCreate()
      spark.range(10).show()

      After executing it with

      python test.py

      I found that the corresponding Connect session is still alive in the Spark web UI. See the session ID 96260131-a22c-4342-92df-8dc7ace5d1de item in the table below.

      But if spark.stop() is called explicitly in the Python file:

      from pyspark.sql import SparkSession
      spark = SparkSession.builder.remote("sc://localhost").getOrCreate()
      spark.range(10).show()
      spark.stop()

      The Connect session is released. See the 4e8ffb4e-7684-4fa7-b750-814f9a23f2d0 item.
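      Until the server reclaims idle sessions, a client-side workaround is to make the stop() call exception-safe with try/finally. A minimal sketch, using a hypothetical StubSession in place of a real Connect SparkSession (a real script would wrap spark.range(10).show() the same way):

      ```python
      class StubSession:
          """Stand-in for a Connect SparkSession; records whether stop() ran."""
          def __init__(self):
              self.stopped = False

          def stop(self):
              self.stopped = True


      def run_job(session, fail=False):
          # try/finally guarantees the session is released even when the job
          # raises, so the Connect server can drop the session and its caches.
          try:
              if fail:
                  raise RuntimeError("job failed")
              return "ok"
          finally:
              session.stop()
      ```

      With a real Connect session, the same shape applies: build the session, run queries inside the try block, and call spark.stop() in the finally clause.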

       

      User | Session ID                           | Start Time ▾        | Finish Time         | Duration              | Total Execute
      xxx  | 4e8ffb4e-7684-4fa7-b750-814f9a23f2d0 | 2024/10/30 11:05:25 | 2024/10/30 11:05:25 | 78 ms                 | 1
      xxx  | 96260131-a22c-4342-92df-8dc7ace5d1de | 2024/10/30 11:04:41 |                     | 13 minutes 55 seconds | 0

      So I'm wondering whether this is by design or a potential bug. If the Connect session is not released, the Connect server keeps holding its caches, which are never freed; that could blow up the Connect server/driver memory.
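      Another script-side option is a small context manager that always releases the session at the end of the with block (recent PySpark versions allow SparkSession itself to be used as a context manager, though that is worth verifying for Connect sessions). Again sketched with a hypothetical stub instead of a live sc://localhost connection:

      ```python
      from contextlib import contextmanager


      class StubSession:
          """Stand-in for a Connect SparkSession; records release."""
          def __init__(self):
              self.stopped = False

          def stop(self):
              self.stopped = True


      @contextmanager
      def managed_session(factory):
          # Build the session, hand it to the caller, and always stop() it on
          # the way out -- normal exit or exception -- so the server-side
          # session (and its caches) can be released promptly.
          session = factory()
          try:
              yield session
          finally:
              session.stop()


      with managed_session(StubSession) as spark:
          pass  # with a real session: spark.range(10).show()
      ```

      This keeps the release in one place instead of relying on every script remembering an explicit spark.stop() at the end.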

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: wbo4958 (Bobby Wang)
            Votes: 0
            Watchers: 1
