Spark / SPARK-47253

Allow LiveEventBus to stop without completely draining the event queue




      #Problem statement:
      SparkContext.stop() hangs for a long time in LiveEventBus.stop() when listeners are slow.

      #User scenarios:
      We run a centralized service, with multiple instances, that regularly executes users' scheduled tasks.
      For each user task within one service instance, the process is as follows:

      1. Create a Spark session directly within the service process, using the account defined in the task.
      2. Instantiate the listeners by class name and register them with the SparkContext. The user uploads the JARs containing the listener classes to the service.
      3. Prepare resources.
      4. Run the user logic (Spark SQL).
      5. Stop the Spark session by calling SparkSession.stop().

      In step 5, SparkSession.stop() waits for the LiveEventBus to stop, which requires every listener to completely drain the remaining events.
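A minimal Python sketch of the behavior described above, using a hypothetical toy class `DrainingEventBus` rather than Spark's own bus. `stop()` enqueues a sentinel behind the pending events and joins the dispatcher thread, so it cannot return until the slow listener has handled the whole backlog. The 10 ms sleep is an assumed stand-in for the per-event lock and remote database write.

```python
import queue
import threading
import time

_STOP = object()  # sentinel that tells the dispatcher thread to exit


class DrainingEventBus:
    """Hypothetical toy bus: stop() returns only after every queued event is delivered."""

    def __init__(self, listener):
        self._queue = queue.Queue()
        self._listener = listener
        self._thread = threading.Thread(target=self._dispatch, daemon=True)
        self._thread.start()

    def post(self, event):
        self._queue.put(event)

    def _dispatch(self):
        while True:
            event = self._queue.get()
            if event is _STOP:
                return
            self._listener(event)

    def stop(self):
        # The sentinel is queued behind all pending events, so join() blocks
        # until the listener has processed the entire backlog.
        self._queue.put(_STOP)
        self._thread.join()


processed = []


def slow_listener(event):
    time.sleep(0.01)  # assumed stand-in for a global lock plus a remote database write
    processed.append(event)


bus = DrainingEventBus(slow_listener)
for i in range(50):
    bus.post(i)
start = time.monotonic()
bus.stop()
elapsed = time.monotonic() - start
print(f"stop() took {elapsed:.2f}s for {len(processed)} events")
```

The time spent in `stop()` grows linearly with the backlog size times the per-event cost, which is the hang reported here.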

      Because the listeners are implemented by users, we cannot prevent them from doing heavy work on each event. In some cases a single heavy job has over 30,000 tasks,
      and a listener can take 30 minutes to process all the remaining events, because it acquires a coarse-grained global lock and updates its internal status in a remote database.
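To see how a modest per-event cost adds up, assume (an illustration, not a measurement from this report) that the backlog holds at least the task-start and task-end events of all 30,000 tasks. The average per-event cost that would explain a 30-minute drain is then roughly:

```latex
t_{\text{event}} \approx \frac{30 \times 60\ \text{s}}{2 \times 30{,}000\ \text{events}}
                 = \frac{1800\ \text{s}}{60{,}000\ \text{events}}
                 = 30\ \text{ms per event}
```

A per-event cost on the order of one contended lock acquisition plus one remote database round trip is therefore enough to stall the stop for half an hour.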

      This delay holds up the other user tasks in the queue. Therefore, from the server-side perspective, we need a guarantee that the stop operation finishes quickly.
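One way to get this guarantee is to bound the drain with a timeout. The Python sketch below is a hypothetical illustration, not how Spark's `LiveListenerBus` is implemented; the class name `BoundedStopEventBus` and the `exit_timeout` parameter are invented for the example. `stop()` waits at most `exit_timeout` seconds for the dispatcher to drain the queue. If the listener is still busy, it discards the undelivered events, re-queues the stop sentinel so the dispatcher exits after its current event, and returns the number of dropped events.

```python
import queue
import threading
import time

_STOP = object()  # sentinel that tells the dispatcher thread to exit


class BoundedStopEventBus:
    """Hypothetical bus whose stop() waits at most exit_timeout seconds."""

    def __init__(self, listener, exit_timeout):
        self._queue = queue.Queue()
        self._listener = listener
        self._exit_timeout = exit_timeout
        self._thread = threading.Thread(target=self._dispatch, daemon=True)
        self._thread.start()

    def post(self, event):
        self._queue.put(event)

    def _dispatch(self):
        while True:
            event = self._queue.get()
            if event is _STOP:
                return
            self._listener(event)

    def stop(self):
        """Try to drain for exit_timeout seconds, then drop whatever is left."""
        self._queue.put(_STOP)
        self._thread.join(timeout=self._exit_timeout)
        if not self._thread.is_alive():
            return 0  # the backlog was fully drained in time
        dropped = 0
        while True:
            try:
                event = self._queue.get_nowait()
            except queue.Empty:
                break
            if event is not _STOP:
                dropped += 1
        # Re-queue the sentinel so the dispatcher exits after its current event.
        self._queue.put(_STOP)
        return dropped


processed = []


def slow_listener(event):
    time.sleep(0.05)  # assumed per-event cost; a full drain of 100 events takes about 5 s
    processed.append(event)


bus = BoundedStopEventBus(slow_listener, exit_timeout=0.2)
for i in range(100):
    bus.post(i)
start = time.monotonic()
dropped = bus.stop()
elapsed = time.monotonic() - start
print(f"stop() returned after {elapsed:.2f}s, dropping {dropped} events")
```

The trade-off is that listeners never see the dropped events, so any state they maintain (such as the status written to the remote database) can end up incomplete. The event in progress when the timeout fires still runs to completion on the daemon thread; `stop()` simply does not wait for it.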





              TakawaAkirayo