Spark Structured Streaming is a very useful tool for building event-driven architectures. In an event-driven architecture, a main loop typically listens for events and triggers a callback function when one of those events is detected. In a streaming application, the application receives source messages either at a set interval or as they arrive, and reacts accordingly.
There are occasions when you may want to stop the Spark program gracefully. Gracefully here means that the Spark application handles the last streaming message completely before terminating. This is different from invoking an interrupt such as CTRL-C.
Of course, one can terminate the process with either of the following calls:
- query.awaitTermination() # waits for the termination of this query, either by stop() or by an error
- query.awaitTermination(timeoutMs) # returns true if this query terminates within the timeout, given in milliseconds
The first call above blocks until the query is stopped or fails. The second blocks for at most the given timeout in milliseconds and then returns, whether or not the query has terminated.
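The semantics of these two calls can be illustrated without a Spark cluster: they behave much like waiting on a threading event that is set when the query stops. The sketch below is an analogy only, not PySpark code; the names `terminated` and `stop_later` are illustrative.

```python
import threading
import time

# Stand-in for a streaming query's termination state (analogy only):
#   awaitTermination()          ~ terminated.wait()
#   awaitTermination(timeoutMs) ~ terminated.wait(timeout) returning a bool
terminated = threading.Event()

def stop_later():
    """Simulate the query being stopped after 0.2 seconds."""
    time.sleep(0.2)
    terminated.set()  # analogous to query.stop() being invoked

threading.Thread(target=stop_later).start()

# Timeout variant: returns False because the "query" is still running.
print(terminated.wait(timeout=0.05))  # False

# Timeout variant again: returns True once the "query" has stopped.
print(terminated.wait(timeout=5))     # True
```

The timeout variant is the only one that returns control to your code while the query may still be alive, which is why it is often used inside a polling loop.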
The issue with the timeout variant is that one needs to predict in advance how long the streaming job will run. And clearly, any interrupt at the terminal or OS level (killing the process) may leave the job terminated mid-batch, without proper completion of the streaming work in flight.
I have devised a method that allows one to terminate the Spark application internally, after the last received message has been fully processed. Within, say, 2 seconds of the shutdown being confirmed, the process invokes a graceful shutdown.
This new feature proposes a solution that finishes the work for the message currently being processed, waits for it to complete, and then shuts down the streaming query for a given topic without loss of data or orphaned transactions.
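One common way to implement this kind of graceful shutdown is a control loop that polls for an external shutdown signal (here, a marker file, a hypothetical choice), waits for any in-flight micro-batch to finish, and only then stops the query. The sketch below uses a stub class in place of a real PySpark `StreamingQuery` so the control logic can run anywhere; in a real job, `query` would come from `writeStream.start()`, the busy check would consult the query's status, and `stop()` would be the real `StreamingQuery.stop()`.

```python
import os
import tempfile
import threading
import time

class StubQuery:
    """Minimal stand-in for a streaming query (illustration only)."""
    def __init__(self):
        self.active = True
        self.trigger_active = False  # True while a micro-batch is running

    def stop(self):
        self.active = False

def graceful_stop(query, marker_path, poll_secs=2.0):
    """Poll for a shutdown marker; once seen, wait until no micro-batch
    is in flight, then stop the query."""
    while query.active:
        if os.path.exists(marker_path):
            # Let any in-flight micro-batch complete before stopping,
            # so the last message is fully processed.
            while query.trigger_active:
                time.sleep(0.1)
            query.stop()
            return
        time.sleep(poll_secs)

# Hypothetical marker path; an operator creates this file to request shutdown.
marker = os.path.join(tempfile.mkdtemp(), "stop_streaming_job")
q = StubQuery()
watcher = threading.Thread(target=graceful_stop, args=(q, marker, 0.05))
watcher.start()

open(marker, "w").close()  # request a graceful shutdown
watcher.join()
print(q.active)  # False: the query stopped only after finishing its work
```

Because the shutdown request is an ordinary file (or it could equally be a row in a control table), it can be issued from outside the cluster without sending any signal to the driver process.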