SPARK-26586

Streaming queries should have isolated SparkSessions and confs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.3.2, 2.4.0
    • Fix Version/s: 2.4.1, 3.0.0
    • Component/s: SQL, Structured Streaming
    • Labels: None

      Description

      When a stream is started, the stream's configuration is supposed to be frozen, and all batches should run with the configuration captured at start time. However, due to a race condition in stream creation, updating a conf value on the active SparkSession immediately after starting a stream can cause the stream to pick up the updated value.
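
      A minimal sketch of the race, assuming a local session, the built-in "rate" source, the console sink, and the spark.sql.shuffle.partitions key (all chosen here for illustration; the report does not name a specific source, sink, or conf key):

      {code:scala}
      import org.apache.spark.sql.SparkSession

      val spark = SparkSession.builder()
        .master("local[2]")
        .appName("conf-race-sketch")
        .getOrCreate()

      // Start a streaming query. Its conf is expected to be frozen at this point.
      val query = spark.readStream
        .format("rate")
        .load()
        .writeStream
        .format("console")
        .start()

      // Racy update: on affected versions the session is not cloned until
      // StreamExecution.start() runs on the query thread, so the running query
      // may observe this new value instead of the one in effect at start time.
      spark.conf.set("spark.sql.shuffle.partitions", "1")
      {code}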

       

      The problem is that when StreamingQueryManager creates a MicroBatchExecution (or ContinuousExecution), it passes in the shared SparkSession, and that session is not cloned until StreamExecution.start() is called. DataStreamWriter.start() should not return until the SparkSession has been cloned.
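
      Until the fix is available, a possible mitigation sketch (the conf key and query shape are illustrative assumptions, not from the report): set every conf the query needs before calling start(), and make any later changes on a separate session obtained via spark.newSession() rather than on the session that owns the running query.

      {code:scala}
      // Apply every setting the query needs before start(), so nothing depends
      // on exactly when the session is cloned inside StreamExecution.start().
      spark.conf.set("spark.sql.shuffle.partitions", "1")

      val query = spark.readStream
        .format("rate")
        .load()
        .writeStream
        .format("console")
        .start()

      // Direct later conf changes at an isolated session (shared SparkContext,
      // separate runtime SQL conf) instead of the session that owns the query.
      val other = spark.newSession()
      other.conf.set("spark.sql.shuffle.partitions", "200")
      {code}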

    People

    • Assignee: Mukul Murthy (mukulmurthy)
    • Reporter: Mukul Murthy (mukulmurthy)
    • Votes: 0
    • Watchers: 1
