SPARK-26586

Streaming queries should have isolated SparkSessions and confs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.3.2, 2.4.0
    • Fix Version/s: 2.4.1, 3.0.0
    • Component/s: SQL, Structured Streaming
    • Labels: None

      Description

      When a stream is started, the stream's configuration is supposed to be frozen, and all batches should run with the configuration captured at start time. However, due to a race condition in stream creation, updating a conf value on the active SparkSession immediately after starting a stream can cause the stream to pick up the updated value.
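
      A minimal sketch of the race, assuming a local session, the built-in "rate" source, the console sink, and the spark.sql.shuffle.partitions key (all chosen here for illustration; the report does not name a specific source, sink, or conf key):

      {code:scala}
      import org.apache.spark.sql.SparkSession

      val spark = SparkSession.builder()
        .master("local[2]")
        .appName("conf-race-sketch")
        .getOrCreate()

      // Start a streaming query. Its conf is expected to be frozen at this point.
      val query = spark.readStream
        .format("rate")
        .load()
        .writeStream
        .format("console")
        .start()

      // Racy update: on affected versions the session is not cloned until
      // StreamExecution.start() runs on the query thread, so the running query
      // may observe this new value instead of the one in effect at start time.
      spark.conf.set("spark.sql.shuffle.partitions", "1")
      {code}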

       

      The problem is that when StreamingQueryManager creates a MicroBatchExecution (or ContinuousExecution), it passes in the shared SparkSession, and that session is not cloned until StreamExecution.start() is called. DataStreamWriter.start() should not return until the SparkSession has been cloned.
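
      Until the fix is available, a possible mitigation sketch (the conf key and query shape are illustrative assumptions, not from the report): set every conf the query needs before calling start(), and make any later changes on a separate session obtained via spark.newSession() rather than on the session that owns the running query.

      {code:scala}
      // Apply every setting the query needs before start(), so nothing depends
      // on exactly when the session is cloned inside StreamExecution.start().
      spark.conf.set("spark.sql.shuffle.partitions", "1")

      val query = spark.readStream
        .format("rate")
        .load()
        .writeStream
        .format("console")
        .start()

      // Direct later conf changes at an isolated session (shared SparkContext,
      // separate runtime SQL conf) instead of the session that owns the query.
      val other = spark.newSession()
      other.conf.set("spark.sql.shuffle.partitions", "200")
      {code}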

    People

    • Assignee: Mukul Murthy (mukulmurthy)
    • Reporter: Mukul Murthy (mukulmurthy)
    • Votes: 0
    • Watchers: 1
