Spark / SPARK-26586

Streaming queries should have isolated SparkSessions and confs

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.3.2, 2.4.0
    • Fix Version/s: 2.4.1, 3.0.0
    • Component/s: SQL, Structured Streaming
    • Labels: None

    Description

      When a stream is started, its config is supposed to be frozen, and every batch should run with the config that was in effect at start time. However, because of a race condition in stream creation, updating a conf value in the active SparkSession immediately after starting a stream can cause the stream to pick up the updated value.

       

      The problem is that when StreamingQueryManager creates a MicroBatchExecution (or ContinuousExecution), it passes in the shared SparkSession, and that session isn't cloned until StreamExecution.start() is called. DataStreamWriter.start() should not return until the SparkSession has been cloned.
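
      The ordering problem described above can be illustrated with a small toy model. This is not Spark code: Session, Query, and observed_value are hypothetical names invented for the sketch. It assumes the late clone happens on the query's own thread, and it uses a threading.Event to force the unlucky interleaving that real code would only hit occasionally. The key spark.sql.shuffle.partitions is used purely as an example conf.

```python
import threading


class Session:
    """Toy stand-in for a SparkSession: only holds a mutable conf dict."""

    def __init__(self, conf=None):
        self.conf = dict(conf or {})

    def clone(self):
        # Copy the conf so later changes to the parent are not visible.
        return Session(self.conf)


class Query:
    """Toy stand-in for a stream execution (hypothetical, not a Spark class)."""

    def __init__(self, shared_session, clone_before_return):
        self._shared = shared_session
        self._clone_before_return = clone_before_return
        self._gate = threading.Event()  # lets the demo control thread timing
        self._thread = threading.Thread(target=self._run)
        self.session = None

    def start(self):
        if self._clone_before_return:
            # Proposed ordering: snapshot the conf on the caller's thread,
            # before start() returns.
            self.session = self._shared.clone()
        self._thread.start()
        return self

    def _run(self):
        # Hold the query thread until the demo releases it, which simulates
        # the caller winning the race.
        self._gate.wait()
        if self.session is None:
            # Racy ordering: the clone happens after start() has returned.
            self.session = self._shared.clone()

    def release_and_join(self):
        self._gate.set()
        self._thread.join()


def observed_value(clone_before_return):
    key = "spark.sql.shuffle.partitions"
    shared = Session({key: "200"})
    query = Query(shared, clone_before_return).start()
    # The caller updates the shared session right after start() returns.
    shared.conf[key] = "1"
    query.release_and_join()
    return query.session.conf[key]


if __name__ == "__main__":
    print("clone on query thread:  ", observed_value(False))
    print("clone before returning: ", observed_value(True))
```

      With the late clone, the query observes the value set after start() returned; with the clone performed before start() returns, it keeps the value from start time, which is the behavior the description asks for.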

            People

              Assignee: Mukul Murthy
              Reporter: Mukul Murthy
              Votes: 0
              Watchers: 1
