Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-23565

Improved error message for when the number of sources for a query changes

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • 2.3.0
    • 2.4.0
    • Structured Streaming
    • None

    Description

      If you change the number of sources for a Structured Streaming query then you will get an assertion error as the number of sources in the checkpoint does not match the number of sources in the query that is starting.  This can happen if, for example, you add a union to the input of the query.  This is of course correct but the error is a bit cryptic and requires investigation.

      Suggestion for a more informative error message =>

      The number of sources for this query has changed.  There are [x] sources in the checkpoint offsets and now there are [y] sources requested by the query.  Cannot continue.

      This is the current message.

      02-03-2018 13:14:22 ERROR StreamExecution:91 - Query ORPositionsState to Kafka [id = 35f71e63-dbd0-49e9-98b2-a4c72a7da80e, runId = d4439aca-549c-4ef6-872e-29fbfde1df78] terminated with error java.lang.AssertionError: assertion failed at scala.Predef$.assert(Predef.scala:156) at org.apache.spark.sql.execution.streaming.OffsetSeq.toStreamProgress(OffsetSeq.scala:38) at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$populateStartOffsets(StreamExecution.scala:429) at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(StreamExecution.scala:297) at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294) at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294) at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:279) at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58) at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1.apply$mcZ$sp(StreamExecution.scala:294) at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)

      Attachments

        Activity

          People

            patrickmcgloin Patrick McGloin
            patrickmcgloin Patrick McGloin
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: