[SPARK-23565] Improved error message for when the number of sources for a query changes - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.3.0
Fix Version/s: 2.4.0
Component/s: Structured Streaming
Labels:
None

Description

If you change the number of sources for a Structured Streaming query then you will get an assertion error as the number of sources in the checkpoint does not match the number of sources in the query that is starting. This can happen if, for example, you add a union to the input of the query. This is of course correct but the error is a bit cryptic and requires investigation.

Suggestion for a more informative error message =>

The number of sources for this query has changed. There are [x] sources in the checkpoint offsets and now there are [y] sources requested by the query. Cannot continue.

This is the current message.

02-03-2018 13:14:22 ERROR StreamExecution:91 - Query ORPositionsState to Kafka [id = 35f71e63-dbd0-49e9-98b2-a4c72a7da80e, runId = d4439aca-549c-4ef6-872e-29fbfde1df78] terminated with error java.lang.AssertionError: assertion failed at scala.Predef$.assert(Predef.scala:156) at org.apache.spark.sql.execution.streaming.OffsetSeq.toStreamProgress(OffsetSeq.scala:38) at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$populateStartOffsets(StreamExecution.scala:429) at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(StreamExecution.scala:297) at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294) at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294) at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:279) at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58) at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1.apply$mcZ$sp(StreamExecution.scala:294) at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)

Attachments

Issue Links

links to

[Github] Pull Request #20946 (patrickmcgloin)

Activity

People

Assignee:: Patrick McGloin

Reporter:: Patrick McGloin

Votes:: 0 Vote for this issue

Watchers:: 4 Start watching this issue

Dates

Created:: 02/Mar/18 13:38

Updated:: 27/Apr/18 15:08

Resolved:: 27/Apr/18 15:05