Details
- Type: Bug
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
- Affects Version: 2.1.0
- Labels: None
Description
A Kafka source with many partitions can cause the checkpointing logic to fail on restart. I got the following exception when trying to restart a Structured Streaming app that reads from a Kafka topic with a hundred partitions.
17/02/08 15:10:09 ERROR StreamExecution: Query [id = 24e2a21a-4545-4a3e-80ea-bbe777d883ab, runId = 025609c9-d59c-4de3-88b3-5d5f7eda4a66] terminated with error
java.lang.IllegalArgumentException: Expected e.g. {"topicA":{"0":23,"1":-1},"topicB":{"0":-2}}, got {"telemetry":{"92":302854
    at org.apache.spark.sql.kafka010.JsonUtils$.partitionOffsets(JsonUtils.scala:74)
    at org.apache.spark.sql.kafka010.KafkaSourceOffset$.apply(KafkaSourceOffset.scala:59)
    at org.apache.spark.sql.kafka010.KafkaSource$$anon$1.deserialize(KafkaSource.scala:134)
    at org.apache.spark.sql.kafka010.KafkaSource$$anon$1.deserialize(KafkaSource.scala:123)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.get(HDFSMetadataLog.scala:237)
    at org.apache.spark.sql.kafka010.KafkaSource.initialPartitionOffsets$lzycompute(KafkaSource.scala:138)
    at org.apache.spark.sql.kafka010.KafkaSource.initialPartitionOffsets(KafkaSource.scala:121)
    …
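To illustrate the failure mode, here is a minimal Python sketch (Spark's real parser is `JsonUtils.partitionOffsets` in Scala; the function name and error handling below are illustrative, not Spark's code) of the offset format the checkpoint reader expects, and why the truncated checkpoint entry above cannot be deserialized:

```python
import json

def parse_partition_offsets(s):
    """Parse a checkpointed Kafka offset string, expected to be a JSON map
    of topic -> {partition -> offset}, e.g.
    {"topicA":{"0":23,"1":-1},"topicB":{"0":-2}}.
    Raises ValueError on malformed (e.g. truncated) input, analogous to the
    IllegalArgumentException in the stack trace above."""
    try:
        parsed = json.loads(s)
    except json.JSONDecodeError as e:
        raise ValueError(
            'Expected e.g. {"topicA":{"0":23,"1":-1},"topicB":{"0":-2}}, '
            f"got {s}"
        ) from e
    # Flatten into (topic, partition) -> offset
    return {(topic, int(part)): offset
            for topic, parts in parsed.items()
            for part, offset in parts.items()}

# A well-formed checkpoint entry parses fine:
parse_partition_offsets('{"topicA":{"0":23,"1":-1},"topicB":{"0":-2}}')

# A truncated entry, as in the reported failure, raises:
try:
    parse_partition_offsets('{"telemetry":{"92":302854')
except ValueError as e:
    print("failed:", e)
```

The point of the sketch: with many partitions the serialized offset map grows large, and if the checkpoint file ends up truncated, the JSON is cut off mid-object and deserialization fails on restart with exactly this kind of error.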