Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-19517

KafkaSource fails to initialize partition offsets

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Blocker
    • Resolution: Fixed
    • 2.1.0
    • 2.1.1, 2.2.0
    • Structured Streaming
    • None

    Description

      A Kafka source with many partitions can cause the check-pointing logic to fail on restart. I got the following exception when trying to restart a Structured Streaming app that reads from a Kafka topic with hundred partitions.

      17/02/08 15:10:09 ERROR StreamExecution: Query [id = 24e2a21a-4545-4a3e-80ea-bbe777d883ab, runId = 025609c9-d59c-4de3-88b3-5d5f7eda4a66] terminated with error
      java.lang.IllegalArgumentException: Expected e.g. {"topicA":{"0":23,"1":-1},"topicB":{"0":-2}}, got {"telemetry":{"92":302854
      	at org.apache.spark.sql.kafka010.JsonUtils$.partitionOffsets(JsonUtils.scala:74)
      	at org.apache.spark.sql.kafka010.KafkaSourceOffset$.apply(KafkaSourceOffset.scala:59)
      	at org.apache.spark.sql.kafka010.KafkaSource$$anon$1.deserialize(KafkaSource.scala:134)
      	at org.apache.spark.sql.kafka010.KafkaSource$$anon$1.deserialize(KafkaSource.scala:123)
      	at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.get(HDFSMetadataLog.scala:237)
      	at org.apache.spark.sql.kafka010.KafkaSource.initialPartitionOffsets$lzycompute(KafkaSource.scala:138)
      	at org.apache.spark.sql.kafka010.KafkaSource.initialPartitionOffsets(KafkaSource.scala:121)
                 …
      

      Attachments

        Activity

          People

            vitillo Roberto Agostino Vitillo
            vitillo Roberto Agostino Vitillo
            Votes:
            1 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: