SPARK-27196

Beginning offset 115204574 is after the ending offset 115204516 for topic


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: Spark Submit
    • Labels:
      None
    • Environment:

      Spark: 2.3.0

      Spark Kafka connector: spark-streaming-kafka-0-10 (2.3.0)

      Kafka client: org.apache.kafka:kafka-clients:0.11.0.1
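      For reference, a minimal sbt dependency sketch matching the versions listed above (hedged: the Scala binary version is not stated in this report, so %% is used and the exact coordinates are an assumption):

      // build.sbt fragment reflecting the reported environment (illustrative only)
      libraryDependencies ++= Seq(
        "org.apache.spark" %% "spark-streaming"            % "2.3.0" % "provided",
        "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.3.0",
        "org.apache.kafka" %  "kafka-clients"              % "0.11.0.1"
      )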

    Description

      We are seeing this issue in production; the Spark consumer dies because of an offset issue.

      We observed the following error in the Kafka broker:

      ------------------------------------------------------------------

      [2019-03-18 14:40:14,100] WARN Unable to reconnect to ZooKeeper service, session 0x1692e9ff4410004 has expired (org.apache.zookeeper.ClientCnxn)
      [2019-03-18 14:40:14,100] INFO Unable to reconnect to ZooKeeper service, session 0x1692e9ff4410004 has expired, closing socket connection (org.apache.zookeeper.ClientCnxn)

      -----------------------------------------------------------------------------------

       

      The Spark job died with the following error:

      ERROR 2019-03-18 07:40:57,178 7924 org.apache.spark.executor.Executor [Executor task launch worker for task 16] Exception in task 27.0 in stage 0.0 (TID 16)
      java.lang.AssertionError: assertion failed: Beginning offset 115204574 is after the ending offset 115204516 for topic <topic_name> partition 37. You either provided an invalid fromOffset, or the Kafka topic has been damaged
      at scala.Predef$.assert(Predef.scala:170)
      at org.apache.spark.streaming.kafka010.KafkaRDD.compute(KafkaRDD.scala:175)
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
      at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
      at org.apache.spark.scheduler.Task.run(Task.scala:109)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
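
      For context, the assertion is raised in KafkaRDD.compute when a partition's starting offset is greater than its ending offset for that batch. Below is a minimal sketch of a typical direct-stream setup where those offset ranges originate (illustration only, not code from this job; broker address, group id, and topic name are placeholders):

      import org.apache.kafka.clients.consumer.ConsumerConfig
      import org.apache.kafka.common.serialization.StringDeserializer
      import org.apache.spark.SparkConf
      import org.apache.spark.streaming.{Seconds, StreamingContext}
      import org.apache.spark.streaming.kafka010._

      // Placeholder configuration; none of these values come from this report.
      val kafkaParams = Map[String, Object](
        ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG        -> "broker:9092",
        ConsumerConfig.GROUP_ID_CONFIG                 -> "example-group",
        ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG   -> classOf[StringDeserializer],
        ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
        // Decides where the consumer restarts if stored offsets fall outside the
        // topic's current range (e.g. after retention or topic recreation).
        ConsumerConfig.AUTO_OFFSET_RESET_CONFIG        -> "latest"
      )

      val ssc = new StreamingContext(new SparkConf().setAppName("kafka-offset-sketch"), Seconds(10))
      val stream = KafkaUtils.createDirectStream[String, String](
        ssc,
        LocationStrategies.PreferConsistent,
        ConsumerStrategies.Subscribe[String, String](Seq("topic_name"), kafkaParams)
      )
      // Each batch computes a (fromOffset, untilOffset) range per partition; KafkaRDD.compute
      // asserts fromOffset <= untilOffset, which is the assertion failing in the trace above.
      stream.foreachRDD(rdd => println(s"records in batch: ${rdd.count()}"))
      ssc.start()
      ssc.awaitTermination()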

            People

            • Assignee: Unassigned
            • Reporter: Prasanna Talakanti (ptalakanti)
