SPARK-30745

Spark Streaming, Kafka broker error, "Failed to get records for spark-executor- .... after polling for 512"


Details

    Description

We have a Spark Streaming application reading data from Kafka.
      Data size: 15 million

The following errors were seen:
      java.lang.AssertionError: assertion failed: Failed to get records for spark-executor- .... after polling for 512
      at scala.Predef$.assert(Predef.scala:170)

More errors were seen pertaining to CachedKafkaConsumer:
      at org.apache.spark.streaming.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:74)
      at org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:227)
      at org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:193)
      at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
      at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:214)
      at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:935)
      at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:926)
      at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:866)
      at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:926) 
      at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:670) 
      at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:330)
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:281)
      at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) 
      at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
      at org.apache.spark.scheduler.Task.run(Task.scala:86)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)


spark.streaming.kafka.consumer.poll.ms is set to its default of 512 ms, and the other Kafka consumer timeout settings are also at their defaults (see the sketch after this list):
      "request.timeout.ms"
      "heartbeat.interval.ms"
      "session.timeout.ms"
      "max.poll.interval.ms"

Also, Kafka was recently upgraded from 0.8 to 0.10; this behavior was not seen in 0.8.
      No resource issues are observed.


People

            Assignee: Unassigned
            Reporter: Harneet K