SPARK-30745

Spark Streaming, Kafka broker error: "Failed to get records for spark-executor- .... after polling for 512"


      Description

We have a Spark Streaming application reading data from Kafka.
Data size: 15 million

The following error was seen:
java.lang.AssertionError: assertion failed: Failed to get records for spark-executor- .... after polling for 512
at scala.Predef$.assert(Predef.scala:170)

Further frames in the trace point to CachedKafkaConsumer:
      at org.apache.spark.streaming.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:74)
      at org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:227)
      at org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:193)
      at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
      at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:214)
      at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:935)
      at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:926)
      at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:866)
      at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:926) 
      at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:670) 
      at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:330)
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:281)
      at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) 
      at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
      at org.apache.spark.scheduler.Task.run(Task.scala:86)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)


spark.streaming.kafka.consumer.poll.ms is at its default of 512 ms, and the other Kafka consumer timeout settings are also at their defaults (a minimal configuration sketch follows this list):
"request.timeout.ms"
"heartbeat.interval.ms"
"session.timeout.ms"
"max.poll.interval.ms"

Also, Kafka was recently upgraded from 0.8 to 0.10; this behavior was not seen on 0.8.
No resource issues are seen.


People

Assignee: Unassigned
Reporter: Harneet K (HKalsi)
