Description
I am running a Kafka server locally with an extremely low retention of 3 seconds and 1-second segmentation. I create a direct Kafka stream with auto.offset.reset = smallest.
With bad luck (which actually happens quite often in my case), the smallest offset retrieved during stream initialization no longer exists by the time streaming actually starts.
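For illustration only, a minimal sketch of a direct stream configured this way, assuming the Spark 1.x Kafka 0.8 integration (`org.apache.spark.streaming.kafka.KafkaUtils`). This is not the linked application's code; the broker address, topic name, application name and batch interval are hypothetical placeholders.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object DirectStreamSketch {
  // Consumer parameters for the direct stream. "smallest" asks for the
  // earliest offset the broker still has when the stream is created.
  def kafkaParams(brokers: String): Map[String, String] = Map(
    "metadata.broker.list" -> brokers,
    "auto.offset.reset"    -> "smallest"
  )

  def main(args: Array[String]): Unit = {
    // Hypothetical local setup: app name, master and batch interval are assumptions.
    val conf = new SparkConf().setAppName("direct-stream-sketch").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(1))

    // Hypothetical broker address and topic name.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams("localhost:9092"), Set("test-topic"))

    // Print the number of records in each batch.
    stream.foreachRDD(rdd => println(s"records in batch: ${rdd.count()}"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Running it requires the `spark-streaming-kafka` artifact on the classpath and a reachable broker.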
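The timing problem can be sketched with a toy model, which is neither Kafka nor Spark code. All names and offset values below are hypothetical; the model only shows how an offset resolved at initialization can fall outside the retained range before the first fetch.

```scala
// Toy model of a partition log whose oldest segments are deleted by retention.
final case class PartitionLog(earliest: Long, latest: Long) {
  // Retention deletes old segments, moving the earliest retained offset forward.
  def expire(deleted: Long): PartitionLog =
    copy(earliest = math.min(earliest + deleted, latest))

  // A fetch succeeds only for offsets inside the retained range.
  def fetch(offset: Long): Either[String, Long] =
    if (offset < earliest || offset > latest)
      Left(s"OffsetOutOfRange: requested $offset, available [$earliest, $latest]")
    else Right(offset)
}

object OffsetRace {
  def run(): (Long, Either[String, Long]) = {
    // Step 1: stream initialization resolves "smallest" to the current earliest offset.
    val atInit      = PartitionLog(earliest = 100L, latest = 200L)
    val startOffset = atInit.earliest

    // Step 2: retention deletes old segments before the first batch runs.
    val atFirstBatch = atInit.expire(50L)

    // Step 3: the first fetch asks for an offset that has already been deleted.
    (startOffset, atFirstBatch.fetch(startOffset))
  }

  def main(args: Array[String]): Unit = {
    val (start, result) = run()
    println(s"start offset = $start, fetch result = $result")
  }
}
```

In this model the start offset stays fixed after initialization, so every retry of the same fetch fails in the same way.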
The complete source code of the Spark Streaming application is here:
https://github.com/pygmalios/spark-checkpoint-experience/blob/cb27ab83b7a29e619386b56e68a755d7bd73fc46/src/main/scala/com/pygmalios/sparkCheckpointExperience/spark/SparkApp.scala
The application then gets stuck in an endless loop trying to fetch that non-existent offset and has to be killed. See the attached logs from Spark and from the Kafka server.