Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- Affects Version/s: 2.4.5
- Fix Version/s: None
Description
The Spark DStream API uses spark-streaming-kafka-0-10 to talk to Kafka. The method in the connector code responsible for committing offsets, commitAll, calls commitAsync on the Kafka client. commitAsync tries to find the group coordinator and, on success, sends the commits; on failure it throws a RetriableCommitFailedException and does not retry. This behavior was introduced in KAFKA-4034, where the reason for not retrying was stated as: "we don't want recursive retries which can cause offset commits to arrive out of order".

From the Spark side, though, we should be able to retry when we run into a RetriableException. The risk of committing offsets out of order can be addressed by keeping a monotonically increasing sequence number that is bumped on every commit, and including that number in the callback function passed to commitAsync: a failed commit is retried only if no newer commit has been issued in the meantime.
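The sequence-number guard described above can be sketched as follows. This is an illustrative, self-contained simulation, not the actual Kafka client or Spark connector API: the commitAsync stand-in here just invokes its callback synchronously with a simulated retriable-failure flag, and SeqGuardedCommit is a hypothetical class name.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SeqGuardedCommit {
    // Monotonically increasing sequence number, bumped on every commit attempt.
    private final AtomicLong commitSeq = new AtomicLong(0);
    // Highest sequence number whose commit has succeeded (package-visible for inspection).
    volatile long lastCommitted = -1;

    interface Callback {
        void onComplete(boolean retriableFailure);
    }

    // Stand-in for the Kafka client's async commit: invokes the callback,
    // reporting a retriable failure if asked to simulate one.
    private void commitAsync(long offset, Callback cb, boolean simulateFailure) {
        cb.onComplete(simulateFailure);
    }

    // Commit with a single guarded retry: the retry is issued only if this
    // attempt is still the newest one, so a retried (stale) commit can never
    // land after a newer commit and push offsets backwards.
    void commitWithRetry(long offset, boolean failFirst) {
        long seq = commitSeq.incrementAndGet();
        commitAsync(offset, retriable -> {
            if (!retriable) {
                lastCommitted = Math.max(lastCommitted, seq);
            } else if (seq == commitSeq.get()) {
                // Still the newest attempt: safe to retry without reordering.
                commitAsync(offset, r2 -> {
                    if (!r2) lastCommitted = Math.max(lastCommitted, seq);
                }, false);
            }
            // else: a newer commit superseded this one; drop the retry.
        }, failFirst);
    }
}
```

The design choice mirrors the concern raised in KAFKA-4034: instead of retrying unconditionally (which can reorder commits), the callback compares its captured sequence number against the current one and silently abandons the retry when it has been superseded.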