Details
- Bug
- Status: Resolved
- Minor
- Resolution: Fixed
- 3.21.0, 4.0.0
- Novice
- Regression
Description
Reproducing:
- Configure the Camel Kafka consumer with "breakOnFirstError" = "true"
- Setup a topic with exactly 2 partitions
- Produce a series of records to both partitions.
- Ensure offsets are committed (I did this with manual commit; autocommit MAY have a second bug as well, see the P.S. at the end of this description)
- Make a route to consume this topic. Ensure the first poll gets records from both partitions. Ensure the second-to-consume partition has some more records to fetch in the next poll.
- Trigger an error while processing exactly the first record of the second-to-consume partition
Expected behavior:
- Application should consume all records from the first partition, and none from the second.
Actual behavior:
- Application consumes all records from the first partition, but some records from the second partition are skipped (how many depends on the number of records consumed from the first partition in a single poll).
This bug was introduced in https://issues.apache.org/jira/browse/CAMEL-18350, which fixed a major issue with breakOnFirstError but left some edge cases uncovered.
The root cause is that the lastResult variable is not reset between polls (or between iterations of the partitions loop), so it may still hold a stale value from the previous iteration. If the exception happens on the first record of a partition, lastResult never gets a chance to be initialized correctly for that partition. The forced sync commit is then made to the right (new) partition, but with an invalid, stale offset.
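The mechanism can be illustrated with a small standalone model. This is only a sketch and not the actual camel-kafka code: the class StaleLastResultDemo, its methods, the resetPerPartition switch and the "commit lastResult + 1" rule are all assumptions made for illustration. It only shows how a value carried over from the previous partition ends up in the commit for the next one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the polling loop; names do not come from camel-kafka.
class StaleLastResultDemo {

    static final long NONE = -1L; // marker for "no record processed yet"

    /**
     * Walks the polled records partition by partition and stops at the failing record.
     * Returns the offset that would be committed for the failing partition,
     * or NONE if no record fails.
     */
    static long offsetCommittedOnError(Map<Integer, long[]> poll,
                                       int failPartition, long failOffset,
                                       boolean resetPerPartition) {
        long lastResult = NONE; // offset of the last successfully processed record
        for (Map.Entry<Integer, long[]> entry : poll.entrySet()) {
            if (resetPerPartition) {
                lastResult = NONE; // assumed remedy: forget the previous partition's state
            }
            for (long offset : entry.getValue()) {
                if (entry.getKey() == failPartition && offset == failOffset) {
                    // Assumed commit rule: resume right after the last good record,
                    // or at the failing record itself if nothing was processed yet.
                    return lastResult == NONE ? offset : lastResult + 1;
                }
                lastResult = offset;
            }
        }
        return NONE;
    }

    /** Fixed scenario: partition 0 has offsets 0..4, partition 1 has 0..2, failure at partition 1 offset 0. */
    static long demoCommit(boolean resetPerPartition) {
        Map<Integer, long[]> poll = new LinkedHashMap<>();
        poll.put(0, new long[] {0, 1, 2, 3, 4});
        poll.put(1, new long[] {0, 1, 2});
        return offsetCommittedOnError(poll, 1, 0L, resetPerPartition);
    }

    public static void main(String[] args) {
        System.out.println("stale lastResult: partition 1 committed at " + demoCommit(false));
        System.out.println("reset per partition: partition 1 committed at " + demoCommit(true));
    }
}
```

In this model, the stale run commits partition 1 at offset 5, the value left over from partition 0, so offsets 0..4 of partition 1 would be skipped. With the reset, partition 1 is committed at offset 0 and nothing is lost. The size of the gap follows the number of records consumed from the first partition, which matches the behavior described above.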
I've adapted the test project for CAMEL-18350 (many thanks to klease78) to demonstrate the issue and published it on GitHub. See the failing test in the project: https://github.com/Krivda/camel-bug-reproduction
P.S. There might also be a second bug related to this issue, which may occur with enableAutoCommit=true: when the bug occurs, the physical commit might not be made for already processed partitions, which may result in double processing. I haven't investigated this further.
P.P.S. Please note that the GitHub project contains a very detailed description of the behavior, pointing to the specific failing lines of code, which should be very helpful for the investigation.
Issue Links
- causes: CAMEL-20089 camel-kafka: make breakOnFirstError more flexible (Resolved)
- is related to: CAMEL-18760 camel-kafka - Issue using ThrottlingExceptionRoutePolicy with Kafka consumer (Resolved)
- relates to: CAMEL-20044 camel-kafka - On rejoining consumer group Camel can set offset incorrectly causing messages to be replayed (Resolved)