Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- trunk
- None
Description
Scenario
If a hook encounters a message that is larger than what Kafka can handle, it compresses the message, splits it, or does both, to bring it down to a size that Kafka can handle.
When Atlas encounters such a message while processing messages from the hook, it uses the appropriate strategy (decompressing, reassembling the splits, or both) to restore the message to its original form.
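The compress-and-split round trip described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class `LargeMessageCodec`, its method names, the use of GZIP, and the chunk size are all hypothetical and are not taken from the Atlas hook or consumer code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; not the actual Atlas implementation.
class LargeMessageCodec {
    // Hook side: GZIP-compress the payload (GZIP is an assumption).
    static byte[] compress(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Hook side: cut the payload into chunks of at most maxChunkSize bytes.
    static List<byte[]> split(byte[] data, int maxChunkSize) {
        if (maxChunkSize <= 0) {
            throw new IllegalArgumentException("maxChunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += maxChunkSize) {
            int end = Math.min(data.length, offset + maxChunkSize);
            chunks.add(Arrays.copyOfRange(data, offset, end));
        }
        return chunks;
    }

    // Consumer side: concatenate the chunks in their original order.
    static byte[] join(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    // Consumer side: reverse the GZIP compression.
    static byte[] decompress(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

In this sketch the consumer restores the payload with `decompress(join(chunks))`, which mirrors `split(compress(payload), maxChunkSize)` on the hook side. A real implementation would also need to carry chunk ordering and a compression flag in the message metadata.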
Processing a message of this type can take longer than the threshold Kafka mandates for committing the message. If processing exceeds that threshold, Kafka resends the message.
This causes the message to be reprocessed.
As a result, the message may remain stuck in the queue indefinitely or, at the very least, be reprocessed several times (at least twice).
Solution
- Record the message IDs of large messages.
- For messages with no version number, calculate the MD5 hash of the message and use that as the message ID.
- If a message with the same ID is encountered again, commit it without processing it.
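
The steps above can be sketched as a small tracker. This is a minimal illustration, not the committed fix: the class `ProcessedMessageTracker`, its methods, and the bounded-cache eviction policy are assumptions made for the example. It also records the ID when the message is first seen, which is a simplification; a real consumer has to decide whether to record before or after processing completes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; names and eviction policy are illustrative only.
class ProcessedMessageTracker {
    private final Map<String, Boolean> seenIds;

    ProcessedMessageTracker(final int maxEntries) {
        // Insertion-ordered map that drops the oldest ID once the cap is exceeded,
        // so the record of large-message IDs cannot grow without bound.
        this.seenIds = new LinkedHashMap<String, Boolean>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Use the ID carried by the message if there is one; otherwise fall back
    // to the MD5 hash of the payload.
    static String resolveId(String explicitId, String payload) {
        if (explicitId != null && !explicitId.isEmpty()) {
            return explicitId;
        }
        return md5Hex(payload);
    }

    // Lower-case hexadecimal MD5 digest of the UTF-8 encoded payload.
    static String md5Hex(String payload) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(payload.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Returns true the first time an ID is seen and false on redelivery.
    synchronized boolean markIfNew(String id) {
        return seenIds.put(id, Boolean.TRUE) == null;
    }
}
```

A consumer loop built on this sketch would call `markIfNew(resolveId(id, payload))` for each large message; when it returns `false`, the loop commits the offset and skips processing.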