Details
- Type: New Feature
- Status: Closed
- Priority: Major
- Resolution: Fixed
- None
Description
upsert-kafka connector should support optional EXACTLY_ONCE delivery semantics.
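As a sketch of what this could look like, the following DDL mirrors the delivery-guarantee options of the regular Kafka sink. The option names (`sink.delivery-guarantee`, `sink.transactional-id-prefix`) and the table schema are assumptions for illustration; verify them against the upsert-kafka docs for your Flink version:

```sql
-- Hypothetical upsert-kafka sink configured for exactly-once delivery.
CREATE TABLE pageviews_sink (
  user_id STRING,
  views BIGINT,
  PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'pageviews',
  'properties.bootstrap.servers' = 'localhost:9092',
  'key.format' = 'json',
  'value.format' = 'json',
  -- assumed option names; exactly-once requires Kafka transactions,
  -- hence a transactional-id prefix unique per job
  'sink.delivery-guarantee' = 'exactly-once',
  'sink.transactional-id-prefix' = 'pageviews-sink'
);
```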
The upsert-kafka docs suggest that the connector handles duplicate records produced under AT_LEAST_ONCE. However, there are at least two reasons to configure the connector with EXACTLY_ONCE.
First, there might be other non-Flink topic consumers that would rather not have duplicated records.
Second, multiple upsert-kafka producers might cause keys to roll back to previous values. Consider a scenario with two producing jobs, A and B, writing to the same topic with AT_LEAST_ONCE, and a consuming job reading from the topic. Both producers write unique, monotonically increasing sequences to the same key: Job A writes x=a1, a2, a3, a4, a5, … and Job B writes x=b1, b2, b3, b4, b5, …. With this setup, we can have the following sequence:
- Job A produces x=a5.
- Job B produces x=b5.
- Job A produces the duplicate write x=a5.
The consuming job would observe x going to a5, then to b5, then back to a5. EXACTLY_ONCE would prevent this behavior.
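The rollback anomaly above can be demonstrated with a minimal simulation (plain Python, not Flink code; the `replay` helper is hypothetical and just models a consumer materializing upserts in arrival order):

```python
# Simulate the key-rollback anomaly caused by an AT_LEAST_ONCE duplicate
# when two producers write to the same key in one topic.

def replay(events):
    """Apply (key, value) upserts in order; return every value observed for the key."""
    state = {}
    observed = []
    for key, value in events:
        state[key] = value          # upsert semantics: last write wins
        observed.append(state[key])
    return observed

# Job A writes x=a5, Job B overwrites with x=b5, then Job A's
# at-least-once retry re-delivers the stale duplicate x=a5.
events = [("x", "a5"), ("x", "b5"), ("x", "a5")]
print(replay(events))  # prints ['a5', 'b5', 'a5'] -- x rolls back to a5
```

With EXACTLY_ONCE, the third event (the duplicate) would not be delivered, so the consumer would never observe the stale value.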