Flink / FLINK-31408

Add EXACTLY_ONCE support to upsert-kafka



    Description

      The upsert-kafka connector should support optional EXACTLY_ONCE delivery semantics.

      The upsert-kafka docs suggest that the connector tolerates the duplicate records produced under AT_LEAST_ONCE. However, there are at least two reasons to configure the connector with EXACTLY_ONCE.

      First, there might be other, non-Flink consumers of the topic that would rather not receive duplicated records.

      Second, multiple upsert-kafka producers might cause keys to roll back to previous values. Consider a scenario with two producing jobs, A and B, writing to the same topic with AT_LEAST_ONCE, and a consuming job reading from that topic. Both producers write unique, monotonically increasing sequences to the same key: job A writes x=a1,a2,a3,a4,a5,... and job B writes x=b1,b2,b3,b4,b5,.... With this setup, the following sequence is possible:

      1. Job A produces x=a5.
      2. Job B produces x=b5.
      3. Job A produces the duplicate write x=a5.

      The consuming job would observe x going to a5, then to b5, then back to a5. EXACTLY_ONCE delivery would prevent this behavior.
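      The requested option would presumably mirror how the plain kafka SQL connector exposes delivery guarantees. A minimal sketch, assuming upsert-kafka gains the same `sink.delivery-guarantee` and `sink.transactional-id-prefix` options as the kafka connector (table name, topic, and formats here are hypothetical):

      ```sql
      -- Hypothetical DDL: assumes upsert-kafka adopts the kafka
      -- connector's delivery-guarantee options.
      CREATE TABLE pageviews_per_region (
        region STRING,
        view_count BIGINT,
        PRIMARY KEY (region) NOT ENFORCED
      ) WITH (
        'connector' = 'upsert-kafka',
        'topic' = 'pageviews_per_region',
        'properties.bootstrap.servers' = 'localhost:9092',
        'key.format' = 'avro',
        'value.format' = 'avro',
        -- Requested option: transactional writes instead of AT_LEAST_ONCE.
        'sink.delivery-guarantee' = 'exactly-once',
        -- Kafka transactions require a transactional id prefix that is
        -- unique per job writing to the same Kafka cluster, so jobs A
        -- and B in the scenario above would each use their own prefix.
        'sink.transactional-id-prefix' = 'pageviews-job-a'
      )
      ```

      Note that transactional writes only help downstream consumers that read with `isolation.level=read_committed`; consumers using `read_uncommitted` would still observe records from aborted transactions.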

      People

        Gerrrr Alex Sorokoumov
