Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-21191

Support reducing buffer for upsert-kafka sink

    XMLWordPrintableJSON

    Details

      Description

      Currently, if there is a job agg -> filter -> upsert-kafka, then upsert-kafka will receive -U and +U for every updates instead of only a +U. This will produce a lot of tombstone messages in Kafka. It's not just about the unnecessary data volume in Kafka, but users may processes that trigger side effects when a tombstone records is ingested from a Kafka topic.

      A simple solution would be add a reducing buffer for the upsert-kafka, to reduce the -U and +U before emitting to the underlying sink. This should be very similar to the implementation of upsert JDBC sink.

      We can even extract the reducing logic out of the JDBC connector and it can be reused by other connectors.
      This should be something like `BufferedUpsertSinkFunction` which has a reducing buffer and flush to the underlying SinkFunction
      once checkpointing or buffer timeout. We can put it in `flink-connector-base` which can be shared for builtin connectors and custom connectors.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                fsk119 Shengkai Fang
                Reporter:
                jark Jark Wu
              • Votes:
                0 Vote for this issue
                Watchers:
                8 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: