[FLINK-21191] Support reducing buffer for upsert-kafka sink - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.13.0
Component/s: Connectors / Kafka, Table SQL / Ecosystem
Labels:
- pull-request-available

Description

Currently, in a job of the form agg -> filter -> upsert-kafka, the upsert-kafka sink receives both a -U and a +U for every update instead of only a +U. This produces a lot of tombstone messages in Kafka. The problem is not just the unnecessary data volume in Kafka: users may also have processes that trigger side effects when a tombstone record is ingested from a Kafka topic.

A simple solution would be to add a reducing buffer to the upsert-kafka sink that reduces the -U and +U before emitting to the underlying sink. This should be very similar to the implementation of the upsert JDBC sink.

We can even extract the reducing logic out of the JDBC connector so that it can be reused by other connectors.
This should be something like `BufferedUpsertSinkFunction`, which has a reducing buffer and flushes to the underlying SinkFunction
on checkpoint or when the buffer timeout expires. We can put it in `flink-connector-base`, where it can be shared by built-in connectors and custom connectors.
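
To make the idea concrete, here is a minimal sketch of a reducing buffer in plain Java. It is not the Flink implementation: `Kind`, `Change`, `ReducingBuffer`, and the `maxSize` parameter are hypothetical names introduced only for illustration, and the sketch assumes that keeping the last change per key is the desired reduction. A real sink would also call `flush()` on checkpoint and on a timer; here the caller triggers it explicitly.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical change kinds, mirroring the +I/-U/+U/-D changelog notation. */
enum Kind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

/** Hypothetical changelog record: a change kind plus a key and a value. */
final class Change {
    final Kind kind;
    final String key;
    final String value;

    Change(Kind kind, String key, String value) {
        this.kind = kind;
        this.key = key;
        this.value = value;
    }
}

/** Keeps only the latest change per key and forwards it downstream on flush. */
final class ReducingBuffer {
    private final Map<String, Change> buffer = new LinkedHashMap<>();
    private final int maxSize;
    private final Consumer<Change> downstream;

    ReducingBuffer(int maxSize, Consumer<Change> downstream) {
        this.maxSize = maxSize;
        this.downstream = downstream;
    }

    /** Buffers a change; a later change for the same key overwrites the earlier one. */
    void add(Change change) {
        buffer.put(change.key, change);
        if (buffer.size() >= maxSize) {
            flush();
        }
    }

    /** Emits the buffered changes (one per key) and clears the buffer. */
    void flush() {
        for (Change change : buffer.values()) {
            downstream.accept(change);
        }
        buffer.clear();
    }
}

final class ReducingBufferDemo {
    public static void main(String[] args) {
        List<Change> emitted = new ArrayList<>();
        ReducingBuffer buffer = new ReducingBuffer(100, emitted::add);

        // An update of k1 arrives as a -U/+U pair; only the +U should survive.
        buffer.add(new Change(Kind.UPDATE_BEFORE, "k1", "1"));
        buffer.add(new Change(Kind.UPDATE_AFTER, "k1", "2"));
        // For k2 the filter dropped the +U, so the -U must still be emitted.
        buffer.add(new Change(Kind.UPDATE_BEFORE, "k2", "5"));

        buffer.flush();
        for (Change change : emitted) {
            System.out.println(change.kind + " " + change.key + "=" + change.value);
        }
    }
}
```

In this sketch the -U for `k1` never reaches the downstream consumer because the following +U for the same key replaces it in the buffer, while the lone -U for `k2` is still forwarded, so a row that no longer passes the filter can still be retracted.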

Issue Links

links to

GitHub Pull Request #15434

People

Assignee: Shengkai Fang

Reporter: Jark Wu

Votes: 0

Watchers: 7

Dates

Created: 28/Jan/21 11:44

Updated: 28/Aug/21 11:12

Resolved: 01/Apr/21 02:22