[FLINK-35749] Kafka sink component will lose data when kafka cluster is unavailable for a while - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: 1.16.2, 1.17.1, kafka-3.2.0
Fix Version/s: kafka-3.3.0
Component/s: Connectors / Kafka
Labels:
- pull-request-available

Release Note:
Fixed a race condition in the Kafka sink, which may lose data when kafka cluster is unavailable for a while

Description

As the title described, here is the procedure to reproduce the problem:
1. develop a simple flink stream job to consume from one kafka topic and sink to anthoer kafka sever and topic
2. make amount of kafka message and produce to the source kafka topic, record the message number
3. start the flink stream job, and config to cosume from earliest source topic offset
4. during the job cosuming the source topic, restart the kafka cluster(we use aws MSK)
5. the flink job will not throw any Exception like nothing happened, but only print error log like : [kafka-producer-network-thread | producer-2] INFO org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-2] Node 2 disconnected.
6. wait for the kafka cluster finished restarting and all the source kafka message consumed
7. count the target kafka topic message number, compare to the source, there is a high probability of data loss（more than 50%）

Attachments

Issue Links

relates to

FLINK-33545 KafkaSink implementation can cause dataloss during broker issue when not using EXACTLY_ONCE if there's any batching

Open

links to

GitHub Pull Request #107

Activity

People

Assignee:: Jimmy Zhao

Reporter:: Jimmy Zhao

Votes:: 0 Vote for this issue

Watchers:: 8 Start watching this issue

Dates

Created:: 03/Jul/24 07:39

Updated:: 11/Oct/24 11:24

Resolved:: 11/Jul/24 11:41