[KAFKA-10167] Streams EOS-Beta should not try to get end-offsets as read-committed - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: 2.6.0
Fix Version/s: 2.6.0
Component/s: streams
Labels: None

Description

This bug was discovered with the new EOS protocol (KIP-447). Here is the context:

In Streams, when we are assigned new active tasks, we first try to restore the state from the changelog topic all the way to the log end offset. Only then can we transition from the `restoring` to the `running` state and start processing the task.

Before KIP-447, the end-offset call was only triggered after we had passed the synchronization barrier at the txn-coordinator, which guarantees that the txn-marker has been sent and received (otherwise we would get a CONCURRENT_TRANSACTIONS error and let the producer retry). Receipt of the txn-marker also means that the marker has been fully replicated, which in turn guarantees that the data written before that marker has been fully replicated. As a result, when we send the list-offset request with the `read-committed` flag, we are guaranteed that the returned offset == LSO == high-watermark.
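The relationship between the LSO and the high-watermark can be illustrated with a toy model. This is only a sketch for intuition: `PartitionLog` and its methods are hypothetical names, not Kafka code, and the model assumes every appended record is immediately fully replicated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartitionLog:
    """Toy model of one partition (hypothetical, not Kafka code)."""
    high_watermark: int = 0                       # offset after the last replicated entry
    open_txn_first_offset: Optional[int] = None   # first offset of an open transaction

    def append_transactional(self, count: int) -> None:
        # Data written inside a transaction stays "unstable" until a marker arrives.
        if self.open_txn_first_offset is None:
            self.open_txn_first_offset = self.high_watermark
        self.high_watermark += count

    def append_marker(self) -> None:
        # The txn-marker occupies one offset and closes the open transaction.
        self.high_watermark += 1
        self.open_txn_first_offset = None

    def last_stable_offset(self) -> int:
        # LSO: first offset of the earliest open transaction, else the high-watermark.
        if self.open_txn_first_offset is None:
            return self.high_watermark
        return self.open_txn_first_offset

    def end_offset(self, isolation: str) -> int:
        # read_committed is bounded by the LSO, read_uncommitted by the high-watermark.
        if isolation == "read_committed":
            return self.last_stable_offset()
        return self.high_watermark


log = PartitionLog()
log.append_transactional(5)
before_marker = (log.end_offset("read_committed"), log.end_offset("read_uncommitted"))
log.append_marker()
after_marker = (log.end_offset("read_committed"), log.end_offset("read_uncommitted"))
```

In this model the two end offsets only agree once the marker has been appended, which is the condition the txn-coordinator barrier used to guarantee before the end-offset call was made.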

After KIP-447, however, we no longer fence on the txn-coordinator but on the group-coordinator upon offset-fetch, and the group-coordinator returns the fetched offsets right after it has received and replicated the txn-marker sent to it. But txn-markers are sent to different brokers in parallel, and even within the same broker the markers of different partitions are appended and replicated independently. So when the offset-fetch request returns, it is NOT guaranteed that the LSO on the other data partitions has advanced as well. In that case the `endOffset` call may return a smaller offset, so restoration stops too early, causing data loss.
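The effect of a lagging LSO on restoration can be sketched as follows. This is an illustrative simulation under simplifying assumptions, not Streams code: `restore`, `changelog`, `lagging_lso` and `high_watermark` are hypothetical names, and txn-markers are ignored as log entries to keep the offsets simple.

```python
def restore(changelog, end_offset):
    """Replay changelog entries with offset < end_offset into a key-value state."""
    state = {}
    for offset, (key, value) in enumerate(changelog):
        if offset >= end_offset:
            break  # restoration believes it has reached the end of the log
        state[key] = value
    return state


# Four changelog records, all fully replicated on the changelog partition.
changelog = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
high_watermark = len(changelog)

# Assumption: the marker covering offsets 2..3 has not yet been appended on this
# partition, even though the group-coordinator has already answered the offset-fetch.
lagging_lso = 2

state_read_committed = restore(changelog, lagging_lso)       # target taken from the LSO
state_read_uncommitted = restore(changelog, high_watermark)  # target taken from the high-watermark
```

With the LSO as the restore target, the update to key `a` and the record for key `c` never reach the state store, which matches the symptom of restoring too few records.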

Issue Links

causes

KAFKA-10148 Kafka Streams Restores too few Records with eos-beta Enabled

Closed

links to

GitHub Pull Request #8876

People

Assignee: Guozhang Wang

Reporter: Guozhang Wang

Votes: 0

Watchers: 4

Dates

Created: 16/Jun/20 00:20

Updated: 18/Jun/20 23:13

Resolved: 18/Jun/20 23:13