[FLINK-13537] Changing Kafka producer pool size and scaling out may create overlapping transaction IDs - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Open
Priority: Not a Priority
Resolution: Unresolved
Affects Version/s: 1.8.1, 1.9.0
Fix Version/s: None
Component/s: Connectors / Kafka
Labels:
- auto-deprioritized-major
- auto-deprioritized-minor

Description

The Kafka producer's transaction IDs are only generated once when there was no previous state for that operator. In the case where we restore and increase parallelism (scale-out), some operators may not have previous state and create new IDs. Now, if we also reduce the poolSize, these new IDs may overlap with the old ones which should never happen! Similarly, a scale-in + increasing poolSize could lead the the same thing.

An easy "fix" for this would be to forbid changing the poolSize. We could potentially be a bit better by only forbidding changes that can lead to transaction ID overlaps which we can identify from the formulae that TransactionalIdsGenerator uses. This should probably be the first step which can also be back-ported to older Flink versions just in case.

On a side note, the current scheme also relies on the fact, that the operator's list state distributes previous states during scale-out in a fashion that only the operators with the highest subtask indices do not get a previous state. This is somewhat "guaranteed" by OperatorStateStore#getListState() but I'm not sure whether we should actually rely on that there.

Attachments

Activity

People

Assignee:: Unassigned

Reporter:: Nico Kruber

Votes:: 0 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 01/Aug/19 13:34

Updated:: 23/Nov/21 22:38