[FLINK-6006] Kafka Consumer can lose state if queried partition list is incomplete on restore - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.1.5, 1.2.1
Component/s: Connectors / Common, Connectors / Kafka
Labels: None

Description

In 1.1.x and 1.2.x, the FlinkKafkaConsumer queries the partition list on restore. Only the restored state of partitions that exist in the queried list is then used to initialize the fetcher's state holders.

If for any reason the returned partition list is incomplete (i.e. missing partitions that existed before, perhaps due to temporary ZK / broker downtime), then the state of the missing partitions is dropped and can no longer be recovered.
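A minimal Java sketch of this failure mode, using a simplified model rather than the actual connector code. The class `FilteredRestoreSketch`, the method `restoreFiltered`, and the string partition names are hypothetical; restored offsets are modelled as a plain map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class FilteredRestoreSketch {

    // Hypothetical model of the affected behaviour: a restored offset is kept
    // only if its partition appears in the freshly queried partition list.
    static Map<String, Long> restoreFiltered(Map<String, Long> restoredOffsets,
                                             Set<String> queriedPartitions) {
        Map<String, Long> stateHolders = new HashMap<>();
        for (Map.Entry<String, Long> entry : restoredOffsets.entrySet()) {
            if (queriedPartitions.contains(entry.getKey())) {
                stateHolders.put(entry.getKey(), entry.getValue());
            }
        }
        return stateHolders;
    }

    public static void main(String[] args) {
        Map<String, Long> restored = new HashMap<>();
        restored.put("topic-0", 42L);
        restored.put("topic-1", 17L);

        // Assume the query on restore temporarily misses "topic-1".
        Set<String> queried = Collections.singleton("topic-0");

        // The offset for "topic-1" is silently dropped.
        System.out.println(restoreFiltered(restored, queried)); // prints {topic-0=42}
    }
}
```

In this model, the next checkpoint would no longer contain `topic-1`, so its offset is gone even after the partition becomes reachable again.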

In 1.3-SNAPSHOT, this is fixed by changes in FLINK-4280, so only 1.1 and 1.2 are affected.

We can backport some of the behavioural changes there to 1.1 and 1.2. Generally, we should not depend on the current partition list in Kafka when restoring, but just restore all previous state into the fetcher's state holders.
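As a rough illustration of the proposed behaviour, the sketch below restores every previously checkpointed offset without consulting the current partition list. It is a simplified model, not the code from the linked pull requests; `RestoreAllSketch`, `restoreAll`, and the string partition names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class RestoreAllSketch {

    // Hypothetical model of the proposed behaviour: every restored offset goes
    // into the state holders, regardless of what Kafka currently reports.
    static Map<String, Long> restoreAll(Map<String, Long> restoredOffsets) {
        return new HashMap<>(restoredOffsets);
    }

    public static void main(String[] args) {
        Map<String, Long> restored = new HashMap<>();
        restored.put("topic-0", 42L);
        restored.put("topic-1", 17L);

        // No partition query is involved, so an incomplete list cannot drop state.
        Map<String, Long> stateHolders = restoreAll(restored);
        System.out.println(stateHolders.size()); // prints 2
    }
}
```

The trade-off in this model is that the state holders may name partitions that are unreachable at restore time, so the consumer has to tolerate that case.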

This would also require checking how the consumer threads / Kafka clients behave when their assigned partitions cannot be reached.

Issue Links

links to

GitHub Pull Request #3505

GitHub Pull Request #3507

People

Assignee: Tzu-Li (Gordon) Tai

Reporter: Tzu-Li (Gordon) Tai

Votes: 0

Watchers: 5

Dates

Created: 09/Mar/17 12:16

Updated: 24/Mar/17 08:46

Resolved: 15/Mar/17 14:39