[KAFKA-5152] Kafka Streams keeps restoring state after shutdown is initiated during startup - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: 0.10.2.1
Fix Version/s: 0.11.0.1, 1.0.0
Component/s: streams
Labels: None

Description

If a Kafka Streams shutdown is initiated while state stores are being restored (e.g. because an uncaught exception is thrown), the application does not shut down until all stores have finished restoring.
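To illustrate the behaviour one would expect instead, here is a minimal sketch of a restore loop that checks for a pending shutdown between batches and abandons the restore early. It is a simplified model, not the code from the actual fix: the class `InterruptibleRestoreSketch`, its methods, and the batch-counting logic are all hypothetical and only stand in for replaying changelog records into a store.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of a restoring state store; not a Kafka Streams class.
final class InterruptibleRestoreSketch {
    // Set by whatever initiates shutdown (e.g. an uncaught exception handler).
    private final AtomicBoolean shutdownRequested = new AtomicBoolean(false);
    private final long totalRecords;
    private final long batchSize;
    private long restored = 0;

    InterruptibleRestoreSketch(long totalRecords, long batchSize) {
        this.totalRecords = totalRecords;
        this.batchSize = batchSize;
    }

    void requestShutdown() {
        shutdownRequested.set(true);
    }

    long restored() {
        return restored;
    }

    // Stands in for replaying one batch of changelog records into the store.
    void restoreBatch() {
        restored = Math.min(totalRecords, restored + batchSize);
    }

    // Returns true if the store was fully restored, false if the restore
    // was abandoned because shutdown was requested in the meantime.
    boolean restore() {
        while (restored < totalRecords) {
            if (shutdownRequested.get()) {
                return false; // stop restoring instead of finishing first
            }
            restoreBatch();
        }
        return true;
    }

    public static void main(String[] args) {
        InterruptibleRestoreSketch store = new InterruptibleRestoreSketch(100, 10);
        store.restoreBatch();    // some progress is made
        store.requestShutdown(); // shutdown arrives mid-restore
        System.out.println("completed=" + store.restore()
                + " restored=" + store.restored());
    }
}
```

The point of the sketch is only that the shutdown flag is consulted inside the loop, so the time to shut down is bounded by one batch rather than by the full restore.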

As the restore progresses, stream threads appear to be taken out of service as part of the shutdown sequence, which triggers task rebalances. This compounds the problem by slowing the restore down even further, since the remaining threads now also have to restore the reassigned tasks before they can shut down.

A more severe issue arises if a new rebalance is triggered near the end of the waitingSync phase (e.g. because a new member joins the group, or some members time out waiting for the SyncGroup response). Some consumer clients in the group may already have proceeded to onPartitionsAssigned and be blocked trying to acquire a file dir lock that other clients have not yet released. Meanwhile, the clients holding the lock keep re-sending JoinGroup requests, but the rebalance cannot complete: the blocked clients are never kicked out of the group, because their heartbeat threads keep sending HBRequest. The result is a deadlock caused by not releasing the file dir locks on task suspension.
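The locking side of this deadlock can be sketched in a few lines. The model below is an assumption-laden illustration, not Kafka Streams code: `DirLockRegistry` and `SuspendableTask` are hypothetical names, and an in-memory map stands in for the on-disk file dir lock. It shows that a second owner can only acquire a task directory once the first owner releases the lock when its task is suspended.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory stand-in for per-task directory locks.
final class DirLockRegistry {
    private final ConcurrentMap<String, String> owners = new ConcurrentHashMap<>();

    // Non-blocking: succeeds if the directory is free or already held by owner.
    boolean tryLock(String taskDir, String owner) {
        String current = owners.putIfAbsent(taskDir, owner);
        return current == null || current.equals(owner);
    }

    // Releases the lock only if it is currently held by owner.
    void unlock(String taskDir, String owner) {
        owners.remove(taskDir, owner);
    }
}

// Hypothetical task that holds its directory lock only while active.
final class SuspendableTask {
    private final DirLockRegistry locks;
    private final String taskDir;
    private final String owner;

    SuspendableTask(DirLockRegistry locks, String taskDir, String owner) {
        this.locks = locks;
        this.taskDir = taskDir;
        this.owner = owner;
    }

    // Called when the task is (re)assigned; returns false if the lock is taken.
    boolean resume() {
        return locks.tryLock(taskDir, owner);
    }

    // Called when the task is suspended during a rebalance: the lock is
    // released here so that a new owner is not blocked waiting for it.
    void suspend() {
        locks.unlock(taskDir, owner);
    }
}
```

With this shape, a client that has suspended its tasks no longer prevents the newly assigned owner from making progress, which is the property the description identifies as missing.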

Issue Links

contains

KAFKA-3826 Sampling on throughput / latency metrics recording in Streams

Resolved

is depended upon by

KAFKA-5545 Kafka Streams not able to successfully restart over new broker ip

Resolved

is related to

KAFKA-5242 add max_number_of_retries to exponential backoff strategy

Resolved

links to

GitHub Pull Request #3607

GitHub Pull Request #3653

GitHub Pull Request #3675

People

Assignee: Damian Guy

Reporter: Xavier Léauté

Votes: 1

Watchers: 6

Dates

Created: 02/May/17 04:58

Updated: 01/Mar/19 22:42

Resolved: 22/Aug/17 18:13