[KAFKA-10086] Standby state isn't always re-used when transitioning to active - ASF JIRA

Details

Type: Task
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: 2.6.0
Fix Version/s: 2.6.0
Component/s: streams
Labels: None

Description

This ticket was initially just to write an integration test, but I escalated it to a blocker and changed the title when the integration test actually surfaced two bugs:

1. Offset positions were not reported for in-memory stores, so tasks with in-memory stores were never considered "caught up" and could not take over active processing, which prevented clusters from ever achieving balance. This is a regression in 2.6.
2. When the TaskAssignor decided to switch active processing from a former owner to a new one that had a standby, the lower-level cooperative rebalance protocol first de-scheduled the task completely and only later assigned it to the new owner. For in-memory stores, this caused the standby state not to be re-used; for persistent stores, it created a window in which the cleanup thread might delete the state directory. In both cases, even though the instance previously had a standby, it still had to restore the entire state from the changelog once it got the active task.
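
To illustrate why missing offset positions keep a task from ever being considered caught up, here is a minimal, self-contained sketch. It is not the Kafka Streams implementation: the class `CaughtUpSketch`, its methods, the partition name, and the threshold value are all hypothetical, and the threshold is only similar in spirit to the `acceptable.recovery.lag` setting. The assumption is that lag is the changelog end offset minus the reported position, and that an unreported position is treated as unknown lag.

```java
import java.util.Map;
import java.util.OptionalLong;

public class CaughtUpSketch {
    // Hypothetical threshold: a standby within this many records of the
    // changelog end offset counts as "caught up".
    static final long ACCEPTABLE_LAG = 10L;

    // Returns the lag for one changelog partition, or empty if the instance
    // reported no offset position for it (the in-memory store case in this ticket).
    static OptionalLong lag(Map<String, Long> reportedPositions,
                            String changelogPartition,
                            long endOffset) {
        Long position = reportedPositions.get(changelogPartition);
        if (position == null) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Math.max(0L, endOffset - position));
    }

    // A task with unknown lag can never qualify, no matter how much state
    // the standby actually holds.
    static boolean isCaughtUp(Map<String, Long> reportedPositions,
                              String changelogPartition,
                              long endOffset) {
        OptionalLong lag = lag(reportedPositions, changelogPartition, endOffset);
        return lag.isPresent() && lag.getAsLong() <= ACCEPTABLE_LAG;
    }

    public static void main(String[] args) {
        String partition = "store-changelog-0"; // hypothetical name
        long endOffset = 1000L;

        // Store that reports its position: lag is 5, within the threshold.
        Map<String, Long> reporting = Map.of(partition, 995L);
        // Store that reports nothing: lag is unknown.
        Map<String, Long> silent = Map.of();

        System.out.println("reporting store caught up: "
                + isCaughtUp(reporting, partition, endOffset));
        System.out.println("silent store caught up: "
                + isCaughtUp(silent, partition, endOffset));
    }
}
```

Under these assumptions, a standby that never reports a position stays ineligible for promotion indefinitely, which matches the imbalance described in the first bug.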

Issue Links

links to

GitHub Pull Request #8818

People

Assignee: John Roesler

Reporter: John Roesler

Votes: 0

Watchers: 4

Dates

Created: 02/Jun/20 19:23

Updated: 11/Jun/20 15:33

Resolved: 11/Jun/20 14:19