[KAFKA-12742] 5. Checkpoint all uncorrupted state stores within the subtopology - ASF JIRA

Details

Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: streams
Labels: None

Description

Once we have KAFKA-12740, we can close the loop on EOS by checkpointing not only those state stores which are attached to processors that the record has successfully passed, but also any remaining state stores further downstream in the subtopology that aren't connected to the processor where the error occurred.

At this point, outside of a hard crash (e.g. the process is killed) or dropping out of the group, we’ll only ever need to restore a state store from scratch if the exception came from the specific processor node it’s attached to, which is pretty darn cool.
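A minimal sketch of the selection rule described above, in Java. It assumes a subtopology can be modelled as a map from each processor name to the names of the state stores connected to it; the class and method names (`CheckpointSelection`, `storesToCheckpoint`) are hypothetical and are not part of the Kafka Streams API. Every store in the subtopology is treated as checkpointable unless it is connected to the processor that threw, including a store that the failing processor shares with another node.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not Kafka Streams code: choose which state stores in a
// subtopology can still be checkpointed after one processor throws.
public final class CheckpointSelection {

    /**
     * @param storesByProcessor for each processor in the subtopology, the stores connected to it
     * @param failedProcessor   the processor node where the exception occurred
     * @return every store in the subtopology except those connected to failedProcessor
     */
    public static Set<String> storesToCheckpoint(Map<String, Set<String>> storesByProcessor,
                                                 String failedProcessor) {
        // Stores connected to the failing node may hold partial writes, so they are
        // treated as corrupted and must be restored instead of checkpointed.
        Set<String> corrupted = storesByProcessor.getOrDefault(failedProcessor, Set.of());
        Set<String> result = new HashSet<>();
        for (Set<String> stores : storesByProcessor.values()) {
            for (String store : stores) {
                // A store shared with the failing node is excluded even if it is
                // also connected to a processor the record passed successfully.
                if (!corrupted.contains(store)) {
                    result.add(store);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative subtopology: source -> aggregate -> join -> dedup
        Map<String, Set<String>> subtopology = Map.of(
                "source", Set.of(),
                "aggregate", Set.of("counts"),
                "join", Set.of("lookup"),
                "dedup", Set.of("seen"));
        // The record passed "source" and "aggregate", then "join" threw;
        // "counts" (upstream) and "seen" (downstream) are both uncorrupted.
        System.out.println(new TreeSet<>(storesToCheckpoint(subtopology, "join")));
    }
}
```

Running `main` prints `[counts, seen]`: only `lookup`, the store attached to the failing `join` node, would need to be restored from scratch.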

Note: we may first need to do some follow-up work on KAFKA-12740, depending on where we land on the open question in that ticket: whether to simply disable the partial-topology commit for EOS, or to fully implement the logic so that the partial commit is performed if and only if the task remains assigned to that same client. If KAFKA-12740 ends up doing only the former, then we'll need to implement the latter before enabling this for EOS, as a prerequisite to the work in this ticket.
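The two options from the open question can be sketched as a small decision function. This is an illustrative sketch only: `PartialCommitPolicy`, `EosOption`, and `mayPartiallyCommit` are hypothetical names, and it assumes that without EOS the partial-topology commit from KAFKA-12740 is always allowed.

```java
// Hypothetical sketch of the two options discussed in KAFKA-12740 for EOS.
// Names are illustrative only and do not correspond to Kafka Streams classes.
public final class PartialCommitPolicy {

    public enum EosOption {
        /** Never perform a partial-topology commit when EOS is enabled. */
        DISABLE_FOR_EOS,
        /** Perform it under EOS only if the task stays assigned to this client. */
        ONLY_IF_SAME_CLIENT
    }

    /** Returns true if the uncorrupted stores may be checkpointed after an error. */
    public static boolean mayPartiallyCommit(boolean eosEnabled,
                                             EosOption option,
                                             boolean taskStillAssignedToThisClient) {
        if (!eosEnabled) {
            // Assumption: outside EOS the partial commit is always allowed.
            return true;
        }
        if (option == EosOption.DISABLE_FOR_EOS) {
            return false;
        }
        // ONLY_IF_SAME_CLIENT: commit if and only if the task was not reassigned.
        return taskStillAssignedToThisClient;
    }

    public static void main(String[] args) {
        // Under the stricter option, a task that moved to another client is not partially committed.
        System.out.println(mayPartiallyCommit(true, EosOption.ONLY_IF_SAME_CLIENT, false));
    }
}
```

In these terms, this ticket would need the `ONLY_IF_SAME_CLIENT` behaviour in place before the checkpointing of uncorrupted stores is enabled for EOS.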

People

Assignee: Unassigned

Reporter: A. Sophie Blee-Goldman

Votes: 0

Watchers: 2

Dates

Created: 01/May/21 01:46

Updated: 03/May/21 17:34