[KAFKA-12739] 2. Commit any cleanly-processed records within a corrupted task - ASF JIRA

Attach files

Attach Screenshot

Add vote

Voters

Watch issue

Watchers

Link

Clone

Update Comment Author

Replace String in Comment

Update Comment Visibility

Delete Comments

XML

Word

Printable

JSON

Details

Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: streams
Labels:
None

Description

Within a task, there will typically be a number of records that have been successfully processed through the subtopology but not yet committed. If the next record to be picked up hits an unexpected exception, we’ll dirty close the entire task and essentially throw away all the work we did on those previous records. We should be able to drop only the corrupted record and just commit the offsets up to that point.
Again, for some exceptions such as de/serialization or user code errors, this can be straightforward as the thread/task is otherwise in a healthy state. Other cases such as an error in the Producer will need to be tackled separately, since a Producer error cannot be isolated to a single task.

The challenge here will be in handling records sent to the changelog while processing the record that hits an error – we may need to buffer those records so they aren’t sent to the RecordCollector until a record has been fully processed, otherwise they will be committed and break EOS semantics (unless we can immediately implement KAFKA-12740)