When processing under at-least-once semantics (ALOS), we might as well commit the progress made by other tasks when one task encounters a specific exception. If one task has an issue and at least one other task has already completed processing successfully, it would be good to commit those successfully processed tasks. This should limit the duplicated records downstream and also be more efficient.
Also, if one task is having lots of issues, the other tasks can at least make progress. This optimization became possible when we introduced the thread replacement mechanism.
We only enabled this for the experimental modular topologies feature because we are worried about over-committing. By taking advantage of the task backoff policy in modular topologies, we can avoid that issue.
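The idea described above can be illustrated with a minimal, self-contained sketch. It is not the Kafka Streams implementation: `PartialCommitSketch`, `SketchTask`, `RoundResult`, and `runRound` are hypothetical names invented for illustration, a "commit" is modeled as simply recording an offset in a map, and the task backoff policy is not modeled. The sketch assumes a processing round stops at the first failing task and commits only the tasks that finished before it.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these types are Kafka Streams APIs.
class PartialCommitSketch {

    /** A stand-in for a stream task that advances its offset by one per call. */
    static final class SketchTask {
        final String id;
        final boolean failing;   // assumption: a flag that simulates a task-specific exception
        long position = 0L;

        SketchTask(String id, boolean failing) {
            this.id = id;
            this.failing = failing;
        }

        void processOne() {
            if (failing) {
                throw new IllegalStateException("task " + id + " failed");
            }
            position++;
        }
    }

    /** Outcome of one processing round. */
    static final class RoundResult {
        // Offsets "committed" for tasks that processed successfully, in processing order.
        final Map<String, Long> committed = new LinkedHashMap<>();
        // Id of the task that failed, or null if every task succeeded.
        String failedTaskId;
    }

    /**
     * Processes tasks in order. If a task throws, the round stops, but the
     * progress of the tasks that already succeeded is still committed instead
     * of being discarded and reprocessed later.
     */
    static RoundResult runRound(List<SketchTask> tasks) {
        RoundResult result = new RoundResult();
        Map<String, Long> processed = new LinkedHashMap<>();
        for (SketchTask task : tasks) {
            try {
                task.processOne();
                processed.put(task.id, task.position);
            } catch (RuntimeException e) {
                result.failedTaskId = task.id;
                break;
            }
        }
        // Commit whatever succeeded, whether or not a later task failed.
        result.committed.putAll(processed);
        return result;
    }
}
```

With three tasks where only the second one fails, `runRound` would commit the first task's offset and report the second task as failed. The third task is not reached in that round, so nothing is committed for it.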
Links to: KAFKA-13681 Sink event duplicates for partition-stuck stream application