Description
When running a Kafka Streams EOS app that goes slightly above a configured user quota, we can reliably reproduce `TaskCorruptedException`s after throttling. This is the case even with an application that goes only 5-10% above the configured quota.
The root cause is a `TimeoutException` encountered in the `TaskExecutor.commitOffsetsOrTransaction`.
Stacktrace provided below:
```
19:45:28 ERROR [KAFKA] TaskExecutor - stream-thread [basic-tls-0-core-StreamThread-2] Committing task(s) 1_2 failed. org.apache.kafka.common.errors.TimeoutException: Timeout expired after 60000ms while awaiting AddOffsetsToTxn 19:45:28 WARN [KAFKA] StreamThread - stream-thread [basic-tls-0-core-StreamThread-2] Detected the states of tasks [1_2] are corrupted. Will close the task as dirty and re-create and bootstrap from scratch. org.apache.kafka.streams.errors.TaskCorruptedException: Tasks [1_2] are corrupted and hence need to be re-initialized at org.apache.kafka.streams.processor.internals.TaskExecutor.commitOffsetsOrTransaction(TaskExecutor.java:249) ~[server.jar:?] at org.apache.kafka.streams.processor.internals.TaskExecutor.commitTasksAndMaybeUpdateCommittableOffsets(TaskExecutor.java:154) ~[server.jar:?] at org.apache.kafka.streams.processor.internals.TaskManager.commitTasksAndMaybeUpdateCommittableOffsets(TaskManager.java:1915) ~[server.jar:?] at org.apache.kafka.streams.processor.internals.TaskManager.commit(TaskManager.java:1882) ~[server.jar:?] at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:1384) ~[server.jar:?] at org.apache.kafka.streams.processor.internals.StreamThread.runOnceWithoutProcessingThreads(StreamThread.java:1033) ~[server.jar:?] at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:711) [server.jar:?] at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:670) [server.jar:?]
```
Attachments
Issue Links
- links to