Details
Bug
Status: Open
Major
Resolution: Unresolved
2.0.0
None
None
Description
Hi team, I'm using a transactional producer with request.timeout.ms set to a fairly small value (10s), while zookeeper.session.timeout.ms is set to a longer value (30s).
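For reference, the client side of this setup could look roughly like the sketch below. It uses only java.util.Properties so that it runs without the Kafka client on the classpath. The bootstrap address and the transactional.id value are placeholders I made up for illustration; only the 10s request timeout comes from the description above.

```java
import java.util.Properties;

// Sketch of the producer-side settings described in this ticket.
// "localhost:9092" and "demo-tx-id" are placeholder values, not taken from the report.
class ProducerConfigSketch {

    static Properties build() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed address
        props.setProperty("transactional.id", "demo-tx-id");      // assumed id; makes the producer transactional
        props.setProperty("enable.idempotence", "true");          // required for transactions
        props.setProperty("request.timeout.ms", "10000");         // the small 10s timeout from the report
        // zookeeper.session.timeout.ms (30s here) is a broker setting and belongs
        // in the broker's server.properties, not in the producer configuration.
        return props;
    }

    public static void main(String[] args) {
        Properties props = build();
        for (String key : props.stringPropertyNames()) {
            System.out.println(key + "=" + props.getProperty(key));
        }
    }
}
```

With these values a request to the stopped broker can time out well before the broker's ZooKeeper session expires, so the producer may give up on a request while the cluster still treats that broker as alive.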
While the producer was sending records, one broker shut down unexpectedly, and the producer threw 'org.apache.kafka.common.KafkaException: The client hasn't received acknowledgment for some previously sent messages and can no longer retry them. It isn't safe to continue' and exited.
Looking into the code, I found that when a batch expires in RecordAccumulator, it is marked as unresolved in Sender#sendProducerData. If the producer is transactional, it is then bound to end up in transitionToFatalError.
Why do we need to call transitionToFatalError here? Would it be better to abort the transaction instead? I understand that bumping the epoch is necessary for idempotent sends, but why do we let the producer crash in this case?
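To make the two options concrete, here is a toy state model of what I mean. It is not Kafka's TransactionManager: the class, the fatalOnUnresolved switch, and the method names are hypothetical and exist only to contrast "expired batch leads to a fatal error" with "expired batch leads to an abortable error". It deliberately does not model sequence numbers or epochs, which is exactly the part I am unsure about.

```java
// Toy model only. Names echo this ticket; this is NOT Kafka's TransactionManager.
class TxStateSketch {
    enum State { READY, IN_TRANSACTION, ABORTABLE_ERROR, FATAL_ERROR }

    private State state = State.IN_TRANSACTION;
    private final boolean fatalOnUnresolved; // hypothetical switch between the two behaviors

    TxStateSketch(boolean fatalOnUnresolved) {
        this.fatalOnUnresolved = fatalOnUnresolved;
    }

    // Called when a batch expires without an acknowledgment from the broker.
    void onBatchExpired() {
        if (state != State.IN_TRANSACTION) {
            return; // no open transaction, or already in an error state
        }
        state = fatalOnUnresolved ? State.FATAL_ERROR : State.ABORTABLE_ERROR;
    }

    // Returns true if the abort was accepted and the producer can start a new transaction.
    boolean abortTransaction() {
        if (state == State.IN_TRANSACTION || state == State.ABORTABLE_ERROR) {
            state = State.READY;
            return true;
        }
        return false; // a fatal error leaves no way back; the producer must be closed
    }

    State state() {
        return state;
    }

    public static void main(String[] args) {
        TxStateSketch current = new TxStateSketch(true);   // behavior described in this ticket
        current.onBatchExpired();
        System.out.println("fatal variant, abort accepted: " + current.abortTransaction());

        TxStateSketch proposed = new TxStateSketch(false); // behavior asked about in this ticket
        proposed.onBatchExpired();
        System.out.println("abortable variant, abort accepted: " + proposed.abortTransaction());
    }
}
```

In the first variant the application can only close the producer; in the second it could abort and begin a new transaction, provided that is actually safe with respect to the unacknowledged sequence numbers.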
I found that KAFKA-8805, "Bump producer epoch on recoverable errors" (#7389), fixes this by automatically bumping the producer epoch after aborting the transaction. But why is it necessary to bump the epoch? What problem would occur if we called transitionToAbortableError directly and let the user abort the transaction?