Description
KAFKA-14402 (KIP-890) adds verification with the transaction coordinator on Produce and TxnOffsetCommit paths as a defense against hanging transactions. For compatibility with older clients, retriable errors from the verification step are translated to ones already expected and handled by existing clients. When verification was added, we forgot to translate NETWORK_EXCEPTIONs.
dajac noticed this manifesting as a test failure when tests/kafkatest/tests/core/transactions_test.py was run with an older client (prior to the fix for KAFKA-16122):
NETWORK_EXCEPTION is indeed returned as a partition error. The TransactionManager.TxnOffsetCommitHandler considers it as a fatal error so it transitions to the fatal state.
It seems that there are two cases where the server could return it: (1) When the verification request times out or its connection is cut; or (2) in AddPartitionsToTxnManager.addTxnData where we say that we use it because we want a retriable error.
The first case was triggered as part of the test. The second case happens when there is already a verification request (AddPartitionsToTxn) in flight with the same epoch and we want clients to try again when we're not busy.
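The missing translation described above can be sketched as follows. This is a minimal, self-contained illustration and not the actual broker code: the Error enum is a trimmed-down stand-in for Kafka's error codes, translateForTxnOffsetCommit is a hypothetical helper name, and the choice of COORDINATOR_LOAD_IN_PROGRESS as the target error is an assumption made for the example (it is an error that the TxnOffsetCommit handler in existing clients is expected to retry on).

```java
final class VerificationErrorTranslation {

    // Hypothetical stand-in for a small subset of Kafka's error codes.
    enum Error {
        NONE,
        NETWORK_EXCEPTION,
        CONCURRENT_TRANSACTIONS,
        COORDINATOR_LOAD_IN_PROGRESS,
        INVALID_TXN_STATE
    }

    // Maps retriable errors produced by the verification step to an error
    // that older clients already treat as retriable on TxnOffsetCommit.
    // The target error used here is an assumption for illustration.
    static Error translateForTxnOffsetCommit(Error verificationError) {
        switch (verificationError) {
            case NETWORK_EXCEPTION:        // the case that was missed
            case CONCURRENT_TRANSACTIONS:  // assumed to be translated the same way
                return Error.COORDINATOR_LOAD_IN_PROGRESS;
            default:
                // Everything else (including NONE and fatal errors) passes through.
                return verificationError;
        }
    }

    public static void main(String[] args) {
        System.out.println(translateForTxnOffsetCommit(Error.NETWORK_EXCEPTION));
        System.out.println(translateForTxnOffsetCommit(Error.INVALID_TXN_STATE));
    }
}
```

Without the NETWORK_EXCEPTION case, the error would fall through to the default branch and reach the client unchanged, which matches the fatal-state transition observed in the test.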