This was revealed by system test failures on Jenkins.
While handling an EndTxnRequest, the transaction coordinator appears to hit a code path where it returns an error to the client (possibly NOT_COORDINATOR or COORDINATOR_NOT_AVAILABLE; https://github.com/apache/kafka/pull/3278 should reveal which). However, due to network instability, the producer is disconnected before it receives this error.
As a result, the transaction remains in a `PrepareXX` state (`PrepareCommit` or `PrepareAbort`), and every subsequent `EndTxn` request the client sends after reconnecting fails with a `CONCURRENT_TRANSACTIONS` error code. The client therefore gets stuck and the transaction never finishes, because transactions in a `PrepareXX` state are not expired.
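
To make the failure sequence concrete, here is a minimal sketch of the suspected behaviour. It is a toy model and not the actual coordinator code: the class `StuckTransactionSketch`, its methods, the `failAfterPrepare` flag and the generic `COORDINATOR_ERROR` result are all hypothetical, and the exact error returned on the first request is still unknown.

```java
// Toy model of the suspected stuck-transaction path. All names here are
// hypothetical; this is NOT the real TransactionCoordinator implementation.
class StuckTransactionSketch {
    // Simplified subset of the coordinator-side transaction states.
    enum TxnState { ONGOING, PREPARE_COMMIT, PREPARE_ABORT, COMPLETE_COMMIT, COMPLETE_ABORT }

    // COORDINATOR_ERROR stands in for the not-yet-identified error
    // (possibly NOT_COORDINATOR or COORDINATOR_NOT_AVAILABLE).
    enum Result { OK, COORDINATOR_ERROR, CONCURRENT_TRANSACTIONS }

    TxnState state = TxnState.ONGOING;

    // failAfterPrepare simulates the coordinator erroring out after it has
    // already moved the transaction into a Prepare state.
    Result handleEndTxn(boolean commit, boolean failAfterPrepare) {
        if (state == TxnState.PREPARE_COMMIT || state == TxnState.PREPARE_ABORT) {
            // A pending Prepare state rejects any further EndTxn request.
            return Result.CONCURRENT_TRANSACTIONS;
        }
        state = commit ? TxnState.PREPARE_COMMIT : TxnState.PREPARE_ABORT;
        if (failAfterPrepare) {
            // The error is returned, but the state is left in Prepare.
            return Result.COORDINATOR_ERROR;
        }
        state = commit ? TxnState.COMPLETE_COMMIT : TxnState.COMPLETE_ABORT;
        return Result.OK;
    }

    // Expiration only applies to ongoing transactions, so a transaction
    // left in a Prepare state is never cleaned up this way.
    boolean tryExpire() {
        if (state != TxnState.ONGOING) {
            return false;
        }
        state = TxnState.PREPARE_ABORT;
        return true;
    }
}
```

In this model, a first `handleEndTxn(true, true)` call whose response is lost on the wire leaves the state at `PREPARE_COMMIT`. Every retry then returns `CONCURRENT_TRANSACTIONS`, and `tryExpire()` returns `false`, so nothing ever moves the transaction forward.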