Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-14417

Producer doesn't handle REQUEST_TIMED_OUT for InitProducerIdRequest, treats as fatal error

    XMLWordPrintableJSON

Details

    • Task
    • Status: Resolved
    • Blocker
    • Resolution: Fixed
    • 3.1.0, 3.0.0, 3.2.0, 3.3.0, 3.4.0
    • 4.0.0, 3.3.2
    • None
    • None

    Description

      In TransactionManager we have a handler for InitProducerIdRequests https://github.com/apache/kafka/blob/19286449ee20f85cc81860e13df14467d4ce287c/clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java#LL1276C14-L1276C14

      However, we have the potential to return a REQUEST_TIMED_OUT error in RPCProducerIdManager when the BrokerToControllerChannel manager times out: https://github.com/apache/kafka/blob/19286449ee20f85cc81860e13df14467d4ce287c/core/src/main/scala/kafka/coordinator/transaction/ProducerIdManager.scala#L236 

      or when the poll returns null: https://github.com/apache/kafka/blob/19286449ee20f85cc81860e13df14467d4ce287c/core/src/main/scala/kafka/coordinator/transaction/ProducerIdManager.scala#L170

      Since REQUEST_TIMED_OUT is not handled by the producer, we treat it as a fatal error and the producer fails. With the default of idempotent producers, this can cause more issues.
      See this stack trace from 3.0:

      ERROR [Producer clientId=console-producer] Aborting producer batches due to fatal error (org.apache.kafka.clients.producer.internals.Sender)
      org.apache.kafka.common.KafkaException: Unexpected error in InitProducerIdResponse; The request timed out.
      	at org.apache.kafka.clients.producer.internals.TransactionManager$InitProducerIdHandler.handleResponse(TransactionManager.java:1390)
      	at org.apache.kafka.clients.producer.internals.TransactionManager$TxnRequestHandler.onComplete(TransactionManager.java:1294)
      	at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
      	at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:658)
      	at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:650)
      	at org.apache.kafka.clients.producer.internals.Sender.maybeSendAndPollTransactionalRequest(Sender.java:418)
      	at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:316)
      	at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:256)
      	at java.lang.Thread.run(Thread.java:748)
      

      Seems like the commit that introduced the changes was this one: https://github.com/apache/kafka/commit/72d108274c98dca44514007254552481c731c958 so we are vulnerable when the server code is ibp 3.0 and beyond.
       

      Attachments

        Activity

          People

            jolshan Justine Olshan
            jolshan Justine Olshan
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: