  1. Kafka
  2. KAFKA-9131

Failed producer metadata updates result in an unrelated error message

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.5.0
    • Component/s: streams
    • Labels:
      None

      Description

      A producer metadata TimeoutException is handled as a generic RetriableException in RecordCollectorImpl.recordSendError. This results in an irrelevant error message.

      We expected to see this:

      "Timeout exception caught when sending record to topic %s. " +
      "This might happen if the producer cannot send data to the Kafka cluster and thus, " +
      "its internal buffer fills up. " +
      "This can also happen if the broker is slow to respond, if the network connection to " +
      "the broker was interrupted, or if similar circumstances arise. " +
      "You can increase producer parameter `max.block.ms` to increase this timeout."

      but got this:

      "You can increase the producer configs `delivery.timeout.ms` and/or " +
      "`retries` to avoid this error. Note that `retries` is set to infinite by default."

      These parameters do not apply to metadata updates.

      Technical details:

      (1) Lines 221-236 in kafka/streams/src/main/java/org/apache/kafka/streams/processor/internals/RecordCollectorImpl.java are dead code. They are never executed because producer.send never throws TimeoutException; it returns a failed future instead. See lines 948-955 in kafka/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java.

      (2) The exception is then processed in a callback function in the method recordSendError on line 202. The DefaultProductionExceptionHandler is used.

      (3) In recordSendError, in the same class, the timeout exception is handled as a RetriableException at lines 133-136. The error message is simply wrong because tuning delivery.timeout.ms and retries has nothing to do with the issue in this case.
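The path described in (1) to (3) can be illustrated with a small, self-contained sketch. It does not use the Kafka client library: the three exception classes are stand-ins that only mirror the inheritance chain of the Kafka classes with the same names (TimeoutException extends RetriableException extends ApiException), and `send`, `classify`, and `run` are hypothetical simplifications, not the actual KafkaProducer or RecordCollectorImpl code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;

final class MetadataTimeoutDemo {
    // Stand-in exceptions (assumption: they only mirror the Kafka hierarchy).
    static class ApiException extends RuntimeException {
        ApiException(String message) { super(message); }
    }
    static class RetriableException extends ApiException {
        RetriableException(String message) { super(message); }
    }
    static class TimeoutException extends RetriableException {
        TimeoutException(String message) { super(message); }
    }

    // Simplified model of a producer send: the metadata timeout is caught
    // internally, handed to the callback, and returned as a failed future.
    // Nothing is thrown to the caller.
    static Future<Void> send(String topic, BiConsumer<Void, Exception> callback) {
        try {
            throw new TimeoutException("metadata update timed out for " + topic);
        } catch (ApiException e) {
            callback.accept(null, e);
            CompletableFuture<Void> failed = new CompletableFuture<>();
            failed.completeExceptionally(e);
            return failed;
        }
    }

    // Simplified model of the callback-side check: only the generic
    // RetriableException branch exists, so the timeout lands there too.
    static String classify(Exception exception) {
        if (exception instanceof RetriableException) {
            return "retriable-hint"; // the delivery.timeout.ms / retries message
        }
        return "other";
    }

    // Records which code path handled the failure.
    static String run() {
        StringBuilder path = new StringBuilder();
        try {
            send("demo-topic", (metadata, exception) ->
                path.append("callback:").append(classify(exception)));
        } catch (TimeoutException e) {
            path.append("catch"); // unreachable in this model: send never throws
        }
        return path.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this model the catch block around `send` is never entered, and the timeout is reported through the callback under the generic retriable branch, which matches the behavior reported above.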

      Proposed solution:

      (1) Remove the unreachable catch (final TimeoutException e) in RecordCollectorImpl.java, as the producer does not throw ApiExceptions.

      (2) Move the aforementioned catch clause to the recordSendError method.

      (3) Process TimeoutException separately from RetriableException.

      (4) Implement a unit test to cover this corner case.
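One possible shape for step (3) is sketched below, assuming the check is a chain of instanceof tests. The key point is ordering: because TimeoutException is a subclass of RetriableException, the more specific type has to be tested first. The exception classes are stand-ins that mirror the Kafka hierarchy, `hintFor` is a hypothetical helper, and the hint strings are shortened from the messages quoted in the description. This is not the code that was merged for the fix.

```java
final class SendErrorHints {
    // Stand-in exceptions (assumption: they only mirror the Kafka hierarchy).
    static class ApiException extends RuntimeException {
        ApiException(String message) { super(message); }
    }
    static class RetriableException extends ApiException {
        RetriableException(String message) { super(message); }
    }
    static class TimeoutException extends RetriableException {
        TimeoutException(String message) { super(message); }
    }

    // Hypothetical helper: picks the hint that matches the failure.
    static String hintFor(Exception exception) {
        // Test the subclass first; otherwise the RetriableException branch
        // would also match every TimeoutException.
        if (exception instanceof TimeoutException) {
            return "You can increase producer parameter `max.block.ms` to increase this timeout.";
        }
        if (exception instanceof RetriableException) {
            return "You can increase the producer configs `delivery.timeout.ms` and/or `retries` to avoid this error.";
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println(hintFor(new TimeoutException("metadata update timed out")));
        System.out.println(hintFor(new RetriableException("transient broker error")));
    }
}
```

A unit test for step (4) could assert exactly these two cases: a timeout yields the `max.block.ms` hint, and any other retriable error still yields the `delivery.timeout.ms` and `retries` hint.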

       

       


              People

              • Assignee:
                gkomissarov Gleb Komissarov
                Reporter:
                gkomissarov Gleb Komissarov