  1. Kafka
  2. KAFKA-10228

producer: NETWORK_EXCEPTION is thrown instead of a request timeout

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.3.1
    • Fix Version/s: None
    • Component/s: clients
    • Labels: None

    Description

      We're currently seeing an issue with the Java client (producer) when producing a message runs into a request timeout: a NETWORK_EXCEPTION is reported instead of a timeout exception.

      Situation and relevant code:

      Config

      request.timeout.ms: 200
      retries: 3
      acks: all

      for (UnpublishedEvent event : unpublishedEvents) {
          ListenableFuture<SendResult<String, String>> future;
          future = kafkaTemplate.send(new ProducerRecord<>(event.getTopic(), event.getKafkaKey(), event.getPayload()));
          futures.add(future.completable());
      }
      
      CompletableFuture.allOf(futures.stream().toArray(CompletableFuture[]::new)).join();
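
When one of the sends fails, the join() call above throws a CompletionException, and the error reported for that send is only visible as its cause. The following is a minimal sketch of unwrapping that cause using only the JDK. The class name JoinFailureSketch, the helpers rootCause and joinAll, and the stand-in IllegalStateException are illustrative assumptions; they are not part of our code or of the Kafka client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;

class JoinFailureSketch {

    // Walks past the wrapper exceptions that CompletableFuture adds around a failure.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while ((current instanceof CompletionException || current instanceof ExecutionException)
                && current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    // Waits for all futures; returns null if every one succeeded,
    // otherwise the unwrapped failure reported by allOf().
    static Throwable joinAll(List<CompletableFuture<?>> futures) {
        try {
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            return null;
        } catch (CompletionException e) {
            return rootCause(e);
        }
    }

    public static void main(String[] args) {
        List<CompletableFuture<?>> futures = new ArrayList<>();
        futures.add(CompletableFuture.completedFuture("ok"));

        CompletableFuture<String> failed = new CompletableFuture<>();
        // Stand-in for whatever exception the producer reports for a failed send.
        failed.completeExceptionally(new IllegalStateException("stand-in for a send failure"));
        futures.add(failed);

        Throwable cause = joinAll(futures);
        System.out.println(cause == null ? "all sends succeeded" : "send failed: " + cause);
    }
}
```

Logging the unwrapped cause's class name at this point shows directly which exception type the client reported for the failed send.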

      We're using the KafkaTemplate from Spring Boot here, but that shouldn't matter, as it's merely a wrapper around the producer. We use it to send messages in batches.

      200 ms later, we see the following in the logs (the order is uncertain: both entries arrived in the same millisecond, so our logging system might not display them in the right order):

      [Producer clientId=producer-1] Received invalid metadata error in produce request on partition events-6 due to org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.. Going to request metadata update now
      [Producer clientId=producer-1] Got error produce response with correlation id 3094 on topic-partition events-6, retrying (2 attempts left). Error: NETWORK_EXCEPTION 

      There is also a corresponding error on the broker (within a few ms):

      Attempting to send response via channel for which there is no open connection, connection id XXX (kafka.network.Processor) 

      This was somewhat unexpected and sent us on a hunt across the infrastructure for possible connection issues, but we found none.

      Side note: In some cases the retries worked and the messages were successfully produced.

      Only after many hours of heavy debugging did we notice that the error might be related to the low timeout setting. We've now removed that setting, as it was a remnant from the past and no longer valid for our use case. However, to spare other people the same issue and to simplify future debugging, some form of timeout exception should be thrown.
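
Until the client reports a clearer error, a startup check along the following lines could flag an unusually low request.timeout.ms before it causes confusion. This is only a sketch using the JDK's Properties class with the configuration from this report. The class name TimeoutConfigCheck and the 1000 ms threshold are hypothetical choices made for illustration; the 30000 ms fallback is the producer's documented default for request.timeout.ms.

```java
import java.util.Properties;

class TimeoutConfigCheck {

    // Documented default of the Kafka producer setting request.timeout.ms.
    static final int DEFAULT_REQUEST_TIMEOUT_MS = 30_000;

    // Hypothetical threshold chosen for this sketch; tune it for your environment.
    static final int SUSPICIOUS_BELOW_MS = 1_000;

    // Reads request.timeout.ms, falling back to the default when it is not set.
    static int requestTimeoutMs(Properties props) {
        String raw = props.getProperty("request.timeout.ms");
        return raw == null ? DEFAULT_REQUEST_TIMEOUT_MS : Integer.parseInt(raw.trim());
    }

    // True when the effective timeout is below the sketch's threshold.
    static boolean isSuspiciouslyLow(Properties props) {
        return requestTimeoutMs(props) < SUSPICIOUS_BELOW_MS;
    }

    public static void main(String[] args) {
        // The configuration from this report.
        Properties props = new Properties();
        props.setProperty("request.timeout.ms", "200");
        props.setProperty("retries", "3");
        props.setProperty("acks", "all");

        if (isSuspiciouslyLow(props)) {
            System.out.println("request.timeout.ms=" + requestTimeoutMs(props)
                    + " is very low; produce requests may be aborted by the client"
                    + " and surface as NETWORK_EXCEPTION");
        }
    }
}
```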

          People

            Assignee: Unassigned
            Reporter: Christian Becker (tgbeck)
            Votes: 0
            Watchers: 2
