  Kafka / KAFKA-2135

New Kafka Producer Client: Send requests wait indefinitely if no broker is available.



    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s:
    • Fix Version/s: None
    • Component/s: producer
    • Labels:


      I'm seeing issues when sending a message with the new producer client API. Calling get() on the Future returned from Producer.send() blocks indefinitely if the cluster is unreachable for some reason. Here are the steps to reproduce:

      1. Start up a single-node Kafka cluster locally.
      2. Start up the application and create a KafkaProducer with the following config:
        KafkaProducerWrapper values: 
        	compression.type = snappy
        	metric.reporters = []
        	metadata.max.age.ms = 300000
        	metadata.fetch.timeout.ms = 60000
        	acks = all
        	batch.size = 16384
        	reconnect.backoff.ms = 10
        	bootstrap.servers = [localhost:9092]
        	receive.buffer.bytes = 32768
        	retry.backoff.ms = 100
        	buffer.memory = 33554432
        	timeout.ms = 30000
        	key.serializer = class com.mycompany.kafka.serializer.ToStringEncoder
        	retries = 3
        	max.request.size = 1048576
        	block.on.buffer.full = true
        	value.serializer = class com.mycompany.kafka.serializer.JsonEncoder
        	metrics.sample.window.ms = 30000
        	send.buffer.bytes = 131072
        	max.in.flight.requests.per.connection = 5
        	metrics.num.samples = 2
        	linger.ms = 0
        	client.id = site-json
      3. Send some messages; they are sent successfully.
      4. Shut down the Kafka broker.
      5. Send another message.

      At this point, calling get() on the returned Future will block indefinitely until the broker is restarted.
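
      Until the client enforces a timeout itself, a caller can bound the wait by using the timed overload of Future.get(). The sketch below uses only java.util.concurrent; a CompletableFuture that never completes stands in for the Future returned by send() while no broker is reachable. The class name BoundedFutureWait and the helper awaitOrNull are hypothetical names chosen for illustration, not part of the Kafka client.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedFutureWait {

    /**
     * Waits for the result for at most timeoutMs milliseconds.
     * Returns the result, or null if the wait timed out.
     * (Hypothetical helper, not a Kafka API.)
     */
    static <T> T awaitOrNull(Future<T> future, long timeoutMs) {
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The caller regains control; the underlying send may still be pending.
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("send failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        // Stand-in for a send() Future that is never completed.
        Future<String> stuck = new CompletableFuture<>();
        String result = awaitOrNull(stuck, 200);
        System.out.println(result == null ? "timed out" : result);

        // Stand-in for a send() Future that was acknowledged.
        Future<String> acked = CompletableFuture.completedFuture("acked");
        System.out.println(awaitOrNull(acked, 200));
    }
}
```

      This only unblocks the calling thread. It does not fail or remove the pending request inside the producer, so it is a caller-side workaround rather than a fix for the behavior described here.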

      It appears that there is some logic in org.apache.kafka.clients.producer.internals.Sender that is supposed to mark the Future as "done" in response to a disconnect event (towards the end of the run(long) method). However, the while loop earlier in this method seems to remove the broker from consideration entirely, so the final loop over ClientResponse objects is never executed.

      It seems that the "timeout.ms" configuration should be honored in this case, or that another timeout should be introduced to indicate when the client should give up waiting for the cluster to respond.
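
      One way to picture the suggested behavior is a queue of pending sends that is swept on each pass, failing any entry that has waited longer than the configured timeout. The following is a minimal sketch under that assumption, not the Sender implementation; PendingSendExpiry, PendingSend, enqueue and expire are hypothetical names, and the clock is passed in as a plain millisecond value to keep the example deterministic.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeoutException;

public class PendingSendExpiry {

    /** A queued send and the time it was enqueued (hypothetical type). */
    static final class PendingSend {
        final long enqueuedAtMs;
        final CompletableFuture<String> result = new CompletableFuture<>();

        PendingSend(long enqueuedAtMs) {
            this.enqueuedAtMs = enqueuedAtMs;
        }
    }

    // Entries are appended in enqueue order, so the oldest is always first.
    private final Deque<PendingSend> pending = new ArrayDeque<>();
    private final long timeoutMs;

    PendingSendExpiry(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    PendingSend enqueue(long nowMs) {
        PendingSend p = new PendingSend(nowMs);
        pending.addLast(p);
        return p;
    }

    /** Fails every send that has waited at least timeoutMs; returns how many expired. */
    int expire(long nowMs) {
        int expired = 0;
        // Stop at the first entry that is still within its timeout.
        while (!pending.isEmpty()
                && nowMs - pending.peekFirst().enqueuedAtMs >= timeoutMs) {
            PendingSend p = pending.pollFirst();
            p.result.completeExceptionally(
                    new TimeoutException("no broker available within " + timeoutMs + " ms"));
            expired++;
        }
        return expired;
    }

    public static void main(String[] args) {
        PendingSendExpiry queue = new PendingSendExpiry(30_000); // mirrors timeout.ms = 30000
        PendingSend first = queue.enqueue(0);
        PendingSend second = queue.enqueue(20_000);

        int expired = queue.expire(30_000);
        System.out.println(expired + " expired, first done: " + first.result.isDone()
                + ", second done: " + second.result.isDone());
    }
}
```

      With a sweep like this, a get() on an expired entry would throw an ExecutionException wrapping the TimeoutException instead of blocking until the broker returns.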





              • Assignee:
                Jun Rao (junrao)
              • Reporter:
                David Hay (dhay)


                • Created: