[KAFKA-2135] New Kafka Producer Client: Send requests wait indefinitely if no broker is available. - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Duplicate
Affects Version/s: 0.8.2.0
Fix Version/s: None
Component/s: producer
Labels: None

Description

I'm seeing issues when sending a message with the new producer client API. The future returned from Producer.send() will block indefinitely if the cluster is unreachable for some reason. Here are the steps:

1. Start up a single-node Kafka cluster locally.

2. Start up the application and create a KafkaProducer with the following config:

KafkaProducerWrapper values: 
	compression.type = snappy
	metric.reporters = []
	metadata.max.age.ms = 300000
	metadata.fetch.timeout.ms = 60000
	acks = all
	batch.size = 16384
	reconnect.backoff.ms = 10
	bootstrap.servers = [localhost:9092]
	receive.buffer.bytes = 32768
	retry.backoff.ms = 100
	buffer.memory = 33554432
	timeout.ms = 30000
	key.serializer = class com.mycompany.kafka.serializer.ToStringEncoder
	retries = 3
	max.request.size = 1048576
	block.on.buffer.full = true
	value.serializer = class com.mycompany.kafka.serializer.JsonEncoder
	metrics.sample.window.ms = 30000
	send.buffer.bytes = 131072
	max.in.flight.requests.per.connection = 5
	metrics.num.samples = 2
	linger.ms = 0
	client.id = site-json

3. Send some messages; they are sent successfully.
4. Shut down the Kafka broker.
5. Send another message.

At this point, calling get() on the returned Future blocks until the broker is restarted, no matter how long that takes.
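As a caller-side workaround, the wait can be bounded with the standard Future.get(long, TimeUnit) overload instead of the no-argument get(). The sketch below is only an illustration under stated assumptions: it uses a never-completed CompletableFuture as a stand-in for the Future returned by Producer.send() while the broker is down, and the class and method names (BoundedSendWait, awaitAck) are hypothetical, not part of the Kafka client.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedSendWait {

    /**
     * Waits at most timeoutMs for a send to complete.
     * Returns true if the future completed normally within the limit,
     * false if it timed out or completed with an error.
     */
    public static boolean awaitAck(Future<?> sendResult, long timeoutMs)
            throws InterruptedException {
        try {
            sendResult.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            // Timed out waiting, or the send itself failed.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in (assumption) for the Future from Producer.send()
        // when no broker is reachable: it is never completed.
        Future<String> stuck = new CompletableFuture<>();
        System.out.println("stuck send acknowledged: " + awaitAck(stuck, 200));

        // Stand-in for a send that the broker acknowledged.
        Future<String> done = CompletableFuture.completedFuture("ack");
        System.out.println("normal send acknowledged: " + awaitAck(done, 200));
    }
}
```

This only stops the calling thread from hanging. It does not remove the record from the producer's internal queue, so the underlying problem described in this issue remains.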

It appears that there is some logic in org.apache.kafka.clients.producer.internals.Sender that is supposed to mark the Future as "done" in response to a disconnect event (towards the end of the run(long) method). However, the while loop earlier in this method seems to remove the broker from consideration entirely, so the final loop over ClientResponse objects is never executed.

It seems that the "timeout.ms" configuration should be honored in this case, or perhaps another timeout should be introduced that indicates when to give up waiting for the cluster to respond.
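To make the second suggestion concrete, here is a minimal sketch of what such a give-up timeout could look like: each pending send records when it was queued, and a periodic check fails any send that has waited longer than a configured limit. Everything here (PendingSendExpiry, enqueue, expire, maxWaitMs) is hypothetical and written only for illustration; it is not the producer's actual internals, and time is passed in explicitly to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeoutException;

public class PendingSendExpiry {

    /** A queued send together with the time it was enqueued. */
    private static final class PendingSend {
        final long createdMs;
        final CompletableFuture<Void> result = new CompletableFuture<>();

        PendingSend(long createdMs) {
            this.createdMs = createdMs;
        }
    }

    private final List<PendingSend> pending = new ArrayList<>();
    private final long maxWaitMs;

    public PendingSendExpiry(long maxWaitMs) {
        this.maxWaitMs = maxWaitMs;
    }

    /** Queues a send at time nowMs and returns the future the caller would wait on. */
    public CompletableFuture<Void> enqueue(long nowMs) {
        PendingSend send = new PendingSend(nowMs);
        pending.add(send);
        return send.result;
    }

    /**
     * Fails every pending send that has waited at least maxWaitMs,
     * so callers blocked on get() are released with an exception.
     * Returns the number of sends that were expired.
     */
    public int expire(long nowMs) {
        int expired = 0;
        Iterator<PendingSend> it = pending.iterator();
        while (it.hasNext()) {
            PendingSend send = it.next();
            if (nowMs - send.createdMs >= maxWaitMs) {
                send.result.completeExceptionally(
                        new TimeoutException("no broker response within " + maxWaitMs + " ms"));
                it.remove();
                expired++;
            }
        }
        return expired;
    }
}
```

With a check like this running on the sender thread's loop, a get() on the returned future would fail with an ExecutionException wrapping the TimeoutException once the limit passes, instead of blocking until the broker comes back.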

Issue Links

duplicates

KAFKA-1788 producer record can stay in RecordAccumulator forever if leader is no available

Resolved

People

Assignee: Jun Rao

Reporter: David Hay

Votes: 0

Watchers: 2

Dates

Created: 20/Apr/15 18:04

Updated: 04/May/15 22:26

Resolved: 04/May/15 22:26