[KAFKA-8202] StackOverflowError on producer when splitting batches - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.0.0
Fix Version/s: None
Component/s: None
Labels: None

Description

Hello,

Recently we came across a StackOverflowError in the Kafka producer Java library. The error caused the Kafka producer to stop, and we had to restart our service because subsequent calls failed with IllegalStateException: Cannot perform operation after producer has been closed.

The stack trace was as follows:

java.lang.StackOverflowError: null
	at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.chain(FutureRecordMetadata.java:89)
	at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.chain(FutureRecordMetadata.java:89)
	at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.chain(FutureRecordMetadata.java:89)
	at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.chain(FutureRecordMetadata.java:89)
	at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.chain(FutureRecordMetadata.java:89)
	at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.chain(FutureRecordMetadata.java:89)
	at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.chain(FutureRecordMetadata.java:89)
	at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.chain(FutureRecordMetadata.java:89)
	at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.chain(FutureRecordMetadata.java:89)
	at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.chain(FutureRecordMetadata.java:89)
	at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.chain(FutureRecordMetadata.java:89)
	at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.chain(FutureRecordMetadata.java:89)
// […]

The piece of code responsible for the error:

/**
 * This method is used when we have to split a large batch in smaller ones. A chained metadata will allow the
 * future that has already returned to the users to wait on the newly created split batches even after the
 * old big batch has been deemed as done.
 */
void chain(FutureRecordMetadata futureRecordMetadata) {
    if (nextRecordMetadata == null)
        nextRecordMetadata = futureRecordMetadata;
    else
        nextRecordMetadata.chain(futureRecordMetadata);
}
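
To illustrate why this method can exhaust the stack, here is a minimal sketch. It does not use Kafka classes: ChainSketch, Node, chainRecursive, chainIterative and buildChain are hypothetical names invented for this example, and the chain length in main is an assumption chosen to be large relative to a default thread stack. The recursive append has the same shape as chain() above and needs one stack frame per element already in the chain, while the iterative variant walks to the tail in a loop. This is only one possible way to avoid the deep recursion, not necessarily the approach taken in the linked pull request.

```java
// Hypothetical stand-in for FutureRecordMetadata; only the chaining logic is modelled.
public class ChainSketch {

    static final class Node {
        Node next;

        // Recursive append, same shape as the chain() method quoted above:
        // it uses one stack frame per node already in the chain.
        void chainRecursive(Node node) {
            if (next == null)
                next = node;
            else
                next.chainRecursive(node);
        }

        // Iterative append: walks to the tail in a loop, so stack depth stays constant.
        void chainIterative(Node node) {
            Node tail = this;
            while (tail.next != null)
                tail = tail.next;
            tail.next = node;
        }

        // Number of nodes reachable from this one, not counting this node itself.
        int chainLength() {
            int n = 0;
            for (Node cur = next; cur != null; cur = cur.next)
                n++;
            return n;
        }
    }

    // Builds a chain of the given length without recursion by tracking the tail.
    static Node buildChain(int length) {
        Node head = new Node();
        Node tail = head;
        for (int i = 0; i < length; i++) {
            tail.next = new Node();
            tail = tail.next;
        }
        return head;
    }

    // Returns true if the recursive append overflows the stack for a chain of this length.
    static boolean recursiveAppendOverflows(int length) {
        Node head = buildChain(length);
        try {
            head.chainRecursive(new Node());
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int length = 1_000_000; // assumed to be deep enough to exceed a default thread stack
        System.out.println("recursive append overflows: " + recursiveAppendOverflows(length));

        Node head = buildChain(length);
        head.chainIterative(new Node());
        System.out.println("chain length after iterative append: " + head.chainLength());
    }
}
```

Whether the recursive call actually overflows depends on the JVM's thread stack size (for example the -Xss setting), so the sketch reports the outcome instead of assuming it. Note that the iterative append still costs time proportional to the chain length on every call.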

Before the error occurred, we observed a large number of log messages about record batches being split (caused by the MESSAGE_TOO_LARGE error) on one of our topics, logged by org.apache.kafka.clients.producer.internals.Sender:

[Producer clientId=producer-1] Got error produce response in correlation id 158621342 on topic-partition <topic name>, splitting and retrying (2147483647 attempts left). Error: MESSAGE_TOO_LARGE

All of these log messages had different correlation ids but the same number of attempts left (2147483647), so they appeared to belong to different requests, each of which succeeded without further retries.

We are using version 2.0.0 of the kafka-clients Java library; the brokers run 2.1.1.

Thanks in advance.

Issue Links

relates to: KAFKA-8350 Splitting batches should consider topic-level message size (Open)
links to: GitHub Pull Request #7229

People

Assignee: Zhanxiang (Patrick) Huang
Reporter: Daniel Krawczyk
Votes: 0
Watchers: 4

Dates

Created: 09/Apr/19 09:33
Updated: 22/Aug/19 17:26