[KAFKA-8803] Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.5.0, 2.3.2, 2.4.2
Component/s: streams
Labels: None

Description

One streams app is consistently failing at startup with the following exception:

2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] org.apa.kaf.str.pro.int.StreamTask                : task [0_36] Timeout exception caught when initializing transactions for task 0_36. This might happen if the broker is slow to respond, if the network connection to the broker was interrupted, or if similar circumstances arise. You can increase producer parameter `max.block.ms` to increase this timeout.
org.apache.kafka.common.errors.TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId
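The log message above suggests raising the producer's `max.block.ms`. A minimal sketch of how that might be passed to a Kafka Streams application follows, using only `java.util.Properties` and literal config keys. The class and method names, the application id, the bootstrap address, and the 120000 ms value are all hypothetical; the `producer.` prefix is the one Streams uses to forward settings to its embedded producers. Given the causes listed under Issue Links, a longer timeout may only lengthen the wait rather than resolve the failure.

```java
import java.util.Properties;

// Hypothetical helper, not part of Kafka: builds a Streams configuration
// that forwards a larger max.block.ms to the embedded producers.
final class StreamsTimeoutConfig {

    static Properties build(String applicationId, String bootstrapServers, long maxBlockMs) {
        if (maxBlockMs <= 0) {
            throw new IllegalArgumentException("maxBlockMs must be positive");
        }
        Properties props = new Properties();
        // Same key as StreamsConfig.APPLICATION_ID_CONFIG
        props.setProperty("application.id", applicationId);
        // Same key as StreamsConfig.BOOTSTRAP_SERVERS_CONFIG
        props.setProperty("bootstrap.servers", bootstrapServers);
        // InitProducerId is only requested for transactional producers,
        // so exactly-once processing is assumed to be enabled here.
        props.setProperty("processing.guarantee", "exactly_once");
        // Same key as StreamsConfig.producerPrefix(ProducerConfig.MAX_BLOCK_MS_CONFIG)
        props.setProperty("producer.max.block.ms", Long.toString(maxBlockMs));
        return props;
    }

    public static void main(String[] args) {
        // Assumed values for illustration only.
        Properties props = build("example-app", "localhost:9092", 120_000L);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The resulting `Properties` object would then be handed to the `KafkaStreams` constructor together with the topology.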

These same brokers are used by many other streams without any issue, including some running in the same processes as the stream that consistently throws this exception.

UPDATE 08/16:

The first instance of this error occurred on August 13th 2019 at 17:03:36.754 and affected 4 different streams. For 3 of these streams, the error happened only once, and then the stream recovered. For the 4th stream, the error has kept recurring and is still happening now.

I looked up the broker logs for this period and see that on August 13th 2019 at 16:47:43, two of the four brokers started reporting messages like this for multiple partitions:

[2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)

The UNKNOWN_LEADER_EPOCH messages continued for some time and then stopped. Here is a view of the count of these messages over time:

[Image: count of UNKNOWN_LEADER_EPOCH messages over time]

However, as noted, the stream task timeout error continues to happen.

I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 broker. The broker has a patch for KAFKA-8773.

Attachments

- logs.txt.gz (20/Aug/19 06:02, 1.65 MB, Raman Gupta)
- logs-20200311.txt.gz (11/Mar/20 20:05, 2.19 MB, Raman Gupta)
- logs-client-20200311.txt.gz (11/Mar/20 20:05, 158 kB, Raman Gupta)
- screenshot-1.png (16/Aug/19 16:29, 21 kB, Raman Gupta)

Issue Links

is caused by

- KAFKA-9144 Early expiration of producer state can cause coordinator epoch to regress (Resolved)
- KAFKA-9307 Transaction coordinator could be left in unknown state after ZK session timeout (Resolved)
- KAFKA-9749 TransactionMarkerRequestCompletionHandler should treat storage exceptions as retriable (Resolved)
- KAFKA-10520 InitProducerId may be blocked if least loaded node is not ready to send (Resolved)

is duplicated by

- KAFKA-8858 Kafka Streams - Failed to Rebalance Error and stream consumer stuck for some reason (Resolved)

is related to

- KAFKA-13375 Kafka streams apps w/EOS unable to start at InitProducerId (Open)
- KAFKA-9274 Gracefully handle timeout exceptions on Kafka Streams (Resolved)

links to

- GitHub Pull Request #8278

People

Assignee: Guozhang Wang

Reporter: Raman Gupta

Votes: 4

Watchers: 19

Dates

Created: 14/Aug/19 17:23

Updated: 26/Aug/23 11:54

Resolved: 08/Jun/20 21:21