[KAFKA-1016] Broker should limit purgatory size - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Not A Problem
Affects Version/s: 0.8.0
Fix Version/s: None
Component/s: purgatory
Labels: None

Description

I recently ran into a case where a poorly configured Kafka consumer was able to trigger out-of-memory exceptions in multiple Kafka brokers. The consumer was configured with fetch.wait.max.ms set to Int.MaxValue.

For low-volume topics, this configuration causes the consumer to block frequently and for long periods of time. junrao informs me that the fetch request will still time out once the socket timeout is reached. In our case, this was set to 30 s.

With several thousand consumer threads, the fetch request purgatory grew to between 100,000 and 400,000 entries, which we believe triggered the out-of-memory exceptions. nehanarkhede reports having seen similar behavior in other high-volume clusters.

It seems undesirable that a single poorly configured consumer can trigger out-of-memory exceptions in the broker. I was thinking it may make sense to have the broker try to protect itself from this situation. Here are some potential solutions:

1. Add a broker-side maximum wait config for fetch requests.
2. Cap the purgatory size, and when it is full, either drop the oldest connections in purgatory or reject the newest fetch requests.
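The two proposals above can be sketched together in Java. This is a minimal, hypothetical illustration under stated assumptions, not Kafka's purgatory code: the class BoundedPurgatory, the broker-side cap brokerMaxWaitMs, the capacity limit and the OverflowPolicy values are invented names, and a parked fetch is reduced to an id plus a deadline. Proposal 1 becomes a min() between the consumer's requested wait and the broker cap; proposal 2 becomes a fixed-capacity queue kept in arrival order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: neither this class nor its parameters exist in Kafka.
final class BoundedPurgatory {
    // What to do when the purgatory is full (proposal 2).
    enum OverflowPolicy { DROP_OLDEST, REJECT_NEWEST }

    // A parked fetch request, reduced to an id and an absolute expiry time.
    static final class DelayedFetch {
        final long id;
        final long deadlineMs;

        DelayedFetch(long id, long deadlineMs) {
            this.id = id;
            this.deadlineMs = deadlineMs;
        }
    }

    private final int capacity;
    private final long brokerMaxWaitMs; // hypothetical broker-side cap (proposal 1)
    private final OverflowPolicy policy;
    // Requests in arrival order, so the head is always the oldest one.
    private final Deque<DelayedFetch> pending = new ArrayDeque<>();

    BoundedPurgatory(int capacity, long brokerMaxWaitMs, OverflowPolicy policy) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
        this.brokerMaxWaitMs = brokerMaxWaitMs;
        this.policy = policy;
    }

    // Proposal 1: the broker caps whatever max wait the consumer asked for.
    long effectiveWaitMs(long requestedWaitMs) {
        return Math.min(requestedWaitMs, brokerMaxWaitMs);
    }

    // Parks a fetch. Returns the request that must be answered immediately
    // (the evicted oldest one, or the rejected new one), or null if none.
    DelayedFetch park(long id, long requestedWaitMs, long nowMs) {
        DelayedFetch incoming = new DelayedFetch(id, nowMs + effectiveWaitMs(requestedWaitMs));
        if (pending.size() < capacity) {
            pending.addLast(incoming);
            return null;
        }
        if (policy == OverflowPolicy.REJECT_NEWEST) {
            return incoming; // purgatory untouched; the newcomer is turned away
        }
        DelayedFetch oldest = pending.pollFirst(); // DROP_OLDEST: evict the head
        pending.addLast(incoming);
        return oldest;
    }

    // Removes requests whose deadline has passed; returns how many expired.
    int expire(long nowMs) {
        int before = pending.size();
        pending.removeIf(r -> r.deadlineMs <= nowMs);
        return before - pending.size();
    }

    int size() {
        return pending.size();
    }
}
```

In either overflow case the caller would be expected to answer the returned request right away (for example with an empty fetch response) so the consumer simply retries. DROP_OLDEST favors new arrivals at the expense of long waiters, while REJECT_NEWEST keeps existing waiters and pushes back on newcomers. With a hypothetical cap of 500 ms, a consumer asking for Int.MaxValue would be parked for at most 500 ms instead of until the 30 s socket timeout.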

Issue Links

is superseded by

KAFKA-1430 Purgatory redesign (Resolved)

KAFKA-1989 New purgatory design (Resolved)

People

Assignee: Joel Jacob Koshy

Reporter: Chris Riccomini

Votes: 0

Watchers: 4

Dates

Created: 19/Aug/13 21:10

Updated: 30/Jul/19 08:05

Resolved: 30/Jul/19 08:05