[KAFKA-6199] Single broker with fast growing heap usage - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.10.2.1
Fix Version/s: 1.1.0
Component/s: None
Labels: None
Environment: Amazon Linux

Description

We have a single broker in our cluster of 25 with fast-growing heap usage, which forces us to restart it every 12 hours. If we don't restart the broker, it becomes very slow from long GC pauses and eventually throws OutOfMemory errors.

See Screen Shot 2017-11-10 at 11.59.06 AM.png for a graph of heap usage percentage on the broker. A "normal" broker in the same cluster stays below 50% (averaged) over the same time period.

We have taken heap dumps when the broker's heap usage is getting dangerously high, and there are a lot of retained NetworkSend objects referencing byte buffers.
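One way to see which classes are accumulating is to diff two live-object histograms taken some hours apart. The sketch below assumes the histograms are plain-text output in the `jmap -histo:live <pid>` row format (rank, instance count, bytes, class name); the helper names `parse_histo` and `top_growth` are made up for illustration and are not part of any Kafka or JDK tooling.

```python
import re
from typing import Dict, List, Tuple

# Matches data rows such as "   1:   123456   98765432  [B".
# Header, separator, and "Total" lines do not match and are skipped.
_ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_histo(text: str) -> Dict[str, Tuple[int, int]]:
    """Map class name -> (instance count, bytes) for every histogram row."""
    rows: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines():
        match = _ROW.match(line)
        if match:
            rows[match.group(3)] = (int(match.group(1)), int(match.group(2)))
    return rows


def top_growth(
    before: Dict[str, Tuple[int, int]],
    after: Dict[str, Tuple[int, int]],
    limit: int = 10,
) -> List[Tuple[str, int, int]]:
    """Return (class, extra instances, extra bytes), largest byte growth first."""
    deltas = []
    for name, (instances, size) in after.items():
        old_instances, old_size = before.get(name, (0, 0))
        deltas.append((name, instances - old_instances, size - old_size))
    deltas.sort(key=lambda delta: delta[2], reverse=True)
    return deltas[:limit]
```

Calling `top_growth(parse_histo(early_text), parse_histo(late_text))` on two snapshots lists the classes whose retained bytes grew the most between them, which should make a steadily growing class stand out from normal churn.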

We also noticed that the single affected broker logs a lot more of this kind of warning than any other broker:

WARN Attempting to send response via channel for which there is no open connection, connection id 13 (kafka.network.Processor)

See Screen Shot 2017-11-10 at 1.55.33 PM.png for counts of that WARN log message visualized across all the brokers (to show it happens a bit on other brokers, but not nearly as much as it does on the "bad" broker).
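To help narrow the warnings down to a particular connection, the same WARN lines can be tallied per connection id. This is a minimal sketch that assumes the message text matches the sample line quoted above; `count_by_connection` is a hypothetical helper name, and the log file path in the usage note is a placeholder.

```python
import re
from collections import Counter
from typing import Iterable

# Pattern built from the single sample WARN line in this report; the
# connection id is taken to be the token that follows "connection id".
_WARN = re.compile(
    r"WARN Attempting to send response via channel for which there is no "
    r"open connection, connection id (\S+)"
)


def count_by_connection(lines: Iterable[str]) -> Counter:
    """Count matching WARN lines per connection id."""
    counts: Counter = Counter()
    for line in lines:
        match = _WARN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For example, `count_by_connection(open("server.log")).most_common(20)` would list the connection ids that trigger the warning most often on one broker.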

I can't make the heap dumps public, but would appreciate advice on how to pin down the problem better. We're currently trying to narrow it down to a particular client, but without much success so far.

Let me know what else I could investigate or share to track down the source of this leak.

Attachments

- jstack-2017-12-08.scrubbed.out (14/Dec/17 12:00, 221 kB, Robin Tweedie)
- histo_live_20171206.txt (06/Dec/17 17:12, 120 kB, Robin Tweedie)
- histo_live_80.txt (03/Dec/17 19:26, 121 kB, Robin Tweedie)
- histo_live.txt (02/Dec/17 23:01, 135 kB, Robin Tweedie)
- dominator_tree.png (19/Nov/17 22:40, 948 kB, Robin Tweedie)
- path2gc.png (19/Nov/17 22:40, 1.01 MB, Robin Tweedie)
- merge_shortest_paths.png (19/Nov/17 22:40, 499 kB, Robin Tweedie)
- Screen Shot 2017-11-10 at 11.59.06 AM.png (10/Nov/17 13:57, 218 kB, Robin Tweedie)
- Screen Shot 2017-11-10 at 1.55.33 PM.png (10/Nov/17 13:56, 65 kB, Robin Tweedie)

Issue Links

relates to KAFKA-6307 mBeanName should be removed before returning from JmxReporter#removeAttribute() (Resolved)

People

Assignee: Unassigned

Reporter: Robin Tweedie

Votes: 0

Watchers: 8

Dates

Created: 10/Nov/17 14:03

Updated: 18/May/18 13:09

Resolved: 18/May/18 07:46