[KAFKA-10901] Lock contention on high produce rate causing cluster degradation - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.5.0
Fix Version/s: None
Component/s: producer
Labels: None
Environment:
broker: version 2.5.0 with 8 cores, 32 GB, HDD
producer: Sarama (Go) producer version 1.5.2 with 500 ms linger and 2 MB batch size

Description

Scaling up the producers (20 -> 40) caused the idle percentage to drop from 70-80% to 0-1%, the request queue size to increase by 200%, and overall producer latency to increase by 700%.
CPU usage also dropped by 30%.

After some profiling, we saw high lock contention on the write requests, yet CPU usage remained low. We didn't see any unusual activity in disk reads, writes, or IOPS. If anything, it was the other way around: because everything became slower, the cluster processed much less data.

In comparison, when there were 20 producers, you can see that the ratio of produce/fetch is

From limited observation, the number of produce requests from the upscaled producer increased from 1500 to 2500 (150 per broker) per second, while the overall number of produce requests in the cluster remained the same. The number of fetch requests, on the other hand, decreased by 50%.

To fix the issue, we increased this specific producer's linger.ms from 500 ms to 1000 ms, and suddenly the whole cluster became healthy.
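
To illustrate why a longer linger can relieve the brokers, here is a rough back-of-the-envelope sketch. It assumes a simplified model that is not taken from this report: every flush is triggered by the linger timer, and each producer sends at most one produce request to a given broker per linger interval. The function name is hypothetical, and the model is not expected to reproduce the figures above, since flushes triggered by a full batch may also contribute.

```python
def linger_bound_requests_per_broker(producers: int, linger_ms: float) -> float:
    """Estimate produce requests per second arriving at one broker.

    Assumption (simplified, hypothetical model): each producer sends at most
    one produce request to the broker per linger interval, so the rate is
    producers * (1000 / linger_ms).
    """
    if producers < 0:
        raise ValueError("producers must be non-negative")
    if linger_ms <= 0:
        raise ValueError("linger_ms must be positive")
    flushes_per_sec = 1000.0 / linger_ms  # timer-driven flushes per producer
    return producers * flushes_per_sec


if __name__ == "__main__":
    # Compare the two linger settings mentioned in the description.
    for linger in (500, 1000):
        rate = linger_bound_requests_per_broker(40, linger)
        print(f"40 producers, linger {linger} ms: {rate:.0f} req/s per broker")
```

Under this model, doubling the linger from 500 ms to 1000 ms halves the timer-driven request rate per broker (80 versus 40 requests per second for 40 producers), which is at least consistent with the relief observed after the change.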

Attachments


Screen Shot 2021-01-04 at 11.46.47.png (50 kB, uploaded 05/Jan/21 10:08 by ilya morgenshtern)
Screen Shot 2021-01-04 at 11.46.08.png (77 kB, uploaded 05/Jan/21 10:20 by ilya morgenshtern)

People

Assignee: Unassigned

Reporter: ilya morgenshtern

Votes: 0

Watchers: 1

Dates

Created: 05/Jan/21 10:41

Updated: 06/Jan/21 18:18