[KAFKA-4469] Consumer throughput regression caused by inefficient list removal and copy - ASF JIRA

Agile Board

Attach files

Attach Screenshot

Voters

Stop watching

Watchers

Create sub-task

Link

Clone

Update Comment Author

Replace String in Comment

Update Comment Visibility

Delete Comments

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.10.1.0
Fix Version/s: 0.10.1.1
Component/s: None
Labels:
None

Description

There appears to be a small performance regression in 0.10.1.0 from previous versions. I tracked it back to ~~KAFKA-3888~~. As part of KIP-62, we decreased the value of max.poll.records from Integer.MAX_VALUE to 500. Based on some performance testing, this results in about a 5% decrease in throughput. This depends on the fetch and message sizes. My test used message size of 1K with the default fetch size, and the default max.poll.records of 500.

The main cause of the regression seems to be an unneeded list copy in Fetcher. Basically when we have more records than we need to satisfy max.poll.records, then we copy the fetched records into a new list. When I modified the code to use a sub-list, which does not need a copy, the performance is much closer to that of 0.10.0 (within 1% or so with lots of qualification since there are many unexplored parameters). The remaining performance gap could be explained by sub-optimal pipelining as a result of KAFKA-4007 (this is likely part of the story anyway based on some rough testing).