[KAFKA-4469] Consumer throughput regression caused by inefficient list removal and copy - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.10.1.0
Fix Version/s: 0.10.1.1
Component/s: None
Labels: None

Description

There appears to be a small consumer throughput regression in 0.10.1.0 compared with previous versions. I tracked it back to KAFKA-3888. As part of KIP-62, we decreased the default value of max.poll.records from Integer.MAX_VALUE to 500. Based on some performance testing, this results in about a 5% decrease in throughput, though the exact figure depends on the fetch and message sizes. My test used a message size of 1 KB with the default fetch size and the default max.poll.records of 500.
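One way to reproduce the comparison is to run the same consumer workload twice: once with the new default of 500 and once with max.poll.records raised back to Integer.MAX_VALUE. The minimal sketch below only builds the two property sets with plain java.util.Properties. The broker address, group id, and class name are placeholders assumed for illustration, and actually consuming would additionally require a KafkaConsumer from kafka-clients.

```java
import java.util.Properties;

// Hypothetical helper for an A/B throughput test; not part of Kafka itself.
class PollRecordsConfig {

    // Builds consumer properties that differ only in max.poll.records.
    static Properties consumerProps(int maxPollRecords) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address (assumption)
        props.put("group.id", "perf-test");               // placeholder group id (assumption)
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        // 500 is the 0.10.1.0 default; Integer.MAX_VALUE matches the earlier behaviour
        props.put("max.poll.records", Integer.toString(maxPollRecords));
        return props;
    }

    public static void main(String[] args) {
        Properties newDefault = consumerProps(500);
        Properties oldBehaviour = consumerProps(Integer.MAX_VALUE);
        System.out.println("new default:   " + newDefault.getProperty("max.poll.records"));
        System.out.println("old behaviour: " + oldBehaviour.getProperty("max.poll.records"));
    }
}
```

Keeping every other setting identical isolates max.poll.records as the variable under test; as noted above, message and fetch sizes also affect the measured gap.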

The main cause of the regression seems to be an unneeded list copy in Fetcher. When we have fetched more records than are needed to satisfy max.poll.records, the records to return are copied into a new list. When I modified the code to return a sub-list, which requires no copy, performance came much closer to that of 0.10.0: within 1% or so, with the caveat that many parameters remain unexplored. The remaining gap could be explained by sub-optimal pipelining as a result of KAFKA-4007 (rough testing suggests this is likely part of the story anyway).
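The difference between copying and returning a sub-list can be sketched with a simplified record buffer. The class below is hypothetical, not Kafka's actual Fetcher code: it assumes the fetched records sit in a list that is never modified after the fetch and are handed out in batches of at most max.poll.records. The head-removal variant is likewise an assumption, included only to illustrate why repeatedly removing from the front of an ArrayList is costly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a per-partition buffer of fetched records.
class RecordBuffer<T> {
    private final List<T> records; // fetched records; never structurally modified
    private int position = 0;      // index of the next record to hand out

    RecordBuffer(List<T> records) {
        this.records = records;
    }

    // Copy-based take: allocates a new ArrayList and copies up to n elements.
    List<T> takeByCopy(int n) {
        int end = Math.min(position + n, records.size());
        List<T> out = new ArrayList<>(records.subList(position, end));
        position = end;
        return out;
    }

    // View-based take: returns a read-only window onto the backing list; no copy.
    List<T> takeBySubList(int n) {
        int end = Math.min(position + n, records.size());
        List<T> out = Collections.unmodifiableList(records.subList(position, end));
        position = end;
        return out;
    }

    boolean isEmpty() {
        return position >= records.size();
    }

    // Head-removal take on a mutable ArrayList: each iterator.remove() at the
    // front shifts every remaining element one slot to the left.
    static <E> List<E> takeByRemoval(List<E> mutable, int n) {
        List<E> out = new ArrayList<>(Math.min(n, mutable.size()));
        Iterator<E> it = mutable.iterator();
        for (int i = 0; i < n && it.hasNext(); i++) {
            out.add(it.next());
            it.remove();
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> fetched = new ArrayList<>();
        for (int i = 0; i < 1200; i++) {
            fetched.add(i);
        }
        RecordBuffer<Integer> buffer = new RecordBuffer<>(fetched);
        int polls = 0;
        while (!buffer.isEmpty()) {
            List<Integer> batch = buffer.takeBySubList(500); // max.poll.records = 500
            polls++;
            // prints 500, 500, then 200 records
            System.out.println("poll " + polls + ": " + batch.size() + " records");
        }
    }
}
```

The view-based take avoids per-poll copying, at the cost of keeping the whole fetched list reachable until the last view is released. Sub-list views are only safe here because the backing list is never structurally modified while views are outstanding.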

Issue Links

links to: GitHub Pull Request #2190

People

Assignee: Jason Gustafson

Reporter: Jason Gustafson

Votes: 0

Watchers: 2

Dates

Created: 30/Nov/16 01:48

Updated: 30/Nov/16 21:35

Resolved: 30/Nov/16 21:35