Kafka / KAFKA-2986

Consumer group doesn't lend itself well for slow consumers with varying message size


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0.0
    • Fix Version/s: None
    • Component/s: consumer
    • Labels: None
    • Environment: Java consumer API 0.9.0.0

    Description

I sent a related post to the Kafka mailing list but haven't received any response: http://mail-archives.apache.org/mod_mbox/kafka-users/201512.mbox/%3CCAL%2BArfWNfkpymkNDuf6UJ06CJJ63XC1bPHeT4TSYXKjSsOpu-Q%40mail.gmail.com%3E So far, I think this is a design issue in Kafka, so I'm taking the liberty of creating an issue.

      Use case:

      • Slow consumption: maybe around 20 seconds per record.
      • Large variation in message size: serialized tasks range from ~300 bytes up to ~3 MB.
      • Consumption latency (20 seconds per record) is independent of message size.

      Code example:

      while (isRunning()) {
        // poll() both fetches records and, as a side effect, sends the
        // consumer's heartbeat to the group coordinator
        ConsumerRecords<String, byte[]> records = consumer.poll(100);
        for (final ConsumerRecord<String, byte[]> record : records) {
          // Handle record... (takes ~20 seconds per record)
        }
      }
      

Problem: Kafka itself doesn't have any issues with large messages (as long as you bump a few configuration limits, e.g. message.max.bytes on the broker and max.partition.fetch.bytes on the consumer). However, the problem is two-fold:

      • KafkaConsumer#poll is the only call that sends heartbeats (healthchecks) to the broker.
      • There is no limit on how many messages KafkaConsumer#poll will return; the only limit is on the total number of bytes prefetched. With varying message sizes this makes the session timeout extremely hard to tune:
        • The delay until the next KafkaConsumer#poll call is proportional to the number of records returned by the previous KafkaConsumer#poll call.
        • KafkaConsumer#poll will return either many small records or just a few larger records. With many small messages the risk of hitting the session timeout is very high, and raising the session timeout by the orders of magnitude needed for the smaller messages increases the latency until a dead consumer is discovered a thousandfold. (See the worked numbers after this list.)
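
      To make the trade-off concrete, here is a back-of-the-envelope sketch in Java. The 1 MB prefetch buffer is an assumption (the 0.9 default for max.partition.fetch.bytes); the message sizes and the ~20 seconds per record come from the use case above.

      public class PollGapEstimate {
        public static void main(String[] args) {
          long fetchBytes = 1024 * 1024; // assumed max.partition.fetch.bytes (0.9 default: 1 MB)
          long secondsPerRecord = 20;    // processing time from the use case above

          long smallBatch = fetchBytes / 300;                            // ~3,495 records of ~300 B per poll
          long largeBatch = Math.max(1, fetchBytes / (3 * 1024 * 1024)); // 1 record of ~3 MB per poll

          // The gap between two poll() calls -- and hence between two
          // heartbeats -- is the batch size times the per-record processing time:
          System.out.println("Small messages: " + smallBatch * secondsPerRecord + " s between polls"); // ~69,900 s (~19 h)
          System.out.println("Large messages: " + largeBatch * secondsPerRecord + " s between polls"); // 20 s
        }
      }

      So the session timeout would have to cover roughly 19 hours to survive a buffer full of small messages, while a genuinely dead consumer would then also go undetected for that long.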

      Proposed fixes: I do not claim to be a Kafka expert, but two ideas are to either

      • add a `KafkaConsumer#healthy` call to let the broker know we are still processing records; or
      • add an upper limit on the number of messages `KafkaConsumer#poll` returns. I am thinking of something like `KafkaConsumer#poll(timeout, nMaxMessages)`. This could obviously be a configuration property instead. To avoid the broker having to inspect the messages it sends, I suggest the KafkaConsumer decide how many messages it returns from poll. (A client-side sketch follows this list.)
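
      For the second idea, here is a minimal client-side sketch of what poll(timeout, nMaxMessages) could look like on top of the 0.9 API. BoundedPoller and all of its names are hypothetical, and note that it does not by itself fix the heartbeat problem: the real poll() is still only invoked once the buffer runs dry. (Later Kafka releases added a max.poll.records consumer property along these lines.)

      import java.util.ArrayDeque;
      import java.util.ArrayList;
      import java.util.List;
      import java.util.Queue;
      import org.apache.kafka.clients.consumer.Consumer;
      import org.apache.kafka.clients.consumer.ConsumerRecord;
      import org.apache.kafka.clients.consumer.ConsumerRecords;

      // Hypothetical sketch of the proposed poll(timeout, nMaxMessages):
      // buffer whatever the real poll() returns and hand out at most
      // nMaxMessages records per call.
      final class BoundedPoller<K, V> {
        private final Consumer<K, V> consumer;
        private final Queue<ConsumerRecord<K, V>> buffer = new ArrayDeque<>();

        BoundedPoller(Consumer<K, V> consumer) {
          this.consumer = consumer;
        }

        List<ConsumerRecord<K, V>> poll(long timeoutMs, int nMaxMessages) {
          if (buffer.isEmpty()) {
            ConsumerRecords<K, V> fetched = consumer.poll(timeoutMs);
            for (ConsumerRecord<K, V> record : fetched) {
              buffer.add(record);
            }
          }
          List<ConsumerRecord<K, V>> batch = new ArrayList<>(nMaxMessages);
          while (!buffer.isEmpty() && batch.size() < nMaxMessages) {
            batch.add(buffer.remove());
          }
          return batch;
        }
      }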

      Workarounds:

      • Have different topics for different message sizes. This makes tuning the partition prefetch size easier.
      • Use another tool.
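
      A third option on the 0.9 client, sketched below under stated assumptions, is to pause all assigned partitions while processing and call poll(0) between records, so heartbeats keep flowing without fetching more data. Caveat: if a rebalance fires inside one of those poll(0) calls, the in-flight batch may belong to partitions that have been revoked.

      import org.apache.kafka.clients.consumer.ConsumerRecord;
      import org.apache.kafka.clients.consumer.ConsumerRecords;
      import org.apache.kafka.common.TopicPartition;

      while (isRunning()) {
        ConsumerRecords<String, byte[]> records = consumer.poll(100);
        // Pause every assigned partition: subsequent polls heartbeat but return no records.
        TopicPartition[] assigned = consumer.assignment().toArray(new TopicPartition[0]);
        consumer.pause(assigned);
        for (final ConsumerRecord<String, byte[]> record : records) {
          // Handle record... (~20 seconds)
          consumer.poll(0); // sends a heartbeat; returns nothing while paused
        }
        consumer.resume(assigned);
      }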

Questions: Should Kafka be able to handle this case? Or am I using the wrong tool, and Kafka is simply designed for high throughput and low latency?


            People

              Assignee: Neha Narkhede (nehanarkhede)
              Reporter: Jens Rantil (ztyx)
              Votes: 3
              Watchers: 12
