[KAFKA-8478] Poll for more records before forced processing - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.0.0
Component/s: streams
Labels: None

Description

While analyzing the algorithm of Streams' poll/process loop, I noticed the following.
The algorithm of runOnce is:

loop0:
  long poll for records (100ms)
  loop1:
    loop2: for BATCH_SIZE iterations:
      process one record in each task that has data enqueued
    adjust BATCH_SIZE
    if loop2 processed any records, repeat loop1
    else, break loop1 and repeat loop0

There's potentially an unwanted interaction between "keep processing as long as any record is processed" and forcing processing after `max.task.idle.ms`.
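
For reference, a minimal sketch of how `max.task.idle.ms` might be set for a Streams application. The three keys are real configuration names; the class name, application id, and broker address are placeholders chosen for illustration.

```java
import java.util.Properties;

final class IdleConfigSketch {
    // Builds a bare-bones Streams configuration with task idling enabled.
    static Properties streamsProps(long maxTaskIdleMs) {
        Properties props = new Properties();
        props.put("application.id", "idle-demo");          // placeholder id
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker
        // How long a task waits on an empty input before it forces processing.
        props.put("max.task.idle.ms", Long.toString(maxTaskIdleMs));
        return props;
    }
}
```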

If there are two tasks, A and B, and A runs out of records on one input before B does, then B can keep the processing loop running and so prevent A from getting any new records until max.task.idle.ms expires, at which point A forces processing on its other input partition. The intent of idling is to give A at least a chance of getting more records on the empty input, but in this situation we would never even check for more records before forcing processing.
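
The scenario can be reproduced with a toy simulation. This is a simplified model, not Streams code: the class, the constants, and the simulated clock are all assumptions made for illustration, and loop2 is collapsed to one record per pass. It counts how many polls complete between A noticing its empty input and A forcing processing.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class IdleStarvationSketch {
    static final long MAX_TASK_IDLE_MS = 50; // hypothetical idle setting
    static final long PROCESS_COST_MS = 1;   // simulated cost of one record
    static final long POLL_COST_MS = 100;    // simulated long poll

    // Returns the number of polls that completed after task A noticed an
    // empty input and before A forced processing.
    static int pollsBeforeForcedProcessing(int recordsBufferedForB) {
        long clockMs = 0;
        int polls = 1; // loop0: the initial poll fills B's buffer only

        Deque<Integer> taskB = new ArrayDeque<>();
        for (int i = 0; i < recordsBufferedForB; i++) {
            taskB.add(i);
        }

        long aIdleSinceMs = clockMs;   // A sees that one input is empty
        int pollsWhenANoticed = polls;

        while (true) {
            // A forces processing once its idle time has expired,
            // whether or not a poll has happened in the meantime.
            if (clockMs - aIdleSinceMs >= MAX_TASK_IDLE_MS) {
                return polls - pollsWhenANoticed;
            }
            if (!taskB.isEmpty()) {
                // loop1 repeats because B processed a record.
                taskB.poll();
                clockMs += PROCESS_COST_MS;
            } else {
                // Nothing was processed: break loop1 and poll again.
                // In this model the poll never delivers records to A.
                polls++;
                clockMs += POLL_COST_MS;
            }
        }
    }
}
```

With a large backlog for B, for example `pollsBeforeForcedProcessing(1000)`, A's idle time expires while B is still processing and the method returns 0: no poll was attempted on A's behalf. With a small backlog such as `pollsBeforeForcedProcessing(10)`, B drains first, the loop returns to poll, and the result is 1.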

I'm thinking we should only enforce processing if there was a completed poll since we noticed the task was missing inputs (otherwise, we may as well not bother idling at all).
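
One way to express that rule is a small gate that remembers how many polls had completed when the task first noticed a missing input. This is only a sketch of the idea above; the class and method names are hypothetical and it is not taken from the Kafka codebase.

```java
final class EnforcedProcessingGate {
    private final long maxTaskIdleMs;
    private long idleStartMs = -1;      // -1 means the task is not idling
    private long pollsAtIdleStart = -1;

    EnforcedProcessingGate(long maxTaskIdleMs) {
        this.maxTaskIdleMs = maxTaskIdleMs;
    }

    // Called when the task sees that at least one input is empty.
    // Only the first call of an idle period records the start state.
    void onMissingInput(long nowMs, long completedPolls) {
        if (idleStartMs < 0) {
            idleStartMs = nowMs;
            pollsAtIdleStart = completedPolls;
        }
    }

    // Called when every input has buffered data again.
    void onAllInputsBuffered() {
        idleStartMs = -1;
        pollsAtIdleStart = -1;
    }

    // Forced processing requires both an expired idle time and at least
    // one poll completed since the missing input was noticed.
    boolean mayForceProcessing(long nowMs, long completedPolls) {
        if (idleStartMs < 0) {
            return false;
        }
        boolean idleExpired = nowMs - idleStartMs >= maxTaskIdleMs;
        boolean polledSince = completedPolls > pollsAtIdleStart;
        return idleExpired && polledSince;
    }
}
```

The processing loop would pass in its own count of completed polls, so a task kept away from poll by a busy sibling cannot force processing on elapsed time alone.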

Issue Links

is related to
KAFKA-8315 Historical join issues (Resolved)
KAFKA-10091 Improve task idling (Resolved)

relates to
KAFKA-4113 Allow KTable bootstrap (Resolved)
KAFKA-7458 Avoid enforced processing during bootstrap phase (Resolved)

People

Assignee: John Roesler
Reporter: John Roesler
Votes: 0
Watchers: 4

Dates

Created: 04/Jun/19 15:34
Updated: 08/Jul/21 18:12
Resolved: 08/Jul/21 18:12