Kafka / KAFKA-10091

Improve task idling

Details

    • Type: Task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: streams

    Description

      When Streams processes a task with multiple inputs, it must choose which input to process next each time it is ready to process a record. It always takes from the input whose next record has the smallest timestamp, so Streams processes data in timestamp order. However, if the buffer for one of the inputs is empty, Streams doesn't know what timestamp the next record for that input will have.
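
      The selection rule described above can be illustrated with a minimal sketch. This is not the Streams implementation: the class name TimestampOrderedChoice, the method chooseNext, and the representation of each input buffer as a queue of timestamps are all hypothetical, chosen only to show the rule and the empty-buffer problem.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Kafka Streams code.
final class TimestampOrderedChoice {

    /**
     * Returns the name of the input whose head record has the smallest
     * timestamp, or null if any input buffer is empty, because in that
     * case no timestamp-ordered decision is possible.
     */
    static String chooseNext(Map<String, Deque<Long>> buffers) {
        String best = null;
        long bestTs = Long.MAX_VALUE;
        for (Map.Entry<String, Deque<Long>> entry : buffers.entrySet()) {
            Long head = entry.getValue().peekFirst();
            if (head == null) {
                // Unknown next timestamp for this input: cannot decide.
                return null;
            }
            if (best == null || head < bestTs) {
                best = entry.getKey();
                bestTs = head;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Deque<Long>> buffers = new LinkedHashMap<>();
        buffers.put("left", new ArrayDeque<>(List.of(10L, 30L)));
        buffers.put("right", new ArrayDeque<>(List.of(20L)));

        String next;
        while ((next = chooseNext(buffers)) != null) {
            long ts = buffers.get(next).pollFirst();
            System.out.println(next + " @ " + ts);
        }
        System.out.println("blocked: an input buffer is empty");
    }
}
```

      In this sketch the loop stops as soon as one buffer runs dry, even though the other input still holds a record with timestamp 30. That is the situation in which Streams has to decide whether to wait or to proceed out of order.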

      Streams introduced a configuration "max.task.idle.ms" in KIP-353 to address this issue.

      https://cwiki.apache.org/confluence/display/KAFKA/KIP-353%3A+Improve+Kafka+Streams+Timestamp+Synchronization

      The config allows Streams to wait some amount of time for data to arrive on the empty input, so that it can make a timestamp-ordered decision about which input to pull from next.
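
      As a rough sketch of how such a wait might be expressed, the following sets the config key named above and applies a purely time-based gate. Only the key "max.task.idle.ms" comes from this issue. The class TaskIdleSketch, the method readyToProcess, its parameters, and the fallback value of "0" are assumptions made for illustration, not the Streams internals.

```java
import java.util.Properties;

// Hypothetical illustration only; not Kafka Streams code.
final class TaskIdleSketch {

    /** Builds a config fragment; only the key name is taken from the issue text. */
    static Properties config(long maxTaskIdleMs) {
        Properties props = new Properties();
        props.setProperty("max.task.idle.ms", Long.toString(maxTaskIdleMs));
        return props;
    }

    /**
     * Assumed gate: process immediately when every input has buffered data;
     * otherwise process only once the task has idled for the configured time.
     */
    static boolean readyToProcess(boolean allInputsBuffered,
                                  long idleSinceMs,
                                  long nowMs,
                                  Properties props) {
        if (allInputsBuffered) {
            return true;
        }
        // Fallback of "0" (no waiting) is an assumption of this sketch.
        long maxIdleMs = Long.parseLong(props.getProperty("max.task.idle.ms", "0"));
        return nowMs - idleSinceMs >= maxIdleMs;
    }

    public static void main(String[] args) {
        Properties props = config(100L);
        System.out.println(readyToProcess(false, 1_000L, 1_050L, props)); // still idling
        System.out.println(readyToProcess(false, 1_000L, 1_100L, props)); // idle time elapsed
    }
}
```

      Note that a gate like this only measures elapsed wall-clock time. It says nothing about whether a poll covering the empty partition actually happened during the wait.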

      However, this config can be hard to use reliably and efficiently. What we're really waiting for is the next poll that would return data from the empty input's partition, and whether that poll happens within the idle time depends on the poll interval, the max poll interval, and the internal logic that governs when Streams will poll again.

      Ideally, any amount of idling would at a minimum guarantee that Streams polls data from the empty partition if there is data to fetch.

            People

              Assignee: vvcephei John Roesler
              Reporter: vvcephei John Roesler
              Votes: 0
              Watchers: 8