  1. Kafka
  2. KAFKA-12226

High-throughput source tasks fail to commit offsets



    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1.0, 3.0.1, 3.2.0
    • Component/s: connect
    • Labels: None


      The current source task thread has the following workflow:

      1. Poll messages from the source task
      2. Queue these messages to the producer, which sends them to Kafka asynchronously
      3. Add each message to outstandingMessages or, if a flush is currently active, to outstandingMessagesBacklog
      4. When the producer completes the send of a record, remove it from outstandingMessages

      The commit offsets thread has the following workflow:

      1. Wait up to a fixed timeout (offset.flush.timeout.ms) for outstandingMessages to drain completely
      2. If the wait times out, move everything in outstandingMessagesBacklog into outstandingMessages and reset the flush state
      3. If the wait succeeds, commit the source task offsets to the backing store
      4. Repeat the above on a fixed schedule
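
The interaction between the two threads can be sketched as a toy, single-threaded model. This is not the actual Connect worker code: the class and method names (OffsetFlushModel, recordSent, recordAcked, beginFlush, endFlush) are hypothetical, records are reduced to numeric IDs, and the timed wait is replaced by the caller invoking endFlush() when the wait would have ended. The sketch also assumes the backlog is merged back into outstandingMessages whether or not the flush succeeds.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Toy model of the bookkeeping described above; names are hypothetical. */
public class OffsetFlushModel {
    private final Set<Long> outstandingMessages = new LinkedHashSet<>();
    private final Set<Long> outstandingMessagesBacklog = new LinkedHashSet<>();
    private boolean flushing = false;

    /** Task thread: track a record that was just handed to the producer. */
    public void recordSent(long recordId) {
        if (flushing) {
            outstandingMessagesBacklog.add(recordId);
        } else {
            outstandingMessages.add(recordId);
        }
    }

    /** Producer callback: the send completed, so stop tracking the record. */
    public void recordAcked(long recordId) {
        if (!outstandingMessages.remove(recordId)) {
            outstandingMessagesBacklog.remove(recordId);
        }
    }

    /** Commit thread: start a flush; records sent from now on go to the backlog. */
    public void beginFlush() {
        flushing = true;
    }

    /**
     * Commit thread: called when the wait ends, either because everything
     * drained or because the timeout expired. Returns true only if
     * outstandingMessages was empty, meaning offsets may be committed.
     */
    public boolean endFlush() {
        boolean drained = outstandingMessages.isEmpty();
        // Assumption of this sketch: the backlog is merged in both outcomes.
        outstandingMessages.addAll(outstandingMessagesBacklog);
        outstandingMessagesBacklog.clear();
        flushing = false;
        return drained;
    }

    public int outstandingCount() {
        return outstandingMessages.size();
    }
}
```

In this model, if record 1 is sent, a flush begins, record 2 is sent, and the wait ends before record 1 is acknowledged, endFlush() returns false and both records are outstanding for the next attempt. A task that keeps sending faster than acknowledgements arrive can repeat that pattern on every attempt.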

      If the source task produces records faster than the producer can send them, the producer throttles the task thread by blocking in its send method, waiting at most max.block.ms for space in its buffer (sized by buffer.memory) to become available. As a result, the amount of data tracked in outstandingMessages + outstandingMessagesBacklog is proportional to the size of the producer memory buffer.

      This amount of data might take longer than offset.flush.timeout.ms to flush, so the flush will never succeed while the source task is rate-limited by the producer memory. This means that we may write multiple hours of data to Kafka without ever committing source offsets for the connector. When the task is lost due to a worker failure, hours of data that had already been written to Kafka successfully will be re-processed.
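
A rough back-of-the-envelope check makes the failure condition concrete. The sketch below assumes a full producer buffer draining at a steady rate; the class name, method names, and the 4 MiB/s throughput figure are hypothetical. The 32 MiB buffer and 5000 ms timeout are the documented defaults for buffer.memory and offset.flush.timeout.ms.

```java
/** Rough estimate of whether a full producer buffer can drain within the flush timeout. */
public class FlushTimeoutEstimate {

    /** Milliseconds needed to drain bufferedBytes at a steady bytesPerSecond, rounded up. */
    public static long drainTimeMs(long bufferedBytes, long bytesPerSecond) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bytesPerSecond must be positive");
        }
        return (bufferedBytes * 1000L + bytesPerSecond - 1) / bytesPerSecond;
    }

    /** True if the buffered data can drain before the flush timeout expires. */
    public static boolean flushCanSucceed(long bufferedBytes, long bytesPerSecond,
                                          long offsetFlushTimeoutMs) {
        return drainTimeMs(bufferedBytes, bytesPerSecond) <= offsetFlushTimeoutMs;
    }

    public static void main(String[] args) {
        long bufferMemory = 32L * 1024 * 1024;  // default buffer.memory (33554432 bytes)
        long flushTimeoutMs = 5_000L;           // default offset.flush.timeout.ms
        long throughput = 4L * 1024 * 1024;     // hypothetical sustained send rate, bytes/s

        System.out.println(drainTimeMs(bufferMemory, throughput));                      // 8000
        System.out.println(flushCanSucceed(bufferMemory, throughput, flushTimeoutMs));  // false
    }
}
```

Under these assumed numbers, a full buffer needs about 8 seconds to drain, so a 5 second flush timeout expires on every attempt for as long as the task keeps the buffer full.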





              ChrisEgerton Chris Egerton
              ChrisEgerton Chris Egerton
              Randall Hauch Randall Hauch