Kafka / KAFKA-10327

Flush after a configurable number of records put to SinkTask

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.5.0
    • Fix Version/s: None
    • Component/s: connect

    Description

      In the current version of Kafka Connect, records accumulated through SinkTask.put are flushed to the target system on a time basis only: data is flushed and offsets are committed every offset.flush.interval.ms (default 60000 ms).

      However, there is no way to control how many messages arrive from Kafka between two flushes. The in-memory buffer can therefore grow very large and cause out-of-memory errors.

      I suggest adding out-of-the-box support for count-based flushing to Kafka Connect. This requires a new configuration parameter (offset.flush.count, for example). The number of records passed to SinkTask.put would be counted, and once this count exceeds the value of offset.flush.count, SinkTask.flush is called and offsets are committed.
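
      A minimal sketch of the proposed counting logic, to make the idea concrete. It does not use the real Kafka Connect classes: SimpleSinkTask is a simplified stand-in for the put/flush part of SinkTask, and CountBasedFlusher, BufferingTask and CountFlushDemo are hypothetical names. The offset commit is only marked by a comment, and where this logic would live inside the Connect worker is left open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

/** Simplified stand-in for the put/flush part of SinkTask (NOT the real Connect API). */
interface SimpleSinkTask<R> {
    void put(Collection<R> records);
    void flush();
}

/** Hypothetical worker-side helper: flushes once more than flushCount records were put. */
class CountBasedFlusher<R> {
    private final SimpleSinkTask<R> task;
    private final long flushCount;        // value of the proposed offset.flush.count
    private long recordsSinceFlush = 0;   // records delivered since the last flush
    private int flushesTriggered = 0;     // kept only so the demo can observe behaviour

    CountBasedFlusher(SimpleSinkTask<R> task, long flushCount) {
        if (flushCount <= 0) {
            throw new IllegalArgumentException("flushCount must be positive");
        }
        this.task = task;
        this.flushCount = flushCount;
    }

    /** Delivers a batch to the task and flushes if the count threshold is exceeded. */
    void deliver(Collection<R> records) {
        task.put(records);
        recordsSinceFlush += records.size();
        // "Greater than", as worded in the proposal; ">=" would be an equally valid choice.
        if (recordsSinceFlush > flushCount) {
            task.flush();
            // A real worker would commit the consumer offsets at this point.
            recordsSinceFlush = 0;
            flushesTriggered++;
        }
    }

    int flushesTriggered() {
        return flushesTriggered;
    }

    long recordsSinceFlush() {
        return recordsSinceFlush;
    }
}

/** Demo task that buffers records in memory until flush() is called. */
class BufferingTask implements SimpleSinkTask<String> {
    final List<String> buffer = new ArrayList<>();

    @Override
    public void put(Collection<String> records) {
        buffer.addAll(records);
    }

    @Override
    public void flush() {
        buffer.clear();   // stands in for writing the buffer to the target system
    }
}

class CountFlushDemo {
    public static void main(String[] args) {
        BufferingTask task = new BufferingTask();
        CountBasedFlusher<String> flusher = new CountBasedFlusher<>(task, 3);
        flusher.deliver(Arrays.asList("a", "b"));   // 2 records, below the threshold
        flusher.deliver(Arrays.asList("c", "d"));   // 4 records in total, exceeds 3
        System.out.println("flushes triggered: " + flusher.flushesTriggered());
        System.out.println("records still buffered: " + task.buffer.size());
    }
}
```

      In this sketch the count is checked once per delivered batch, so the buffer can overshoot the threshold by up to one batch. A count-based trigger would presumably work alongside the existing time-based flush rather than replace it.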

          People

            Assignee: Unassigned
            Reporter: Pavel Kuznetsov (pavel-sbor)
            Votes: 0
            Watchers: 1
