Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-13499

Avoid restoring outdated records

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • streams
    • None

    Description

      Kafka Streams has the config `windowstore.changelog.additional.retention.ms` to allow for an increase retention time.

      While an increase retention time can be useful, it can also lead to unnecessary restore cost, especially for stream-stream joins. Assume a stream-stream join with 1h window size and a grace period of 1h. For this case, we only need 2h of data to restore. If we lag, the `windowstore.changelog.additional.retention.ms` helps to prevent the broker from truncating data too early. However, if we don't lag and we need to restore, we restore everything from the changelog.

      Instead of doing a seek-to-beginning, we could use the timestamp index to seek the first offset older than the 2h "window" of data that we need to restore, to avoid unnecessary work.

      Attachments

        Issue Links

          Activity

            People

              danicafine Danica Fine
              mjsax Matthias J. Sax
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated: