Kafka / KAFKA-5510

Streams should commit all offsets regularly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Do
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: streams
    • Labels:
      None

      Description

      Currently, Streams commits offsets only for partitions from which it has processed records. Thus, if a partition receives no data for longer than offsets.retention.minutes (default 1 day), its latest committed offset is lost. On failure or restart, auto.offset.reset kicks in, potentially resulting in reprocessing of old data.

      Thus, Streams should commit all offsets on a regular basis. Not sure what the overhead of a commit is – if it's too expensive to commit all offsets on every regular commit, we could also add a second config that specifies a "commit.all.interval".
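
      To make the failure mode concrete, here is a minimal sketch in plain Java. It is a toy model, not Kafka's broker or Streams code: the class OffsetRetentionModel, its methods, and the partition names "busy-0" and "idle-0" are hypothetical, and the 1440-minute retention simply mirrors the 1-day default mentioned above. It compares committing only the processed partition with committing every assigned partition on each commit.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of per-partition offset retention; not Kafka's actual implementation. */
class OffsetRetentionModel {
    /** A committed offset together with the time (in minutes) of the commit. */
    private static final class Commit {
        final long offset;
        final long committedAtMinutes;

        Commit(long offset, long committedAtMinutes) {
            this.offset = offset;
            this.committedAtMinutes = committedAtMinutes;
        }
    }

    private final long retentionMinutes;
    private final Map<String, Commit> committed = new HashMap<>();

    OffsetRetentionModel(long retentionMinutes) {
        this.retentionMinutes = retentionMinutes;
    }

    /** Stores (or refreshes) the committed offset of one partition. */
    void commit(String partition, long offset, long nowMinutes) {
        committed.put(partition, new Commit(offset, nowMinutes));
    }

    /** Drops every commit older than the retention period, partition by partition. */
    void expire(long nowMinutes) {
        committed.values().removeIf(c -> nowMinutes - c.committedAtMinutes > retentionMinutes);
    }

    boolean hasCommittedOffset(String partition) {
        return committed.containsKey(partition);
    }

    /** Simulates two days of hourly commits; returns whether the idle partition keeps its offset. */
    static boolean idlePartitionSurvives(boolean commitAll) {
        OffsetRetentionModel broker = new OffsetRetentionModel(1440); // assumed 1-day retention
        Map<String, Long> positions = new HashMap<>();
        positions.put("busy-0", 0L);
        positions.put("idle-0", 42L);
        positions.forEach((p, o) -> broker.commit(p, o, 0));

        for (long now = 60; now <= 2880; now += 60) {
            positions.merge("busy-0", 10L, Long::sum); // only the busy partition makes progress
            if (commitAll) {
                final long t = now;
                positions.forEach((p, o) -> broker.commit(p, o, t)); // proposed: commit everything
            } else {
                broker.commit("busy-0", positions.get("busy-0"), now); // current: processed only
            }
            broker.expire(now);
        }
        return broker.hasCommittedOffset("idle-0");
    }

    public static void main(String[] args) {
        System.out.println("processed-only: " + idlePartitionSurvives(false)); // false
        System.out.println("commit-all: " + idlePartitionSurvives(true));      // true
    }
}
```

      With the plain consumer API, the same idea would roughly amount to reading consumer.position(tp) for every partition in consumer.assignment() and passing the resulting map to commitSync. How Streams would wire this into its own commit path is left open here.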

      This relates to https://issues.apache.org/jira/browse/KAFKA-3806, so we should sync to get a solid overall solution.

      At the same time, it might be better to change the semantics of offsets.retention.minutes in the first place, so that the setting applies only if the consumer group is completely dead (and not on a "last commit" and "per partition" basis). Thus, this JIRA would be a workaround fix if core cannot be changed quickly enough.

              People

              • Assignee: Unassigned
              • Reporter: Matthias J. Sax (mjsax)
              • Votes: 1
              • Watchers: 7
