Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-12508

Emit-on-change tables may lose updates on error or restart in at_least_once

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Blocker
    • Resolution: Fixed
    • 2.6.0, 2.7.0, 2.6.1
    • 2.8.0, 2.7.1, 2.6.2
    • streams
    • None

    Description

      KIP-557 added emit-on-change semantics to KTables that suppress updates for duplicate values.

      However, this may cause data loss in at_least_once topologies when records are retried from the last commit due to an error / restart / etc.

       

      Consider the following example:

      streams.table(source, materialized)
      .toStream()
      .map(mayThrow())
      .to(output)

       

      1. Record A gets read
      2. Record A is stored in the table
      3. The update for record A is forwarded through the topology
      4. Map() throws (or alternatively, any restart while the forwarded update was still being processed and not yet produced to the output topic)
      5. The stream is restarted and "retries" from the last commit
      6. Record A gets read again
      7. The table will discard the update for record A because
        1. The value is the same
        2. The timestamp is the same
      8. Eventually the stream will commit
      9. There is absolutely no output for Record A even though we're running in at_least_once

       

      This behaviour does not seem intentional. The emit-on-change logic explicitly forwards records that have the same value and an older timestamp.

      This logic should probably be changed to also forward updates that have an older or equal timestamp.

      Attachments

        Issue Links

          Activity

            People

              vvcephei John Roesler
              nhab Nico Habermann
              Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: