KIP-557 added emit-on-change semantics to KTables that suppress updates for duplicate values.
However, this can cause data loss in at_least_once topologies when records are reprocessed from the last commit after an error, a restart, or a similar interruption.
Consider the following example:
- Record A gets read
- Record A is stored in the table
- The update for record A is forwarded through the topology
- Map() throws (or, alternatively, the stream restarts for any reason while the forwarded update is still being processed and has not yet been produced to the output topic)
- The stream is restarted and "retries" from the last commit
- Record A gets read again
- The table will discard the update for record A because:
  - The value is the same
  - The timestamp is the same
- Eventually the stream will commit
- There is absolutely no output for Record A even though we're running in at_least_once
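The steps above can be reproduced with a small, dependency-free simulation. This is only a sketch and does not use the Kafka Streams API: the class `EmitOnChangeReplay`, its `Stored` holder, and the `process` method are hypothetical names, and the drop rule is written from the description in this report (same value and a timestamp that is not older) rather than copied from the Kafka source. It also assumes that the table contents survive the restart, as in the scenario described.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the scenario; not Kafka Streams code.
final class EmitOnChangeReplay {
    // Stand-in for a value stored together with its record timestamp.
    static final class Stored {
        final String value;
        final long timestamp;

        Stored(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Assumption: the table state survives the restart.
    final Map<String, Stored> table = new HashMap<>();
    // Records that actually reached the "output topic".
    final List<String> output = new ArrayList<>();

    // Returns true if the update is forwarded, false if emit-on-change drops it.
    boolean updateTable(String key, String value, long timestamp) {
        Stored old = table.get(key);
        if (old != null && old.value.equals(value) && timestamp >= old.timestamp) {
            return false; // same value, timestamp not older: treated as a duplicate
        }
        table.put(key, new Stored(value, timestamp));
        return true;
    }

    void process(String key, String value, long timestamp, boolean downstreamFails) {
        if (!updateTable(key, value, timestamp)) {
            return; // nothing is forwarded downstream
        }
        if (downstreamFails) {
            throw new IllegalStateException("downstream Map() failed");
        }
        output.add(key + "=" + value);
    }

    public static void main(String[] args) {
        EmitOnChangeReplay app = new EmitOnChangeReplay();
        try {
            app.process("A", "v1", 100L, true); // first attempt: stored, then Map() throws
        } catch (IllegalStateException e) {
            // The stream restarts and retries from the last commit.
        }
        app.process("A", "v1", 100L, false); // replay of the same record
        System.out.println("output after replay: " + app.output); // prints "output after replay: []"
    }
}
```

The first attempt stores record A before the failure, so the replayed record is seen as a duplicate and the output stays empty.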
This behaviour does not seem intentional. The emit-on-change logic explicitly forwards records that have the same value and an older timestamp.
This logic should probably be changed to forward same-value updates whose timestamp is older than or equal to the stored one.
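To make the proposed change concrete, the two rules can be written side by side as plain predicates. This is a hedged sketch based only on the description in this report: `EmitOnChangeRule`, `shouldForwardCurrent`, and `shouldForwardProposed` are hypothetical names and do not correspond to methods in Kafka Streams.

```java
// Hypothetical predicates illustrating the current and the proposed rule.
final class EmitOnChangeRule {
    // Current behaviour as described in this report: a same-value update is
    // forwarded only when its timestamp is strictly older than the stored one.
    static boolean shouldForwardCurrent(boolean sameValue, long storedTs, long newTs) {
        return !sameValue || newTs < storedTs;
    }

    // Proposed behaviour: also forward a same-value update with an equal timestamp.
    static boolean shouldForwardProposed(boolean sameValue, long storedTs, long newTs) {
        return !sameValue || newTs <= storedTs;
    }

    public static void main(String[] args) {
        // Replayed record: same value, same timestamp.
        System.out.println(shouldForwardCurrent(true, 100L, 100L));  // prints "false"
        System.out.println(shouldForwardProposed(true, 100L, 100L)); // prints "true"
        // Same value with a newer timestamp is still suppressed under both rules.
        System.out.println(shouldForwardProposed(true, 100L, 110L)); // prints "false"
    }
}
```

Under the proposed rule, a genuinely duplicated input record that carries an identical timestamp would be forwarded as well, which trades some suppression for not losing output on replay.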