We have recently been seeing hanging transactions occur quite frequently on Streams changelog topics. After investigation, we found that the keys used in the changelog topic conflict with the keys used in the transaction markers (the key schema used in control records is 4 bytes, which happens to be the same as for the changelog topics that we investigated). When we build the offset map prior to cleaning, we properly exclude the transaction marker keys, but we do not exclude them during the cleaning phase, and that is the bug. When there is a user record with a conflicting key at a higher offset, the marker can be removed from the cleaned log before the corresponding transactional data is removed. One side effect of this is a so-called "hanging" transaction, but the bigger problem is that we lose the atomicity of the transaction.
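
The failure mode described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the actual log cleaner code: `Record`, `build_offset_map`, `clean`, the `exclude_control` flag, and the key bytes are all hypothetical names and values chosen for illustration, and the real rules for when a marker may eventually be removed are ignored here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    offset: int
    key: bytes
    is_control: bool = False  # True for a transaction marker (control record)


def build_offset_map(log):
    """Map each user key to its latest offset; control records are skipped."""
    offset_map = {}
    for record in log:
        if not record.is_control:
            offset_map[record.key] = record.offset
    return offset_map


def clean(log, offset_map, exclude_control):
    """Drop a record if the offset map holds a higher offset for its key."""
    kept = []
    for record in log:
        if exclude_control and record.is_control:
            # Markers are never compared against user keys.
            kept.append(record)
            continue
        latest = offset_map.get(record.key)
        if latest is None or record.offset >= latest:
            kept.append(record)
    return kept


# Hypothetical 4-byte key shared by a marker and a later user record.
COLLIDING_KEY = bytes([0, 0, 0, 1])

log = [
    Record(0, bytes([0, 0, 0, 7])),             # transactional user data
    Record(1, COLLIDING_KEY, is_control=True),  # commit marker for offset 0
    Record(2, COLLIDING_KEY),                   # later user record, same key bytes
]

offset_map = build_offset_map(log)
buggy_offsets = [r.offset for r in clean(log, offset_map, exclude_control=False)]
fixed_offsets = [r.offset for r in clean(log, offset_map, exclude_control=True)]
```

In this model, the pass without the exclusion drops the marker at offset 1 while the data at offset 0 survives, which is the state that leaves the transaction hanging. With the exclusion in place, all three records are retained.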