Kafka / KAFKA-12964

Corrupt segment recovery can delete new producer state snapshots

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 3.0.0
    • Component/s: core
    • Labels: None

    Description

      During log recovery, we may schedule asynchronous deletion of segment files in `deleteSegmentFiles`.

      https://github.com/apache/kafka/blob/fc5245d8c37a6c9d585c5792940a8f9501bedbe1/core/src/main/scala/kafka/log/Log.scala#L2382

      If we are truncating the log, this can schedule deletions for segments whose base offsets match those of segments that will be written in the future. To avoid asynchronously deleting those future segments, we rename the segment and index files, but we do not do this for producer state snapshot files.

      This leaves us vulnerable to a race condition: when async deletion runs, it can delete snapshot files that belong to segments written after log recovery.

      To fix this, we should first remove the `SnapshotFile` from the `ProducerStateManager` and rename the file so that it carries the `Log.DeletedFileSuffix`. The renamed snapshot file can then be deleted asynchronously.
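
      The difference between the two orderings can be illustrated with a small standalone sketch. This is not Kafka code: the class `SnapshotDeletionSketch`, its methods, and the in-memory queue standing in for the scheduler are all hypothetical, and the `.deleted` and `.snapshot` literals are assumed to mirror `Log.DeletedFileSuffix` and the snapshot file naming. Removal of the `SnapshotFile` from the `ProducerStateManager` is not modeled here; only the rename-before-delete ordering is shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayDeque;
import java.util.Queue;

public class SnapshotDeletionSketch {
    // Assumed to mirror Log.DeletedFileSuffix; the literal is an assumption of this sketch.
    static final String DELETED_SUFFIX = ".deleted";

    // Hypothetical stand-in for the queue of asynchronous deletions.
    private final Queue<Runnable> pendingDeletes = new ArrayDeque<>();

    // Unsafe ordering: the deferred task deletes whatever file sits at the
    // original path at the time it runs, including a file written later.
    void scheduleDeleteByOriginalPath(Path snapshot) {
        pendingDeletes.add(() -> deleteIfExists(snapshot));
    }

    // Safer ordering: rename first, so the deferred task can only ever
    // touch the renamed file and never a new file at the original path.
    void renameThenScheduleDelete(Path snapshot) {
        Path renamed = snapshot.resolveSibling(snapshot.getFileName() + DELETED_SUFFIX);
        try {
            Files.move(snapshot, renamed, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        pendingDeletes.add(() -> deleteIfExists(renamed));
    }

    // Runs the deferred deletions, as the background scheduler would do later.
    void runPendingDeletes() {
        Runnable task;
        while ((task = pendingDeletes.poll()) != null) {
            task.run();
        }
    }

    private static void deleteIfExists(Path path) {
        try {
            Files.deleteIfExists(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true if a snapshot written after recovery survives the deferred deletion.
    static boolean newSnapshotSurvives(boolean renameFirst) {
        try {
            Path dir = Files.createTempDirectory("snapshot-sketch");
            Path snapshot = dir.resolve("00000000000000000100.snapshot");
            Files.write(snapshot, new byte[] {1}); // stale snapshot found during recovery

            SnapshotDeletionSketch sketch = new SnapshotDeletionSketch();
            if (renameFirst) {
                sketch.renameThenScheduleDelete(snapshot);
            } else {
                sketch.scheduleDeleteByOriginalPath(snapshot);
            }

            Files.write(snapshot, new byte[] {2}); // new snapshot with the same base offset
            sketch.runPendingDeletes();            // deferred deletion finally runs
            return Files.exists(snapshot);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("delete by original path: " + newSnapshotSurvives(false)); // prints false
        System.out.println("rename, then delete:     " + newSnapshotSurvives(true));  // prints true
    }
}
```

      In the first case the deferred task removes the newly written snapshot because it targets the original file name; in the second it only removes the renamed file, which is the behavior the proposed fix aims for.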

            People

              Assignee: Gardner Vickers (gardnervickers)
              Reporter: Gardner Vickers (gardnervickers)
              Votes: 0
              Watchers: 2
