Kafka / KAFKA-4545

tombstone needs to be removed after delete.retention.ms has passed after it has been cleaned


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.10.0.0, 0.11.0.0, 1.0.0
    • Fix Version/s: 3.1.0
    • Component/s: log

    Description

      The algorithm for removing a tombstone in a compacted log is supposed to be the following.
      1. A tombstone is never removed while it is still in the dirty portion of the log.
      2. After the tombstone is in the cleaned portion of the log, we further delay its removal by delete.retention.ms, measured from the time the tombstone entered the cleaned portion.
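
The two rules above can be sketched as a single predicate. This is only an illustration of the intended behavior, not Kafka's log cleaner code: the `Tombstone` class, its `cleaned_at_ms` field, and the `may_remove` function are hypothetical names introduced here, and treating the boundary as inclusive (`>=`) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tombstone:
    """Hypothetical model of a tombstone record in a compacted log."""
    key: str
    # Time (ms) at which the tombstone entered the cleaned portion of the log.
    # None means it is still in the dirty portion.
    cleaned_at_ms: Optional[int] = None


def may_remove(tombstone: Tombstone, now_ms: int, delete_retention_ms: int) -> bool:
    """Return True only if both rules allow the tombstone to be dropped."""
    # Rule 1: never remove a tombstone that is still in the dirty portion.
    if tombstone.cleaned_at_ms is None:
        return False
    # Rule 2: wait delete.retention.ms, counted from the moment of cleaning.
    return now_ms - tombstone.cleaned_at_ms >= delete_retention_ms
```

The key point is that the clock for rule 2 starts when the tombstone is cleaned, not when it (or its segment) was originally written.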

      Once the tombstone is in the cleaned portion, we know no message with the same key can precede it in the log. Therefore, any consumer that has already read an earlier non-tombstone message for that key, and that reads to the end of the log within delete.retention.ms, is guaranteed to see the tombstone.

      However, the current implementation does not appear to be correct. It delays the removal of the tombstone by delete.retention.ms measured from the last modified time of the last cleaned segment. That last modified time is inherited from the original segment and could be arbitrarily old, so the tombstone may not be preserved for as long as it needs to be.
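
A small worked example may make the gap concrete. The sketch below is a deliberate simplification, not the cleaner's actual logic: both function names and all timestamps are made up for illustration. It compares a retention clock that starts at the segment's inherited last modified time with one that starts when the tombstone was cleaned.

```python
DAY_MS = 24 * 60 * 60 * 1000


def removable_by_segment_mtime(segment_last_modified_ms, now_ms, delete_retention_ms):
    """Simplified model of the behavior described as current:
    the retention clock starts at the (inherited) segment last modified time."""
    return now_ms - segment_last_modified_ms >= delete_retention_ms


def removable_by_cleaned_time(cleaned_at_ms, now_ms, delete_retention_ms):
    """Simplified model of the intended behavior:
    the retention clock starts when the tombstone enters the cleaned portion."""
    return now_ms - cleaned_at_ms >= delete_retention_ms


delete_retention_ms = DAY_MS          # assume delete.retention.ms is one day
segment_last_modified_ms = 0          # original segment was last written at t = 0
cleaned_at_ms = 30 * DAY_MS           # the cleaner reaches it 30 days later
now_ms = cleaned_at_ms + 60_000       # one minute after cleaning

premature = removable_by_segment_mtime(segment_last_modified_ms, now_ms, delete_retention_ms)
intended = removable_by_cleaned_time(cleaned_at_ms, now_ms, delete_retention_ms)
print(premature, intended)
```

Under these assumed numbers the mtime-based check already allows removal one minute after cleaning, while the cleaned-time check still protects the tombstone for almost a full day.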


          People

            Assignee: Richard Yu (Yohan123)
            Reporter: Jun Rao (junrao)
            Votes: 1
            Watchers: 3
