
KAFKA-70: Introduce retention setting that depends on space

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7
    • Fix Version/s: 0.7
    • Component/s: None
    • Labels:

      Description

      Currently there is this setting:

      log.retention.hours

      introduce:

      log.retention.size

      The semantics would be: once either the size limit or the time limit is reached, the oldest messages are deleted.

      This does not break backward compatibility and would make the system robust in scenarios where message size is not deterministic over time.
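
      For illustration only, a hypothetical broker configuration snippet using both settings (the values below are made up, not defaults):

        # prune the oldest segments once either limit is crossed
        log.retention.hours=168
        log.retention.size=1073741824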

      1. KAFKA-70_v2.patch
        7 kB
        Prashanth Menon
      2. KAFKA-70.patch
        7 kB
        Prashanth Menon

        Activity

        Jun Rao added a comment -

        Prashanth,

        Not sure if you still have those questions in the deleted comment. In any case, for your first question, the size limit can be checked every time we roll the log. For the second question, the default value can be no size limit to keep it consistent with the old behavior.
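
        A minimal sketch of that default (the -1 sentinel and the names below are assumptions for illustration, not the actual 0.7 code): a negative log.retention.size means "no size limit", so existing configurations keep the old, purely time-based behavior, and the size check only runs when a limit is configured.

          // Sketch only: a negative retention size (assumed default -1) disables size-based cleanup.
          def shouldCleanupBySize(logSizeInBytes: Long, logRetentionSize: Long): Boolean =
            logRetentionSize >= 0 && logSizeInBytes > logRetentionSize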

        Hide
        Prashanth Menon added a comment -

        Hi Jun,

        Nope, answered them myself (had a brain-blank moment), which is why I deleted it. In the meantime, I've attached a patch that I think should enable this feature. I originally did this on GitHub with a pull request but was redirected here.

        Let me know what you think or if this isn't quite what was expected.

        - Prashanth
        Hide
        Joel Koshy added a comment -

        Prashanth,

        Thanks a lot for the patch. cleanUpSegmentsToMaintainSize will walk through the segments in whatever order they were loaded. That is done in loadSegments which (due to java.io.File.listFiles) does not guarantee any particular order. We need to do a numeric sort of the segments and then start deleting from the segment with the smallest offset.

        Joel
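
        For illustration, one way to get segments into offset order before deleting (this is a sketch, not the patch code; it assumes the 0.7-style <startOffset>.kafka file naming):

          import java.io.File

          // Sort numerically by the start offset encoded in each file name, so deletion
          // can begin at the segment with the smallest offset.
          def segmentsInOffsetOrder(logDir: File): Array[File] =
            logDir.listFiles
              .filter(_.getName.endsWith(".kafka"))
              .sortBy(f => f.getName.stripSuffix(".kafka").toLong)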

        Hide
        Prashanth Menon added a comment -

        Hi Joel,

        Looking through Log.loadSegments it looks like the segments (if any were found) are already sorted in order of ascending offset. This effectively solves the problem, no?

        - Prashanth
        Hide
        Jay Kreps added a comment -

        I agree the delete logic is correct. Log.scala keeps all segments in sorted order, and enforces the behavior that segments can only be deleted in reverse order (it uses takeWhile on the list), so this should guarantee that segments are deleted from oldest to newest. When we take the patch we should add a comment to our existing code to the effect that the list is guaranteed to be in sorted order, since that is kind of implicit now but it is important.

        Prashanth, one corner case here is when log.retention.size < the segment size. One thing to note is that if the currently active segment is marked for deletion, even if that segment is not full, it will dutifully be deleted and a new, empty segment created. The cleanup based on modified time always guarantees that the last message happened AT LEAST log.retention.hours ago because it uses the modification time of the file. This means the client always has at least that much time to pick up their messages. This gets weird, though, when we delete the active segment based on size. It looks to me that if I have a segment size of 100MB but a log.retention.size of 50MB, then when my single-segment log gets to 50MB we will delete the active segment, effectively all messages in the log, even though it likely contains as-yet undelivered messages which may have arrived only milliseconds ago. I think this will be confusing for people.

        Two ways to fix it: one is to require that log.retention.size > log.file.size, the other is to change the delete logic so that it deletes all but the most recent segment. Another way to say this is that since our granularity of cleanup is log.file.size, we have to either overshoot or undershoot the given retention size by as much as log.file.size-1. It is probably safer to overshoot rather than undershoot what they give.

        Another alternative would be to phrase the retention configuration in terms of the number of files rather than the size in bytes, which would be more exact, but I think the way you have done it is more intuitive.

        Another feature request: would it be possible to add a per-topic config similar to topic.log.retention.hours (say topic.log.retention.size) to allow configuring the log size at the topic level? If not I think we can take a patch that fixes the above issue and just open a bug for that.
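
        As an aside on the per-topic idea, a minimal sketch of how such an override lookup might behave (the names here are hypothetical; the real config plumbing would differ):

          // Fall back to the broker-wide retention size when a topic has no override.
          def retentionSizeFor(topic: String,
                               topicOverrides: Map[String, Long],
                               brokerDefault: Long): Long =
            topicOverrides.getOrElse(topic, brokerDefault)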

        Hide
        Prashanth Menon added a comment -

        Haha, this is quite uncanny. I had the same discussion regarding log.file.size and log.retention.size after I made the patch and concluded (I believe incorrectly) that the retention size should always be larger than the segment size. I see both sides of the argument, but I'm not really in a position to make a decision, being rather new and lacking knowledge of the breadth of use cases it's used under.

        From my point of view, having the retention size be larger than the log file size is the natural solution; implementing logic to trim the log to log.retention.size but leaving the active segment undeleted is confusing since it doesn't really live by the rule its namesake configuration suggests. As a user, I will still see the log file greater than the retention size and be confused (perhaps this can be solved with a simple rename of the configuration?). Just my two cents. I'd agree with you, though, that dealing with bytes rather than files is more intuitive.

        In any case, let me know which solution you would like to go with. As it is right now, I made the retention size > log size rule implicit, but I can add it into the code and throw an error or warning.

        I'd definitely like to work on the per-topic retention feature request; it'll give me a better chance to dig through the code. My preference would be to create a new bug for it (assign it to me?).

        Cheers,
        Prashanth

        Hide
        Jun Rao added a comment -

        Prashanth, thanks for the patch.

        I suggest that we enforce retention size > log size and prevent the broker from starting if the rule is violated.
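
        What that startup-time check might look like, as a sketch (parameter names are assumptions, not the actual config fields):

          // Refuse to start if the size-based retention limit is not larger than one segment file.
          def validateRetentionConfig(logRetentionSize: Long, logFileSize: Long): Unit =
            require(logRetentionSize > logFileSize,
                    "log.retention.size must be greater than log.file.size")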

        Hide
        Prashanth Menon added a comment -

        Okay, expect a patch with the added rule enforcement later today.

        Hide
        Jay Kreps added a comment -

        Actually, now that I think about it, I'm not sure if just limiting the config to the size of one segment is enough. Intuitively here is what I think we want the config to mean: "Keep the least amount of data possible to guarantee a buffer for the consumer of at least N bytes". If we don't give a guarantee of that form there is no way to reason about the SLA for the consumer to pick up their data. For example, if retention_size is 50*1024*1024+10 and segment size is 50*1024*1024, then the log may actually be trimmed as small as 10 bytes.

        So Jun, I think the config approach doesn't work. I think we need to tweak the delete rule slightly.

        Hide
        Jun Rao added a comment -

        One possibility is to have the following semantic: delete as many segment files as possible while leaving at least retention_size of data remaining (instead of bringing the total data size below retention_size).
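
        A self-contained sketch of that semantic (the types and names are stand-ins, not the committed patch): walk the segments from the oldest offset and delete one only while at least retention_size bytes would still remain afterwards.

          // SegmentInfo stands in for Kafka's internal segment type.
          case class SegmentInfo(startOffset: Long, sizeInBytes: Long)

          def segmentsToDelete(segments: Seq[SegmentInfo], retentionSize: Long): Seq[SegmentInfo] = {
            // Bytes we are allowed to drop while still leaving at least retentionSize behind.
            var excess = segments.map(_.sizeInBytes).sum - retentionSize
            segments.sortBy(_.startOffset).takeWhile { s =>
              val deletable = excess - s.sizeInBytes >= 0
              if (deletable) excess -= s.sizeInBytes
              deletable
            }
          }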

        Hide
        Jay Kreps added a comment -

        Jun, yeah, exactly, I think that is what I was trying to say but said more clearly.

        Hide
        Prashanth Menon added a comment - edited

        Hi all, I've attached an updated patch for your consideration. It now deletes segments leaving at least logRetentionSize space in the log.

        Hide
        Prashanth Menon added a comment -

        Hi guys, have you had a chance to look at the newest patch?

        Hide
        Neha Narkhede added a comment -

        +1. Thanks for the patch!

        Hide
        Jay Kreps added a comment -

        Applied, thanks! I will open a JIRA to add per-topic overrides for the retention size.

        Hide
        Chris Burroughs added a comment -

        Trunk == 0.7 not 0.8, right?

        Hide
        Neha Narkhede added a comment -

        You are right, Chris. Trunk is on 0.7.


          People

          • Assignee: Unassigned
          • Reporter: Jun Rao
          • Votes: 0
          • Watchers: 1
