KAFKA-3894

LogCleaner thread crashes if not even one segment can fit in the offset map

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.2.2, 0.9.0.1, 0.10.0.0
    • Fix Version/s: 0.10.1.0
    • Component/s: core
    • Environment:
      Oracle JDK 8
      Ubuntu Precise

      Description

      The log-cleaner thread can crash if the number of keys in a topic grows to be too large to fit into the dedupe buffer.

      The result of this is a log line:

      broker=0 pri=ERROR t=kafka-log-cleaner-thread-0 at=LogCleaner [kafka-log-cleaner-thread-0], Error due to java.lang.IllegalArgumentException: requirement failed: 9750860 messages in segment MY_FAVORITE_TOPIC-2/00000000000047580165.log but offset map can fit only 5033164. You can increase log.cleaner.dedupe.buffer.size or decrease log.cleaner.threads
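
      For reference, the 5,033,164 figure is roughly what falls out of the default cleaner settings. A back-of-the-envelope sketch, assuming the offset map costs about 24 bytes per key (a 16-byte key hash plus an 8-byte offset) and the default 128 MiB dedupe buffer with a 0.9 load factor; these figures are our reading of the defaults, not something stated in this report:

      object OffsetMapCapacity {
        val bytesPerEntry = 24   // assumed cost per key: 16-byte hash + 8-byte offset
        val loadFactor    = 0.9  // log.cleaner.io.buffer.load.factor default

        // Keys the offset map can hold, with the dedupe buffer split across cleaner threads.
        def capacity(dedupeBufferBytes: Long, cleanerThreads: Int = 1): Long =
          ((dedupeBufferBytes / cleanerThreads / bytesPerEntry) * loadFactor).toLong

        def main(args: Array[String]): Unit =
          println(capacity(128L * 1024 * 1024))   // prints 5033164, matching the error above
      }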

      As a result, the broker is left in a potentially dangerous situation where cleaning of compacted topics is not running.

      It is unclear if the broader strategy for the LogCleaner is the reason for this upper bound, or if this is a value which must be tuned for each specific use-case.

      Of more immediate concern is the fact that the thread crash is not visible via JMX or exposed as some form of service degradation.

      Some short-term remediations we have made are:

      • increasing the size of the dedupe buffer
      • monitoring the log-cleaner threads inside the JVM (a sketch follows below)
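
      The in-JVM monitoring is nothing clever. A minimal sketch, assuming only that the cleaner threads keep the "kafka-log-cleaner-thread-" name prefix seen in the error above and that it runs inside (or is attached to) the broker JVM:

      import java.lang.management.ManagementFactory

      object LogCleanerLivenessCheck {
        // Names of live threads that look like log-cleaner threads.
        def liveCleanerThreads(): Seq[String] = {
          val mx = ManagementFactory.getThreadMXBean
          mx.getThreadInfo(mx.getAllThreadIds)
            .filter(_ != null)
            .map(_.getThreadName)
            .filter(_.startsWith("kafka-log-cleaner-thread-"))
            .toSeq
        }

        def main(args: Array[String]): Unit = {
          val alive = liveCleanerThreads()
          if (alive.isEmpty) System.err.println("ALERT: no live log-cleaner threads")
          else println(s"log-cleaner threads alive: ${alive.mkString(", ")}")
        }
      }

      Anything equivalent over JMX works just as well; the point is simply to alert when the count of live cleaner threads drops to zero.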

        Issue Links

          Activity

          Tom Crayford added a comment -

          (disclaimer: I work with Tim)

          It feels like there are a few pieces of work to do here:

          1. Expose the log cleaner state as a JMX metric (like BrokerState)
          2. Somehow mark logs we've failed to clean as "busted" somewhere, and stop trying to clean them. This way, instead of erroring when this occurs, the broker doesn't stay completely busted but continues working on all other partitions
          3. I'm unsure, but is it possible to fix the underlying issue by only compacting partial segments of the log when the buffer size is smaller than the desired offset map? This seems like the hardest but most valuable fix here.

          We're happy picking up at least some of these, but would love feedback from the community about priorities and ease/appropriateness of these steps (and suggestions for other things to have).

          Ismael Juma added a comment -

          The exception looks like the one in KAFKA-3587, which was fixed in 0.10.0.0.

          Andy Coates added a comment -

          Yep, we've run into the same issue.

          Would be nice if the cleaner, at the very minimum, skipped segments with a large number of records.

          Andy Coates added a comment -

          Ismael Juma Not sure this is related. Our situation seems to be just a legitimate large number of records in the segment.

          James Cheng added a comment -

          About log compaction JMX metrics, there is https://issues.apache.org/jira/browse/KAFKA-3857

          Peter Davis added a comment -

          A quick improvement would be to increase the severity of the log message when the log cleaner stops. Right now there is just an "INFO" message that's easy to miss.

          Kiran Pillarisetty added a comment -

          Regarding Log cleaner JMX metrics, I just submitted a PR. Please take a look:
          https://github.com/apache/kafka/pull/1593

          JIRA: https://issues.apache.org/jira/browse/KAFKA-3857

          Tim Carey-Smith added a comment -

          Woohoo, more metrics is so excellent!

          Regarding the issue I am reporting: it is somewhat broader than the specific issues related to the log cleaner which have been resolved across the lifetime of Kafka, for example:

          • Compaction thread dies when hitting compressed and unkeyed messages (https://github.com/apache/kafka/commit/1cd6ed9e2c07a63474ed80a8224bd431d5d4243c#diff-d7330411812d23e8a34889bee42fedfe), noted in KAFKA-1755
          • Logcleaner fails due to incorrect offset map computation on a replica, in KAFKA-3587

          Unfortunately, there is a deeper issue: if these threads die, bad things happen.

          KAFKA-3587 was a great step forward: now this exception will only occur if a single segment is unable to fit within the dedupe buffer. Unfortunately, in pathological cases the thread could still die.

          Compacted topics are built to rely on the log cleaner thread and because of this, any segments which are written must be compatible with the configuration for log cleaner threads.
          As I mentioned before, we are now monitoring the log cleaner threads and as a result do not have long periods where a broker is in a dangerous and degraded state.
          One situation which comes to mind is from a talk at Kafka Summit where the thread was offline for a long period of time. Upon restart, the __consumer_offsets topic took 17 minutes to load.
          http://www.slideshare.net/jjkoshy/kafkaesque-days-at-linked-in-in-2015/49

          After talking with Tom, we came up with a few solutions which could help in resolving this issue.

          1) The monitoring suggested in KAFKA-3857 is a great start and would most definitely help with determining the state of the log cleaner.
          2) After the change in KAFKA-3587, it could be possible to simply skip segments which are too large, leaving them as zombie segments which will never be cleaned. This is less than ideal, but means that a single large segment would not take down the whole log cleaner subsystem.
          3) Upon encountering a large segment, we considered the possibility of splitting the segment to allow the log cleaner to continue. This would potentially delay some cleanup until a later time.
          4) Currently, it seems like the write path allows for segments to be created which are unable to be processed by the log cleaner. Would it make sense to include log cleaner heuristics when determining segment size for compacted topics? This would allow the log cleaner to always process a segment, unless the buffer size was changed.

          We'd love to help in any way we can.

          James Cheng added a comment -

          #4 is a good point. By looking at the buffer size, the broker can calculate how large a segment it can handle, and can thus make sure to only generate segments that it can handle.

          The comment about having a large segment that you are unable to process made me think about the long discussion that happened in https://issues.apache.org/jira/browse/KAFKA-3810. In that JIRA, a large message in the __consumer_offsets topic would block (internal) consumers who had too small of a fetch size.

          The solution that was chosen and was implemented was to loosen the fetch size for fetches from internal topics. Internal topics would always return at least one message, even if the message was larger than the fetch size.

          It made me wonder if it might make sense to treat the dedupe buffer in a similar way. In a steady state, the configured dedupe buffer size would be used but if it's too small to even fit a single segment, then the dedupe buffer would be (temporarily) grown to allow cleaning of that large segment.

          CC Jun Rao
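
          To make the arithmetic concrete, a small sketch of both directions of that calculation (the 24-bytes-per-key cost and 0.9 load factor are assumptions about the offset map, not something from this JIRA):

          object DedupeBufferSizing {
            val bytesPerEntry = 24   // assumed offset map cost per key: 16-byte hash + 8-byte offset
            val loadFactor    = 0.9  // log.cleaner.io.buffer.load.factor default

            // #4: the most keys a single segment may contain if it must fit the configured buffer.
            def maxKeysPerSegment(configuredBufferBytes: Long): Long =
              ((configuredBufferBytes / bytesPerEntry) * loadFactor).toLong

            // Temporary-grow idea: the buffer needed to clean the largest dirty segment,
            // never smaller than what is configured.
            def bufferForCleaning(configuredBufferBytes: Long, largestSegmentKeys: Long): Long =
              math.max(configuredBufferBytes,
                       math.ceil(largestSegmentKeys * bytesPerEntry / loadFactor).toLong)
          }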

          Jun Rao added a comment -

          James Cheng, this is slightly different from KAFKA-3810. In KAFKA-3810, messages are bounded by MaxMessageSize, which in turn bounds the fetch response size. For cleaning, if messages are uncompressed, the dedupBufferSize needed is bounded by segmentSize/perMessageOverhead. However, if messages are compressed, dedupBufferSize needed could be arbitrarily large. So, I am not sure if we want to auto grow the buffer size arbitrarily.

          #4 seems to be a safer approach. There are effective ways of estimating the number of unique keys (https://people.mpi-inf.mpg.de/~rgemulla/publications/beyer07distinct.pdf) incrementally. We will need to figure out where to store it in order to avoid rescanning the log on startup.
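
          For illustration only, a toy K-Minimum-Values counter in the spirit of that kind of incremental distinct-key estimation; this is not the algorithm from the linked paper, and nothing like it exists in Kafka today:

          import scala.collection.mutable
          import scala.util.hashing.MurmurHash3

          class KmvDistinctEstimator(k: Int = 1024) {
            // The k smallest distinct hash values seen so far, normalized into [0, 1).
            private val smallest = mutable.TreeSet.empty[Double]

            private def normalizedHash(key: Array[Byte]): Double =
              (MurmurHash3.bytesHash(key).toLong & 0xffffffffL).toDouble / (1L << 32).toDouble

            def add(key: Array[Byte]): Unit = {
              val h = normalizedHash(key)
              if (smallest.size < k) smallest += h
              else if (h < smallest.last && !smallest.contains(h)) {
                smallest -= smallest.last
                smallest += h
              }
            }

            // Estimated number of distinct keys observed.
            def estimate: Long =
              if (smallest.size < k) smallest.size.toLong
              else ((k - 1) / smallest.last).toLong
          }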

          Tom Crayford added a comment -

          Jun:

          #4 seems potentially very complex to me. It also doesn't work in the case that the broker is shut down and the dedupe buffer size adjusted. I much prefer #3 - it maps fine into the existing model as far as I can tell - we'd "just" split the log file we're cleaning once the offsetmap is full. That of course requires a little more IO, but it doesn't involve implementing (or using a library for) sketches that could potentially be incorrect. It also seems like the right long term solution, and more robust than automatically rolling log files some of the time. Am I missing something here?

          Upsides of #3 vs #4:
          We can now clean the largest log segment, no matter the buffer size.
          We don't increase complexity of the produce path, or change memory usage.
          We don't have to implement or reuse a library for estimating unique keys
          We don't have to figure out storing the key estimate (e.g. in the index or in a new file alongside each segment).

          Downsides:
          It would increase the complexity of the cleaner.
          The code that swaps in and out segments will also get more complex, and the crash-safety of that code is already tricky.

          Exists in both:
          Larger log segments could potentially be split a lot, and not always deduplicated that well together. For example, if I write the max number of unique keys for the offset map into a topic, then the segment rolls, then I write a tombstone for every message in the previously sent messages, then neither #3 nor #4 would ever clear up any data. This is no worse than today though.

          Cassandra and other LSM based systems that do log structured storage and over-time compaction use similar "splitting and combining" mechanisms to ensure everything gets cleared up over time without using too much memory. They have a very different storage architecture and goals to Kafka's compaction, for sure, but it's interesting to note that they care about similar things.

          Vincent Rischmann added a comment -

          Not adding much to the conversation, but I've just been hit by this bug.

          I'm in the process of upgrading my cluster to 0.9.0.1, and in one case the log cleaner dies because of this.

          requirement failed: 1214976153 messages in segment __consumer_offsets-15/00000000000012560043.log but offset map can fit only 40265317

          If I'm not wrong, there's no way that many messages can fit in the buffer, since it's limited to 2G per thread anyway. Right now I'm leaving it as is since the broker seems to be working, but it's not ideal.

          I'm wondering: if I simply delete the log file with the broker shut down, will it be fetched at startup from another replica without problems?
          In my case, I believe this is only temporary: we never enabled the log cleaner when running 0.8.2.1 (mistake on my part) and now when migrating to 0.9.0.1 it does a giant cleanup at first startup.

          Peter Davis added a comment -

          Re: "the broker seems to be working"

          You may regret not taking action now. As Tim mentioned from the talk at the Kafka Summit (http://www.slideshare.net/jjkoshy/kafkaesque-days-at-linked-in-in-2015/49), if __consumer_offsets is not compacted and has accumulated millions (or billions!) of messages, it can take many minutes for the broker to elect a new coordinator after any kind of hiccup. Your new consumers may be hung during this time!

          However, even shutting down brokers to change the configuration will cause coordinator elections which will cause an outage. It seems like not having a "hot spare" for Offset Managers is a liability here…

          We were bit by this bug and it caused all kinds of headaches until we managed to get __consumer_offsets cleaned up again.

          Vincent Rischmann added a comment -

          Yeah, that's why I was hoping for a workaround.

          Right now it takes a ridiculous amount of time for the broker to load some partitions, it just took like 1h+ to load a 300Gb partition. In that case it didn't impact production though.

          I believe I have found a workaround in my case since, as said, it's a temporary thing: I note all big partitions (more than a GB, let's say) and reassign them to brokers that are already cleaned up. The reassignment takes a long time, but in the end I think it'll remove the partition from the problematic broker.

          Vincent Rischmann added a comment -

          Well, that doesn't work; in fact, I just realized that the log cleaner threads all died on the migrated brokers. So yep, still need to find a workaround or wait for a fix. How did you manage to clean up the logs, Peter Davis?

          Jun Rao added a comment -

          Tom Crayford, I chatted with Jay Kreps on this a bit. There are a couple things that we can do to address this issue.

          a. We can potentially make the allocation of the dedup buffer more dynamic. We can start with something small like 100MB. If needed, we can grow the dedup buffer up to the configured size. This will allow us to set a larger default dedup buffer size (say 1GB). If there are not lots of keys, the broker won't be using that much memory. This will allow the default configuration to accommodate more keys.

          b. To handle the edge case where a segment still has more keys than the increased dedup buffer can handle, we can do the #3 approach as you suggested. Basically, if the dedup buffer is full when only a partial segment is loaded, we remember the next offset (say L). We scan all old log segments including this one as before. The only difference is that when scanning the last segment, we force creating a new segment starting at offset L and simply copy the existing messages after L to the new segment. Then, after we swapped in the new segments, we will move the cleaner marker to offset L. This adds a bit of inefficiency since we have to scan the last swapped-in segment again. However, this will allow the cleaner to always make progress regardless of the # of keys. I am not sure that I understand the case you mentioned that won't work in both approach #3 and #4.
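
          To make (b) concrete, a self-contained toy sketch with made-up types; it ignores tombstone retention, compression and crash safety, and is not Kafka's cleaner code:

          import scala.collection.mutable

          final case class Message(offset: Long, key: String)
          final case class Segment(baseOffset: Long, messages: Vector[Message])

          object PartialSegmentCleaning {
            // Fill the offset map with the latest offset per key until it is full; return the
            // map plus the first offset that did not fit (the "L" above), if any.
            def buildOffsetMap(dirty: Seq[Segment], maxEntries: Int): (Map[String, Long], Option[Long]) = {
              val latest = mutable.Map.empty[String, Long]
              var firstUnfit: Option[Long] = None
              val it = dirty.iterator.flatMap(_.messages.iterator)
              while (it.hasNext && firstUnfit.isEmpty) {
                val m = it.next()
                if (latest.contains(m.key) || latest.size < maxEntries) latest(m.key) = m.offset
                else firstUnfit = Some(m.offset)   // map is full: remember L and stop
              }
              (latest.toMap, firstUnfit)
            }

            // Keep a message if nothing below L supersedes it (the map holds no later offset for
            // its key); everything at or above L is copied untouched and cleaned on a later pass.
            def clean(segments: Seq[Segment], latest: Map[String, Long], firstUnfit: Option[Long]): Vector[Message] = {
              val cutoff = firstUnfit.getOrElse(Long.MaxValue)
              segments.flatMap(_.messages).filter { m =>
                m.offset >= cutoff || latest.get(m.key).forall(_ <= m.offset)
              }.toVector
            }
          }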

          Ismael Juma added a comment -

          I updated the title to match the issue that is still present in 0.10.0.x. Note that the log message in 0.10.0.x would be different from the one posted in the JIRA description:

          require(offset > start, "Unable to build the offset map for segment %s/%s. You can increase log.cleaner.dedupe.buffer.size or decrease log.cleaner.threads".format(log.name, segment.log.file.getName))
          

          It would be good to have a fix for 0.10.1.0 so I set the fix version. Tim and Tom, any of you interested in picking this up?

          Elias Dorneles added a comment - - edited

          I've bumped into this same issue (log cleaner threads dying because messages wouldn't fit the offset map).

          For some of the topics the messages would almost fit, so I was able to get away with just increasing the dedupe buffer load factor (https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/KafkaConfig.scala#L252), which defaults to 90% of the 2GB max buffer size.

          For other topics that had more messages and wouldn't fit in the 2GB at all, the workaround was to:

          1) decrease the segment size config for that topic [1]
          2) reassign topic partitions, in order to end up with new segments with sizes obeying the config change
          3) rolling restart the nodes, to restart log cleaner threads

          I'd love to know if there is another way of doing this, step 3 is particularly frustrating.

          Good luck!

          [1]: This can be done for a particular topic with `kafka-topics.sh --zookeeper $ZK --topic $TOPIC --alter --config segment.bytes`, but if needed you can also set `log.segment.bytes` for topics across the whole cluster.

          Tom Crayford added a comment -

          Hi Jun,

          We're probably going to start on b. for now. I think a. is incredibly valuable, but it doesn't prevent this particular way the log cleaner crashes. I think there are some cases where we will fail to clean up data, but having those seems far preferable to crashing the thread entirely.

          We'll get started with b., hopefully will have a patch up within a few business days.

          ASF GitHub Bot added a comment -

          GitHub user tcrayford opened a pull request:

          https://github.com/apache/kafka/pull/1725

          WIP KAFKA-3894: split log segment to avoid crashing cleaner thread

          https://issues.apache.org/jira/browse/KAFKA-3894

          This is a temporary PR, to see what Jenkins has to say about this work in progress change. It will be updated and should not be reviewed at this time.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/heroku/kafka dont_crash_log_cleaner_thread_if_segment_overflows_buffer

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/kafka/pull/1725.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #1725


          commit 995e32398c40c5d4deddeb9e5f359dc5df770b27
          Author: Tom Crayford <tcrayford@googlemail.com>
          Date: 2016-08-12T17:20:26Z

          WIP KAFKA-3894: split log segment to avoid crashing cleaner thread


          Jun Rao added a comment -

          Issue resolved by pull request 1725
          https://github.com/apache/kafka/pull/1725

          ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/kafka/pull/1725

          Jun Rao added a comment -

          Tom Crayford, thanks for the patch. Filed a follow-up JIRA, KAFKA-4072, to improve the memory usage in the log cleaner.


            People

            • Assignee: Tom Crayford
            • Reporter: Tim Carey-Smith
            • Votes: 2
            • Watchers: 12