Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
https://issues.apache.org/jira/browse/KAFKA-7215 improved error handling in the log cleaner so that an exception no longer kills the whole cleaner thread. Instead, the partition that caused the exception is marked as uncleanable, and the thread continues cleaning the error-free partitions.
Unfortunately, the current code can still let an exception bubble up and kill the thread when the error happens before the cleaner has grabbed the filthiest log and started cleaning it. At that point there is no clear reference to the log that caused the exception, so the code throws an IllegalStateException instead: https://github.com/apache/kafka/blob/39bcc8447c906506d63b8df156cf90174bbb8b78/core/src/main/scala/kafka/log/LogCleaner.scala#L346 (as seen in https://issues.apache.org/jira/browse/KAFKA-8724).
In short, exceptions thrown in `grabFilthiestCompactedLog` still cause the thread to die. This could be improved by catching the exception inside that function and identifying which log caused it.
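The proposed improvement can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not Kafka's actual code or the fix that was merged: `TopicPartition` here is a local stand-in (not `org.apache.kafka.common.TopicPartition`), and `CandidateLog`, `computeDirtyRatio`, `PartitionInspectionException`, `grabFilthiestLog`, and `cleanOnce` are hypothetical names. The idea is to evaluate each candidate log separately while selecting the filthiest one, wrap any failure in an exception that carries the offending partition, and let the cleaner loop mark only that partition as uncleanable instead of dying.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a topic-partition identifier (not Kafka's class).
final case class TopicPartition(topic: String, partition: Int)

// Hypothetical candidate log; computing its dirty ratio may throw.
final case class CandidateLog(tp: TopicPartition, computeDirtyRatio: () => Double)

// Hypothetical exception that records which partition failed during selection.
final class PartitionInspectionException(val tp: TopicPartition, cause: Throwable)
  extends RuntimeException(s"Failed to inspect $tp", cause)

// Partitions marked as uncleanable; they are skipped on later rounds.
val uncleanablePartitions = mutable.Set.empty[TopicPartition]

// Pick the candidate with the highest dirty ratio, evaluating each log
// separately so a failure can be attributed to the partition that caused it.
def grabFilthiestLog(candidates: Seq[CandidateLog]): Option[TopicPartition] = {
  val scored = candidates
    .filterNot(c => uncleanablePartitions.contains(c.tp))
    .map { c =>
      try c.tp -> c.computeDirtyRatio()
      catch { case e: Exception => throw new PartitionInspectionException(c.tp, e) }
    }
  if (scored.isEmpty) None else Some(scored.maxBy(_._2)._1)
}

// One round of the cleaner loop: a per-partition failure marks only that
// partition as uncleanable and returns None, so the thread keeps running.
def cleanOnce(candidates: Seq[CandidateLog]): Option[TopicPartition] =
  try grabFilthiestLog(candidates)
  catch {
    case e: PartitionInspectionException =>
      uncleanablePartitions += e.tp
      None
  }
```

With a healthy log, a filthier healthy log, and one log whose inspection throws, the first round marks the failing partition as uncleanable and cleans nothing; the next round skips it and selects the filthiest remaining log. Wrapping the failure with the partition attached keeps the selection function free of cleaner state, while the caller decides how to quarantine the partition.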