Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Won't Fix
Description
One of Kafka's officially described use cases is a distributed commit log (http://kafka.apache.org/documentation.html#uses_commitlog). A distributed service that needs a commit log would use a topic with a single partition to guarantee log order, and would replay that log to re-sync failed nodes. Kafka is generally an excellent fit for such a system, but it does not expose an adequate mechanism for log cleanup in this case. The built-in log cleanup mechanisms are based on time and size thresholds, which do not suit a commit log: data can be deleted from a commit log only once the client application determines that it is no longer needed. Here we propose a new API, exposed to clients through AdminUtils, that deletes all messages before a given offset from a specific partition.
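The intended semantics of the proposed call can be sketched with a toy in-memory model of a single partition. This is only an illustration, not Kafka code: the class `PartitionLog` and the methods `append`, `delete_before`, and `read_from` are hypothetical names invented for this sketch, and the choice to reject an offset beyond the end of the log is an assumption the proposal does not specify.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PartitionLog:
    """Toy model of one partition (hypothetical, not a Kafka API)."""
    start_offset: int = 0  # offset of the oldest retained message
    messages: List[bytes] = field(default_factory=list)

    @property
    def end_offset(self) -> int:
        # Offset that the next appended message will receive.
        return self.start_offset + len(self.messages)

    def append(self, payload: bytes) -> int:
        # Offsets are assigned in append order and never reused.
        offset = self.end_offset
        self.messages.append(payload)
        return offset

    def delete_before(self, offset: int) -> int:
        """Drop every message whose offset is below `offset`.

        Returns the new start offset. Asking for an offset that is
        already deleted is a no-op, so the call is safe to retry.
        """
        if offset > self.end_offset:
            # Assumption: deleting past the end of the log is an error.
            raise ValueError("cannot delete past the end of the log")
        if offset > self.start_offset:
            del self.messages[: offset - self.start_offset]
            self.start_offset = offset
        return self.start_offset

    def read_from(self, offset: int) -> List[bytes]:
        # A recovering node replays everything from `offset` onward.
        if offset < self.start_offset:
            raise LookupError("offset already deleted")
        return self.messages[offset - self.start_offset:]


# The application decides that entries 0..2 are no longer needed.
log = PartitionLog()
for i in range(5):
    log.append(f"entry-{i}".encode())
log.delete_before(3)
remaining = log.read_from(3)  # entries at offsets 3 and 4
```

The point of the sketch is that the caller learns the new start offset as soon as the call returns, which is the completion signal that the threshold-based cleanup mechanisms lack.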
Rejected Alternatives
- Manually setting and resetting the log retention time configs to periodically flush messages older than a certain time from the logs. Doing this involves several asynchronous processes, none of which provides a hook to signal when it has actually completed.
- Rolling a new topic each time we want to clean up the log. This is the best existing approach, but it is not ideal: all incoming writes would be paused while waiting for the new topic to be created.