Currently, if you enable log compaction, the compactor kicks in whenever you hit a certain "dirty ratio", for example when 50% of your data is uncompacted. Other than this we don't give you fine-grained control over when compaction occurs. In addition, we never compact the active segment (since it is still being written to).
The result is that you can't really guarantee that a consumer will get every update to a compacted topic: if the consumer falls behind a bit, it might just get the compacted version.
This is usually fine, but it would be nice to make this more configurable so you could set a message-count, size, or time bound for compaction.
This would let you say, for example, "any consumer that is no more than 1 hour behind will get every message."
This should be relatively easy to implement since it only affects the end point the compactor considers available for compaction. I think we already have that concept, so this would just mean adding a few overrides when calculating it.
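To make the idea concrete, here is a rough sketch of how the end point could be calculated with the extra bounds. It is illustrative only and not the actual cleaner code: the `Segment` type, the `first_uncleanable_offset` function, and the `min_lag_ms`, `min_lag_messages`, and `min_lag_bytes` parameters are all hypothetical names. It assumes bounds are enforced at segment granularity and that a segment's last-modified time stands in for the age of its newest message.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    base_offset: int       # offset of the first message in the segment
    next_offset: int       # offset one past the last message in the segment
    size_bytes: int
    last_modified_ms: int  # time of the most recent write to the segment


def first_uncleanable_offset(
    segments: List[Segment],
    now_ms: int,
    min_lag_ms: Optional[int] = None,
    min_lag_messages: Optional[int] = None,
    min_lag_bytes: Optional[int] = None,
) -> int:
    """Return the offset at which the compactor must stop.

    `segments` is ordered by offset and its last element is the active
    segment, which is never compacted. Each bound that is set (hypothetical
    overrides) can only move the end point earlier, never later.
    """
    if not segments:
        raise ValueError("a log always has at least an active segment")

    log_end = segments[-1].next_offset
    total_bytes = sum(s.size_bytes for s in segments)
    bytes_seen = 0

    # Walk from oldest to newest and stop at the first protected segment.
    for seg in segments[:-1]:
        bytes_seen += seg.size_bytes
        bytes_after = total_bytes - bytes_seen  # data newer than this segment

        too_recent = (
            min_lag_ms is not None
            and seg.last_modified_ms > now_ms - min_lag_ms
        )
        too_few_messages = (
            min_lag_messages is not None
            and seg.next_offset > log_end - min_lag_messages
        )
        too_few_bytes = (
            min_lag_bytes is not None and bytes_after < min_lag_bytes
        )
        if too_recent or too_few_messages or too_few_bytes:
            return seg.base_offset

    # No bound applied: fall back to the existing rule (skip the active segment).
    return segments[-1].base_offset
```

With a one-hour `min_lag_ms`, any segment written to within the last hour stays uncompacted, which is what gives the "no more than 1 hour behind" guarantee, up to segment granularity.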