Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- None
- None
Description
In CleanPlanner, the KEEP_LATEST_BY_HOURS policy sets earliestCommitToRetain by comparing commit timestamps directly. This introduces a bug when commits complete out of order, i.e. when a commit with a lower timestamp completes much later than commits with higher timestamps.
This policy's implementation needs to be revisited.
The cleaner should store the timestamp up to which it has already cleaned; call this t1. The next cleaner instant should then consider all partitions and files modified between t1 and (current time - x) hours, and remove whichever files are no longer valid.
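A minimal sketch of the windowing idea described above, not the actual CleanPlanner code. The class CleanWindowSketch, the Commit type and the method names are all hypothetical. The sketch also assumes that "modified" is judged by completion time rather than by the commit's own timestamp; the description does not say which, so treat that as one possible reading.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class CleanWindowSketch {

    /** Hypothetical stand-in for a completed commit; times are epoch milliseconds. */
    static final class Commit {
        final long instantTime;     // timestamp the commit was started with
        final long completionTime;  // when the commit actually finished
        final String partition;     // partition the commit wrote to

        Commit(long instantTime, long completionTime, String partition) {
            this.instantTime = instantTime;
            this.completionTime = completionTime;
            this.partition = partition;
        }
    }

    static final long HOUR_MS = 3_600_000L;

    /** Upper bound of the clean window: (current time - x hours). */
    static long windowEnd(long nowMs, int hoursRetained) {
        return nowMs - hoursRetained * HOUR_MS;
    }

    /**
     * Partitions modified in the window (t1, windowEnd].
     * Assumption: a commit counts as "modified at" its completion time, so a
     * late-finishing commit with an old instant time is picked up by a later clean.
     */
    static Set<String> partitionsToScan(List<Commit> commits, long t1, long windowEnd) {
        Set<String> result = new TreeSet<>();
        for (Commit c : commits) {
            if (c.completionTime > t1 && c.completionTime <= windowEnd) {
                result.add(c.partition);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Commit> commits = Arrays.asList(
            new Commit(1 * HOUR_MS, 10 * HOUR_MS, "p1"),  // old timestamp, completes late
            new Commit(2 * HOUR_MS, 3 * HOUR_MS, "p2"));  // newer timestamp, completes early

        // First clean at hour 6 with x = 2: the window is (0h, 4h].
        long t1 = 0L;
        long end = windowEnd(6 * HOUR_MS, 2);
        System.out.println(partitionsToScan(commits, t1, end)); // prints [p2]
        t1 = end; // remember how far this clean got

        // Second clean at hour 13: the window is (4h, 11h].
        end = windowEnd(13 * HOUR_MS, 2);
        System.out.println(partitionsToScan(commits, t1, end)); // prints [p1]
    }
}
```

In this example the commit on p1 has the lowest timestamp but finishes last. A check based only on its timestamp (1h) would place it before t1 (4h) at the second clean and skip it, whereas the stored window boundary lets the second clean still visit p1.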