Details
Type: Improvement
Status: Closed
Priority: Blocker
Resolution: Fixed
Description
org.apache.hudi.table.action.clean.CleanPlanner#getFilesToCleanKeepingLatestCommits(java.lang.String, int, org.apache.hudi.common.model.HoodieCleaningPolicy)
lastVersionBeforeEarliestCommitToRetain is not honored by the KEEP_LATEST_BY_HOURS policy. As a result, the cleaner removes a file slice as soon as it becomes non-latest. This can fail long-running queries through the following race condition:
- the timeline contains t0.deltacommit (not cleaned because it is the latest)
- a snapshot query starts and is still running
- compaction runs and creates t1.commit
- the cleaner runs and removes t0 (because t1.commit is now the latest)
- the query fails because a log file belonging to t0.deltacommit is not found
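
The race above can be illustrated with a minimal, self-contained sketch. This is not Hudi's CleanPlanner code: the class `CleanerRetentionSketch`, the method `slicesToClean`, and the flag `keepLastVersionBeforeEarliest` are hypothetical names, and file slices of one file group are modeled simply as a list of instant strings in ascending order. It only shows how keeping the last version before the earliest commit to retain would protect t0 in the scenario described.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CleanerRetentionSketch {

    /**
     * Hypothetical model of a retention decision for one file group.
     * sliceInstants must be sorted in ascending instant order.
     * Returns the instants of the slices that would be cleaned.
     */
    public static List<String> slicesToClean(List<String> sliceInstants,
                                             String earliestCommitToRetain,
                                             boolean keepLastVersionBeforeEarliest) {
        List<String> toClean = new ArrayList<>();
        if (sliceInstants.isEmpty()) {
            return toClean;
        }

        // The newest slice is always kept.
        String latest = sliceInstants.get(sliceInstants.size() - 1);

        // Newest slice that is strictly older than the retention boundary.
        String lastBefore = null;
        for (String instant : sliceInstants) {
            if (instant.compareTo(earliestCommitToRetain) < 0) {
                lastBefore = instant;
            }
        }

        for (String instant : sliceInstants) {
            if (instant.equals(latest)) {
                continue; // latest slice is retained
            }
            if (instant.compareTo(earliestCommitToRetain) >= 0) {
                continue; // inside the retention window
            }
            if (keepLastVersionBeforeEarliest && instant.equals(lastBefore)) {
                continue; // a query started before the boundary may still read it
            }
            toClean.add(instant);
        }
        return toClean;
    }

    public static void main(String[] args) {
        // t0 is the old delta commit, t1 is the commit created by compaction.
        List<String> slices = Arrays.asList("t0", "t1");

        // Behavior described in this issue: t0 is removed once t1 is the latest.
        System.out.println(slicesToClean(slices, "t1", false)); // prints [t0]

        // With the last version before the boundary honored, t0 survives.
        System.out.println(slicesToClean(slices, "t1", true));  // prints []
    }
}
```

In this sketch the retention boundary is assumed to fall at t1, so t0 is the only slice older than the boundary and is exactly the one a running snapshot query may still need.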