Details
Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
Description
When the cleaner policy is based on hours (KEEP_LATEST_BY_HOURS), we compute the earliest commit to retain using the system default time zone, not UTC or the time zone that was used to generate the commit times. If the two differ, the cutoff can be miscalculated, and the cleaner may delete more file slices than intended.
else if (config.getCleanerPolicy() == HoodieCleaningPolicy.KEEP_LATEST_BY_HOURS) {
  Instant instant = Instant.now();
  ZonedDateTime currentDateTime = ZonedDateTime.ofInstant(instant, ZoneId.systemDefault());
  String earliestTimeToRetain = HoodieActiveTimeline.formatDate(Date.from(currentDateTime.minusHours(hoursRetained).toInstant()));
  earliestCommitToRetain = Option.fromJavaOptional(commitTimeline.getInstantsAsStream()
      .filter(i -> HoodieTimeline.compareTimestamps(i.getTimestamp(), HoodieTimeline.GREATER_THAN_OR_EQUALS, earliestTimeToRetain))
      .findFirst());
}
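To illustrate the kind of mismatch described here, the following standalone sketch formats the same cutoff instant in two zones. It does not use any Hudi classes; the class name CutoffZoneMismatch, the cutoff helper, and the yyyyMMddHHmmssSSS timestamp layout are assumptions made for illustration only.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class CutoffZoneMismatch {
    // Assumed commit-time layout, used here only for illustration.
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");

    // Hypothetical helper: renders (now - hoursRetained) as a timestamp string in the given zone.
    static String cutoff(Instant now, int hoursRetained, ZoneId zone) {
        return FORMAT.withZone(zone).format(now.minus(Duration.ofHours(hoursRetained)));
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2023-01-01T12:00:00Z");
        String utcCutoff = cutoff(now, 6, ZoneOffset.UTC);
        String localCutoff = cutoff(now, 6, ZoneId.of("Asia/Shanghai"));
        System.out.println(utcCutoff);   // 20230101060000000
        System.out.println(localCutoff); // 20230101140000000
        // A commit written at 10:00 UTC would be stamped 20230101100000000.
        // It is newer than the UTC cutoff but compares as older than the local one.
        String commit = "20230101100000000";
        System.out.println(commit.compareTo(utcCutoff) >= 0);   // true
        System.out.println(commit.compareTo(localCutoff) >= 0); // false
    }
}
```

In this example the commit falls inside the six-hour retention window when both sides use UTC, yet it is treated as outside the window when the cutoff is rendered in a zone eight hours ahead of the one used for the commit times.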
Potential fixes:
- Compute the cutoff in the time zone set in the table config.
- Fetch the latest completed commit and derive the earliest commit to retain from its timestamp.
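The second option could look roughly like the sketch below, which derives the cutoff from the latest completed commit's own timestamp so that no wall-clock zone is involved. It is not the Hudi implementation: the class CutoffFromLatestCommit, the earliestCommitToRetain helper, the plain list of timestamp strings, and the yyyyMMddHHmmssSSS layout are all assumptions for illustration. The first option would instead amount to passing the table's configured zone explicitly wherever the cutoff is formatted.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class CutoffFromLatestCommit {
    // Assumed commit-time layout: yyyyMMddHHmmss followed by three millisecond digits.
    static final DateTimeFormatter FORMAT = new DateTimeFormatterBuilder()
        .appendPattern("yyyyMMddHHmmss")
        .appendValue(ChronoField.MILLI_OF_SECOND, 3)
        .toFormatter();

    // Hypothetical helper: completedCommits must be sorted in ascending timestamp order.
    static Optional<String> earliestCommitToRetain(List<String> completedCommits, int hoursRetained) {
        if (completedCommits.isEmpty()) {
            return Optional.empty();
        }
        String latest = completedCommits.get(completedCommits.size() - 1);
        // Subtract the retention window from the latest commit time itself.
        // LocalDateTime carries no zone, so the system default zone never enters the calculation.
        String cutoff = LocalDateTime.parse(latest, FORMAT).minusHours(hoursRetained).format(FORMAT);
        // Fixed-width numeric timestamps compare correctly as plain strings.
        return completedCommits.stream().filter(ts -> ts.compareTo(cutoff) >= 0).findFirst();
    }

    public static void main(String[] args) {
        List<String> commits = Arrays.asList(
            "20230101010000000", "20230101070000000", "20230101110000000");
        System.out.println(earliestCommitToRetain(commits, 6)); // Optional[20230101070000000]
    }
}
```

One trade-off of this approach is that the retention window is measured back from the last completed commit, not from the current time, so a table that has been idle for a long period would keep more history than a wall-clock cutoff would.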