Apache Hudi / HUDI-7226

Clean by hour does not respect lastVersionBeforeEarliestCommitToRetain

Details

    Description

      org.apache.hudi.table.action.clean.CleanPlanner#getFilesToCleanKeepingLatestCommits(java.lang.String, int, org.apache.hudi.common.model.HoodieCleaningPolicy)

      lastVersionBeforeEarliestCommitToRetain is not honored by the KEEP_LATEST_BY_HOURS policy. As a result, the cleaner removes a file slice as soon as it is no longer the latest one. This can fail long-running queries through the following race condition:

      1. The timeline contains t0.deltacommit (not cleaned because it is the latest).
      2. A snapshot query starts and is still running.
      3. Compaction runs and creates t1.commit.
      4. The cleaner runs and removes t0 (because t1.commit is now the latest).
      5. The query fails because a log file belonging to t0.deltacommit is not found.
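
      The steps above can be illustrated with a minimal sketch of the retention decision for a single file group. This is a simplified model, not Hudi's actual CleanPlanner code: the class CleanByHoursSketch, the method slicesToClean, and the instant strings "001", "003", and "005" are hypothetical names chosen for illustration. It assumes instants compare lexicographically and that the earliest commit to retain has already been derived from the configured hours.

```java
import java.util.ArrayList;
import java.util.List;

public class CleanByHoursSketch {

    /**
     * Returns the instants of the file slices that would be deleted for one file group.
     *
     * @param sliceInstants         instants of the file slices, sorted ascending
     * @param earliestToRetain      earliest commit instant to retain (assumed to be
     *                              derived from the hours-retained setting)
     * @param keepLastVersionBefore whether to retain the newest slice that is older
     *                              than earliestToRetain
     */
    static List<String> slicesToClean(List<String> sliceInstants,
                                      String earliestToRetain,
                                      boolean keepLastVersionBefore) {
        List<String> toClean = new ArrayList<>();
        if (sliceInstants.isEmpty()) {
            return toClean;
        }
        String latest = sliceInstants.get(sliceInstants.size() - 1);

        // Find the newest slice strictly older than the earliest commit to retain.
        String lastBefore = null;
        if (keepLastVersionBefore) {
            for (String instant : sliceInstants) {
                if (instant.compareTo(earliestToRetain) < 0) {
                    lastBefore = instant;
                }
            }
        }

        for (String instant : sliceInstants) {
            boolean isOld = instant.compareTo(earliestToRetain) < 0;
            boolean isLatest = instant.equals(latest);
            boolean isProtected = instant.equals(lastBefore);
            // The latest slice is never cleaned; old slices are cleaned unless protected.
            if (isOld && !isLatest && !isProtected) {
                toClean.add(instant);
            }
        }
        return toClean;
    }

    public static void main(String[] args) {
        // t0 = "001" (deltacommit), t1 = "005" (compaction commit), retention boundary "003".
        List<String> slices = List.of("001", "005");
        System.out.println(slicesToClean(slices, "003", false)); // prints [001]
        System.out.println(slicesToClean(slices, "003", true));  // prints []
    }
}
```

      With the flag off, t0 is deleted as soon as t1 exists, which matches step 4. With the flag on, t0 survives because it is the last version before the retention boundary, so a query that started before the compaction can still read it.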


            People

              tim.brown Timothy Brown
              xushiyan Shiyan Xu