  1. Apache Hudi
  2. HUDI-7104

Cleaner can fail to clean up some files when savepoints are added and later removed


Details

    Description

      Let's say partitioning is day-based, keyed on created date. Older partitions therefore generally do not receive any new data after a few days.

       

      Suppose a savepoint is added to one day's commit and later removed:

      day 1: cleaned up.

      day 2: savepoint added, so the cleaner ignored it.

      day 3: cleaned up.

      day 4: earliest commit to retain, based on the cleaner configs.

       

      So, with this table/timeline state, if we remove the savepoint on the day 2 commit, the data pertaining to day 2 will never be cleaned by the cleaner, since that commit is older than the earliest commit to retain.
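
The scenario above can be reproduced with a small toy model. This is only a sketch and not Hudi's actual cleaner code: `run_cleaner`, `pending`, `savepointed`, `prev_retain`, and `new_retain` are hypothetical names, and the model assumes each run only examines days in the window between the previous and the new earliest commit to retain, skipping any savepointed day.

```python
def run_cleaner(pending, savepointed, prev_retain, new_retain):
    """Toy model of one cleaner run (an assumption, not Hudi's implementation).

    Only days in [prev_retain, new_retain) are examined; days that still
    hold a savepoint are skipped. Returns the set of days cleaned.
    """
    window = range(prev_retain, new_retain)
    return {day for day in window if day in pending and day not in savepointed}


# Days whose partitions still hold files that are eligible for cleaning.
pending = {1, 2, 3, 4}
# Day 2 is protected by a savepoint during the first run.
savepointed = {2}

# Run 1: earliest commit to retain moves to day 4, so days 1..3 are examined.
first = run_cleaner(pending, savepointed, prev_retain=1, new_retain=4)
pending -= first  # days 1 and 3 are cleaned, day 2 is skipped

# The savepoint is removed and a new day of data arrives.
savepointed.clear()
pending.add(5)

# Run 2: earliest commit to retain moves to day 5, so only day 4 is examined.
second = run_cleaner(pending, savepointed, prev_retain=4, new_retain=5)
pending -= second

print("cleaned in run 1:", sorted(first))
print("cleaned in run 2:", sorted(second))
print("still uncleaned:", sorted(pending))
```

In this model day 2 stays in `pending` after the second run even though its savepoint is gone, because it falls before the window the cleaner examines. Day 5 also remains, but only because it is the new earliest commit to retain.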

       

            People

              shivnarayan sivabalan narayanan