  1. Apache Hudi
  2. HUDI-7104

Cleaner can miss cleaning up some files when savepoints are added and later removed


Details

    Description

      Let's say partitioning is day-based and keyed on the created date, so older partitions generally do not receive any new data after a few days.

       

      Let's say a savepoint is added for one day and later removed:

      day 1: cleaned up.

      day 2: savepoint added, so the cleaner ignored it.

      day 3: cleaned up.

      day 4: earliest commit to retain, based on the cleaner configs.

       

      With this table/timeline state, if we remove the savepoint from the day 2 commit, the data pertaining to day 2 will never be cleaned by the cleaner, since that commit is earlier than the earliest commit to retain.
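
The scenario above can be sketched with a toy model. This is not Hudi code: `ToyTimeline`, `clean`, and the integer day numbers are hypothetical names invented for illustration, and the model assumes the cleaner only looks at commits between the previous and the new earliest commit to retain, skipping any savepointed commit.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTimeline:
    # Stale files still awaiting cleanup, keyed by the day (commit) they belong to.
    stale: dict
    # Days that currently carry a savepoint.
    savepoints: set = field(default_factory=set)
    # Lower bound of the next cleaning window (assumed behavior, see lead-in).
    last_retained: int = 1


def clean(t: ToyTimeline, earliest_to_retain: int) -> list:
    """Clean days in [t.last_retained, earliest_to_retain), skipping savepointed days."""
    cleaned = []
    for day in range(t.last_retained, earliest_to_retain):
        if day in t.savepoints:
            continue  # savepointed commit: the cleaner leaves its files alone
        if t.stale.pop(day, None) is not None:
            cleaned.append(day)
    # The window only moves forward; earlier days are never revisited.
    t.last_retained = earliest_to_retain
    return cleaned


# Days 1-3 have stale files; day 2 is savepointed; day 4 is the earliest commit to retain.
t = ToyTimeline(stale={1: ["f1"], 2: ["f2"], 3: ["f3"]}, savepoints={2})
first = clean(t, earliest_to_retain=4)   # cleans day 1 and day 3, skips day 2

t.savepoints.discard(2)                  # the savepoint is removed later
second = clean(t, earliest_to_retain=5)  # window is now [4, 5), so day 2 is not visited

leftover = sorted(t.stale)               # days whose stale files were never cleaned
print(first, second, leftover)
```

In this model the day 2 files stay behind after the savepoint is removed, because the cleaning window has already moved past day 2. A fix would need the cleaner to also revisit commits whose savepoints were removed since the last clean.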

       

          People

            shivnarayan sivabalan narayanan