  1. Apache Hudi
  2. HUDI-1428

Cleaning old file slices has no effect

Details

    Description

      To reproduce:

      Create a table of type MERGE_ON_READ and set hoodie.cleaner.commits.retained=2.

      Insert into the same partition of the table three times.

      This creates the following parquet files:

      a2c57b73-5ba9-4744-9027-640075a179ec-0_0-213-1740_20201201200149.parquet

      a2c57b73-5ba9-4744-9027-640075a179ec-0_0-176-1702_20201201195638.parquet

      a2c57b73-5ba9-4744-9027-640075a179ec-0_0-139-1664_20201201195219.parquet

      a2c57b73-5ba9-4744-9027-640075a179ec-0_0-103-1632_20201201193835.parquet

      The file whose name ends in 20201201200149.parquet is the newest.
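
      As a rough illustration of how the newest file can be identified from the names alone, the sketch below splits each name into file ID, write token, and instant time, assuming the <fileId>_<writeToken>_<instantTime>.parquet layout. BaseFileNames and its methods are hypothetical helpers written for this example; they are not Hudi classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative helper, not a Hudi class.
class BaseFileNames {

    // The four base files listed in the report.
    static final List<String> FILES = Arrays.asList(
        "a2c57b73-5ba9-4744-9027-640075a179ec-0_0-213-1740_20201201200149.parquet",
        "a2c57b73-5ba9-4744-9027-640075a179ec-0_0-176-1702_20201201195638.parquet",
        "a2c57b73-5ba9-4744-9027-640075a179ec-0_0-139-1664_20201201195219.parquet",
        "a2c57b73-5ba9-4744-9027-640075a179ec-0_0-103-1632_20201201193835.parquet");

    static final class BaseFile {
        final String fileId;
        final String writeToken;
        final String instantTime;

        BaseFile(String fileId, String writeToken, String instantTime) {
            this.fileId = fileId;
            this.writeToken = writeToken;
            this.instantTime = instantTime;
        }
    }

    // Assumes the name has exactly three underscore-separated parts before the extension.
    static BaseFile parse(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String stem = dot < 0 ? fileName : fileName.substring(0, dot);
        String[] parts = stem.split("_");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Unexpected base file name: " + fileName);
        }
        return new BaseFile(parts[0], parts[1], parts[2]);
    }

    // Instant times are fixed-width yyyyMMddHHmmss strings, so string order is time order.
    static List<BaseFile> newestFirst(List<String> fileNames) {
        List<BaseFile> files = new ArrayList<>();
        for (String name : fileNames) {
            files.add(parse(name));
        }
        files.sort(Comparator.comparing((BaseFile f) -> f.instantTime).reversed());
        return files;
    }

    public static void main(String[] args) {
        for (BaseFile f : newestFirst(FILES)) {
            System.out.println(f.instantTime + "  " + f.fileId + "  " + f.writeToken);
        }
    }
}
```

      All four names share one file ID, so they are successive versions of the same file group, and only the instant time distinguishes them.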

      The older parquet files are not deleted when client.clean() is called.

      The reason is that CleanPlanner initializes its commitTimeline with hoodieTable.getCompletedCommitTimeline(),

      and getCompletedCommitTimeline() does not include DELTA_COMMIT_ACTION.
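
      The effect described above can be modelled with a small, self-contained sketch. It is a toy model, not Hudi's code: TimelineFilterSketch, Instant, filterByActions, and earliestToRetain are hypothetical names, the retention rule is a simplification, and it assumes that every write in this report was recorded as a delta commit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Toy model of the reported behaviour. None of these types are Hudi classes.
class TimelineFilterSketch {
    static final String COMMIT_ACTION = "commit";
    static final String DELTA_COMMIT_ACTION = "deltacommit";

    static final class Instant {
        final String timestamp;
        final String action;

        Instant(String timestamp, String action) {
            this.timestamp = timestamp;
            this.action = action;
        }
    }

    // Assumption: each write in the report appears on the timeline as a delta commit.
    static final List<Instant> TIMELINE = Arrays.asList(
        new Instant("20201201193835", DELTA_COMMIT_ACTION),
        new Instant("20201201195219", DELTA_COMMIT_ACTION),
        new Instant("20201201195638", DELTA_COMMIT_ACTION),
        new Instant("20201201200149", DELTA_COMMIT_ACTION));

    // Keeps only the instants whose action is in the given set, preserving order.
    static List<Instant> filterByActions(List<Instant> timeline, Set<String> actions) {
        List<Instant> kept = new ArrayList<>();
        for (Instant instant : timeline) {
            if (actions.contains(instant.action)) {
                kept.add(instant);
            }
        }
        return kept;
    }

    // Simplified retention rule (an assumption): if the timeline holds more than
    // `retained` instants, the earliest one to keep is the retained-th from the end;
    // otherwise there is nothing to clean.
    static Optional<Instant> earliestToRetain(List<Instant> timeline, int retained) {
        if (retained <= 0) {
            throw new IllegalArgumentException("retained must be positive");
        }
        if (timeline.size() <= retained) {
            return Optional.empty();
        }
        return Optional.of(timeline.get(timeline.size() - retained));
    }

    public static void main(String[] args) {
        Set<String> commitsOnly = Collections.singleton(COMMIT_ACTION);
        Set<String> commitsAndDeltas =
            new HashSet<>(Arrays.asList(COMMIT_ACTION, DELTA_COMMIT_ACTION));

        // Filtering on commits only leaves an empty timeline, so nothing is cleaned.
        System.out.println(
            earliestToRetain(filterByActions(TIMELINE, commitsOnly), 2).isPresent());

        // Including delta commits gives the planner a cut-off instant.
        System.out.println(
            earliestToRetain(filterByActions(TIMELINE, commitsAndDeltas), 2)
                .map(i -> i.timestamp).orElse("none"));
    }
}
```

      In this model, a timeline restricted to commit actions is empty for a table that has only delta commits, so the retention check never finds anything older than the retained window.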

            People

              Assignee: Unassigned
              Reporter: steven zhang (henryz)