Apache Hudi / HUDI-1800

Incorrect HoodieTableFileSystem API usage for pending slices causing issues


Details

    Description

      From vbalaji

       

      We are using the wrong FileSystemView API here:

      https://github.com/apache/hudi/blob/release-0.6.0/hudi-client/src/main/java/org/apache/hudi/table/action/deltacommit/UpsertDeltaCommitPartitioner.java#L85

      We don't include file groups that are in pending compaction, but with the HBase index we are including them. With the current state of the code, including files in pending compaction is an issue.

      The API "getLatestFileSlicesBeforeOrOn" was originally intended for CompactionAdminClient, which uses it to find the log files added after a pending compaction and rename them so that the effects of compaction scheduling can be undone. A different API, "getLatestMergedFileSlicesBeforeOrOn", gives a consolidated view of the latest file slice and includes all data from both before and after the compaction. This is what should be used in:

      https://github.com/apache/hudi/blob/release-0.6.0/hudi-client/src/main/java/org/apache/hudi/table/action/deltacommit/UpsertDeltaCommitPartitioner.java#L85
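To make the difference between the two views concrete, here is a minimal toy sketch. It does not use Hudi classes: `Slice`, `latestMerged`, and the file names are hypothetical, and the model assumes that a slice created by a pending compaction has log files but no base file yet. It only illustrates the "consolidated view" idea described above, not how Hudi implements it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model only: these are NOT Hudi classes. All names are hypothetical.
final class MergedSliceSketch {

    /** Simplified file slice: an optional base file plus its log files. */
    static final class Slice {
        final String baseInstant;
        final String baseFile;       // null when no base file has been written yet
        final List<String> logFiles;

        Slice(String baseInstant, String baseFile, List<String> logFiles) {
            this.baseInstant = baseInstant;
            this.baseFile = baseFile;
            this.logFiles = Collections.unmodifiableList(new ArrayList<>(logFiles));
        }
    }

    /**
     * Consolidated view: when the latest slice belongs to a pending compaction,
     * fold its log files into the previous slice so no data is left out.
     * Otherwise the latest slice is returned unchanged.
     */
    static Slice latestMerged(Slice previous, Slice latest, boolean latestIsPendingCompaction) {
        if (!latestIsPendingCompaction || previous == null) {
            return latest;
        }
        List<String> logs = new ArrayList<>(previous.logFiles);
        logs.addAll(latest.logFiles);
        return new Slice(previous.baseInstant, previous.baseFile, logs);
    }

    public static void main(String[] args) {
        Slice beforeCompaction = new Slice("001", "base_001.parquet", Arrays.asList("log_001.1"));
        // Slice opened when compaction was scheduled at instant 002 (assumed: logs only).
        Slice pending = new Slice("002", null, Arrays.asList("log_002.1"));

        Slice merged = latestMerged(beforeCompaction, pending, true);
        System.out.println("unmerged base file: " + pending.baseFile);
        System.out.println("merged base file:   " + merged.baseFile);
        System.out.println("merged log files:   " + merged.logFiles);
    }
}
```

In this model the unmerged latest slice carries only the post-scheduling log file, while the merged slice carries the earlier base file and the log files from both sides of the compaction.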

      The other workaround would be to exclude file slices in pending compaction when we select small files, which avoids the interaction between the compactor and ingestion in this case. But I think we can go with the first option.
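The second workaround could look roughly like the sketch below. It is a standalone illustration, not Hudi code: `Candidate`, `smallFileIds`, and the size limit are hypothetical, and the set of file IDs under pending compaction is assumed to be supplied by the caller.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Toy model only: these are NOT Hudi classes. All names are hypothetical.
final class SmallFileFilterSketch {

    /** A file considered for receiving new inserts. */
    static final class Candidate {
        final String fileId;
        final long sizeBytes;

        Candidate(String fileId, long sizeBytes) {
            this.fileId = fileId;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Selects small files while skipping every file group that is in pending
     * compaction, so ingestion never writes into a file the compactor will touch.
     */
    static List<String> smallFileIds(List<Candidate> candidates,
                                     Set<String> pendingCompactionFileIds,
                                     long smallFileLimitBytes) {
        return candidates.stream()
                .filter(c -> c.sizeBytes < smallFileLimitBytes)
                .filter(c -> !pendingCompactionFileIds.contains(c.fileId))
                .map(c -> c.fileId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Candidate> candidates = Arrays.asList(
                new Candidate("file-a", 10L),
                new Candidate("file-b", 20L),
                new Candidate("file-c", 500L));
        Set<String> pending = new HashSet<>(Arrays.asList("file-b"));

        // file-b is small but pending compaction; file-c is over the limit.
        System.out.println(smallFileIds(candidates, pending, 100L));
    }
}
```

The trade-off is that small files under pending compaction stay small until the compaction completes, which is one reason the description prefers the merged-view option.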

       

      More details can be found here -> https://github.com/apache/hudi/issues/2633


            People

              ryanpife Ryan Pifer
              nishith29 Nishith Agarwal
