  1. Apache Hudi
  2. HUDI-4917

Optimize how loadColumnRangesFromFiles obtains the HoodieBaseFile in Bloom Index


Details

    Description

      When the Bloom Index runs loadColumnRangesFromFiles during the tagLocation process, the existing implementation obtains each HoodieBaseFile by sending a request to the driver. With large data volumes and high parallelism, these requests become a network bottleneck and make tagLocation very slow.
      However, the HoodieBaseFile can be obtained directly through HoodieIndexUtils.getLatestBaseFilesForAllPartitions() inside loadColumnRangesFromFiles(), which avoids the driver requests and improves tagLocation performance for the Bloom Index.
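
      The difference between the two approaches can be illustrated with a small, self-contained sketch. It does not use Hudi classes: `BaseFile`, `DriverView`, and both `ranges...` methods are hypothetical stand-ins invented for illustration, and the lookup counter only models the driver round trips described above. The real change in Hudi may be structured differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BloomRangeLoadingSketch {

    // Hypothetical stand-in for HoodieBaseFile: identity plus the key range stored in the file.
    static final class BaseFile {
        final String partitionPath;
        final String fileId;
        final String minKey;
        final String maxKey;

        BaseFile(String partitionPath, String fileId, String minKey, String maxKey) {
            this.partitionPath = partitionPath;
            this.fileId = fileId;
            this.minKey = minKey;
            this.maxKey = maxKey;
        }
    }

    // Hypothetical stand-in for a file-system view hosted on the driver.
    // Every call to lookup() models one network round trip made by a task.
    static final class DriverView {
        private final Map<String, BaseFile> files = new LinkedHashMap<>();
        private int lookups = 0;

        void add(BaseFile file) {
            files.put(file.partitionPath + "/" + file.fileId, file);
        }

        // Models listing the latest base files once, up front.
        List<BaseFile> latestBaseFiles() {
            return new ArrayList<>(files.values());
        }

        BaseFile lookup(String partitionPath, String fileId) {
            lookups++;
            return files.get(partitionPath + "/" + fileId);
        }

        int lookupCount() {
            return lookups;
        }
    }

    static String describe(BaseFile f) {
        return f.partitionPath + "/" + f.fileId + " [" + f.minKey + ", " + f.maxKey + "]";
    }

    // Before: only (partition, fileId) is carried forward, so each file is resolved again via the driver.
    static List<String> rangesViaDriverLookup(DriverView view) {
        List<String> ranges = new ArrayList<>();
        for (BaseFile listed : view.latestBaseFiles()) {
            BaseFile resolved = view.lookup(listed.partitionPath, listed.fileId);
            ranges.add(describe(resolved));
        }
        return ranges;
    }

    // After: the base file from the initial listing is used directly, with no further lookups.
    static List<String> rangesDirect(DriverView view) {
        List<String> ranges = new ArrayList<>();
        for (BaseFile listed : view.latestBaseFiles()) {
            ranges.add(describe(listed));
        }
        return ranges;
    }

    // Made-up sample data: three base files across two partitions.
    static DriverView sampleView() {
        DriverView view = new DriverView();
        view.add(new BaseFile("2022/09/01", "file-1", "key000", "key199"));
        view.add(new BaseFile("2022/09/01", "file-2", "key200", "key399"));
        view.add(new BaseFile("2022/09/02", "file-3", "key000", "key150"));
        return view;
    }

    public static void main(String[] args) {
        DriverView perFile = sampleView();
        rangesViaDriverLookup(perFile);
        System.out.println("driver lookups, per-file path: " + perFile.lookupCount());

        DriverView direct = sampleView();
        rangesDirect(direct);
        System.out.println("driver lookups, direct path: " + direct.lookupCount());
    }
}
```

      Both methods return the same ranges; the per-file path performs one lookup per base file, while the direct path performs none. In this model the saving grows linearly with the number of base files, which matches the description's point that the cost shows up with large data and high parallelism.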

          People

            jasonlee1017 Chuang Lee