Apache Hudi / HUDI-712

Improve exporter performance and memory usage

Details

    Description

      https://github.com/apache/incubator-hudi/blob/99b7e9eb9ef8827c1e06b7e8621b6be6403b061e/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java#L103-L107

      The way the list of data files to export is collected can be improved, for the following reasons:

      • the listing is not parallelized across partitions
      • the resulting list can be too large to hold in memory
      • listing a partition to get its latest files requires scanning all files in that partition (RFC-15 could solve this)
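
      A minimal sketch of one possible direction for the first two points, written in plain Java rather than against the Hudi or Spark APIs. The class name and the `listLatestFiles` and `copyFile` parameters are hypothetical stand-ins, not existing Hudi functions. Each partition is listed and copied inside its own parallel task, so partitions are handled concurrently and the full file list is never collected in one place. It does not address the third point.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

class ParallelExportSketch {

    /**
     * Lists and copies files partition by partition, in parallel.
     * Only a per-partition file count is returned from each task, so the
     * caller never holds the complete list of data files in memory.
     *
     * listLatestFiles and copyFile are hypothetical callbacks standing in
     * for the real per-partition listing and file-copy logic.
     */
    static long exportPartitions(List<String> partitions,
                                 Function<String, List<String>> listLatestFiles,
                                 Consumer<String> copyFile) {
        return partitions.parallelStream()
                .mapToLong(partition -> {
                    // Listing happens inside the task, one partition at a time.
                    List<String> files = listLatestFiles.apply(partition);
                    files.forEach(copyFile);
                    return files.size();
                })
                .sum();
    }

    public static void main(String[] args) {
        // Made-up in-memory "table" used only to exercise the sketch.
        Map<String, List<String>> table = Map.of(
                "2020/03/01", List.of("a.parquet", "b.parquet"),
                "2020/03/02", List.of("c.parquet"));

        long copied = exportPartitions(
                List.copyOf(table.keySet()),
                table::get,
                file -> System.out.println("copying " + file));

        System.out.println("files copied: " + copied);
    }
}
```

      In the Spark-based exporter the same shape would presumably mean distributing the partition paths across executors and doing the listing and copying there, instead of building the list on the driver. That mapping is an assumption, not something the linked code confirms.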

       

            People

              xushiyan Raymond Xu
              rxu Raymond Xu
