  1. Apache Hudi
  2. HUDI-7111

Performance regression in Spark jobs that write to a simple bucket index table


Details

    Description

      After upgrading to 0.14.0, the performance of Spark jobs that write to a simple bucket index table regresses.

      The cause is PR#4480: its refactor of the bucket index introduced two unnecessary stages in the tagging step for the simple bucket index. They come from this line:

          List<String> partitions = records.map(HoodieRecord::getPartitionPath).distinct().collectAsList();
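
      In Spark, `distinct()` needs a shuffle and `collectAsList()` is an action, so this line runs its own job over the input before tagging can start. A minimal sketch of one way to avoid the up-front pass follows, in plain Java without Spark: it resolves each partition's bucket-to-file mapping lazily, the first time a record from that partition is seen. `Rec`, `LazyBucketTagger`, the `loader` function, and the `floorMod` hash are all hypothetical stand-ins for illustration and are not taken from Hudi or from the actual fix.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for a record: only the fields this sketch needs. */
final class Rec {
    final String key;
    final String partitionPath;

    Rec(String key, String partitionPath) {
        this.key = key;
        this.partitionPath = partitionPath;
    }
}

/** Tags records with a file id, loading each partition's mapping on first use. */
final class LazyBucketTagger {
    private final int numBuckets;
    // Assumed loader: partition path -> (bucket id -> existing file id).
    private final Function<String, Map<Integer, String>> loader;
    private final Map<String, Map<Integer, String>> cache = new HashMap<>();
    int loadCount = 0; // how many partitions were actually loaded

    LazyBucketTagger(int numBuckets, Function<String, Map<Integer, String>> loader) {
        this.numBuckets = numBuckets;
        this.loader = loader;
    }

    // Assumed hash scheme for illustration; the real index may hash differently.
    static int bucketId(String key, int numBuckets) {
        return Math.floorMod(key.hashCode(), numBuckets);
    }

    /** Single pass over the records; no prior distinct-and-collect of partitions. */
    List<String> tag(Iterator<Rec> records) {
        List<String> tagged = new ArrayList<>();
        while (records.hasNext()) {
            Rec r = records.next();
            Map<Integer, String> mapping = cache.computeIfAbsent(r.partitionPath, p -> {
                loadCount++;
                return loader.apply(p);
            });
            String fileId = mapping.get(bucketId(r.key, numBuckets));
            // A bucket with no existing file is marked NEW in this sketch.
            tagged.add(r.key + "->" + (fileId == null ? "NEW" : fileId));
        }
        return tagged;
    }
}
```

      With this shape, the mapping for a partition is loaded once per tagger instance, and only for partitions that actually occur in the records it iterates over.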
      

            People

              Assignee: Unassigned
              Reporter: Jing Zhang (jingzhang)