Apache Hudi / HUDI-7578

Avoid unnecessary rewriting when copying old records from the old base file to the new base file, to improve compaction performance


Details

    Description

      After upgrading a Hudi table from version 0.10 to version 0.14, the compaction job became much slower.
      The table is a MOR table with no partition field, and it has not undergone any schema evolution.

      The compaction job finishes in 52 minutes on 0.14, whereas it finishes in 25 minutes on 0.10.

      In 0.14, the jstack output of the task is also much more complex. It includes the following:

      After comparing 0.14 with 0.10, we found a difference in how old records are copied from the old base file to the new base file.
      In 0.14, the copy is much more expensive.

       

      In 0.10, the copy is simpler.

       

      Rewriting every field value of each old record is unnecessary; updating the new file path value and the metadata fields is enough.
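
To make the proposal concrete, here is a minimal sketch of the two copy strategies. It is not Hudi's code: records are modelled as plain Python dicts, the function names `copy_with_full_rewrite` and `copy_with_meta_update` are hypothetical, and the choice to refresh only `_hoodie_file_name` is an assumption made for illustration. The five `_hoodie_*` names are Hudi's record-level metadata columns.

```python
import copy

# Hudi's five record-level metadata columns.
HOODIE_META_FIELDS = (
    "_hoodie_commit_time",
    "_hoodie_commit_seqno",
    "_hoodie_record_key",
    "_hoodie_partition_path",
    "_hoodie_file_name",
)


def copy_with_full_rewrite(old_record, schema_fields, new_file_name):
    """Hypothetical slow path: rebuild the record field by field."""
    new_record = {}
    for name in schema_fields:
        # Every value is deep-copied even though the data has not changed.
        new_record[name] = copy.deepcopy(old_record.get(name))
    new_record["_hoodie_file_name"] = new_file_name
    return new_record


def copy_with_meta_update(old_record, new_file_name):
    """Hypothetical fast path: reuse the record, touch only the file name."""
    # Assumption: only the file name metadata field needs refreshing here.
    old_record["_hoodie_file_name"] = new_file_name
    return old_record


if __name__ == "__main__":
    schema = list(HOODIE_META_FIELDS) + ["id", "payload"]
    record = {
        "_hoodie_commit_time": "001",
        "_hoodie_commit_seqno": "001_0_1",
        "_hoodie_record_key": "id:1",
        "_hoodie_partition_path": "",
        "_hoodie_file_name": "old-base.parquet",
        "id": 1,
        "payload": [1, 2, 3],
    }
    slow = copy_with_full_rewrite(record, schema, "new-base.parquet")
    fast = copy_with_meta_update(record, "new-base.parquet")
    print(slow == fast)
```

Both functions yield records with the same content. The slow path does work proportional to the number and size of the fields for every copied record, while the fast path does a single field assignment, which is the saving this issue asks for when the schema has not changed.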


          People

            Unassigned
            Jing Zhang (jingzhang)
            Danny Chen