Apache Hudi / HUDI-7578

Avoid unnecessary rewriting when copying old records from the old base file to the new base file, to improve compaction performance


Details

    Description

      After upgrading a Hudi table from version 0.10 to version 0.14, the compaction job became much slower.
      The table is a MOR table without a partition field, and it has not undergone any schema evolution.

      The compaction job finishes in 52 minutes on version 0.14, but in 25 minutes on version 0.10, so it now takes about twice as long.

      In version 0.14, the jstack output of the compaction task is also much more complex, including the following content:

      Comparing versions 0.14 and 0.10, we found a difference in how old records are copied from the old base file to the new base file.
      In version 0.14, this copy is much more expensive.

       

      In version 0.10, the copy is simpler.

       

      Rewriting every field value of each old record is unnecessary; updating the new file path value and the metadata fields is enough.
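To illustrate the difference between the two copy strategies, here is a minimal, self-contained sketch. It is not Hudi's actual code: the class, the method names, and the flat `Object[]` record layout are hypothetical, and only the file-name metadata column is updated to keep the example short. It contrasts visiting every field of an old record with overwriting a single slot.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Hudi code: records are modeled as flat Object[] rows.
class CopyOldRecordSketch {
    // Assumed schema: Hudi's five metadata columns followed by two data columns.
    static final List<String> FIELDS = List.of(
        "_hoodie_commit_time", "_hoodie_commit_seqno", "_hoodie_record_key",
        "_hoodie_partition_path", "_hoodie_file_name", "id", "payload");
    static final int FILE_NAME_POS = FIELDS.indexOf("_hoodie_file_name");

    // Full rewrite: allocate a new row and copy every field after a by-name lookup.
    static Object[] rewriteAllFields(Object[] oldRecord, String newFileName) {
        Object[] out = new Object[FIELDS.size()];
        for (String name : FIELDS) {
            int pos = FIELDS.indexOf(name); // per-field work, repeated for each record
            out[pos] = oldRecord[pos];
        }
        out[FILE_NAME_POS] = newFileName;
        return out;
    }

    // Targeted update: reuse the old row and overwrite only the file-name slot.
    static Object[] updateFileNameOnly(Object[] oldRecord, String newFileName) {
        oldRecord[FILE_NAME_POS] = newFileName;
        return oldRecord;
    }

    public static void main(String[] args) {
        Object[] rec = {"20240301", "20240301_0_1", "key-1", "",
                        "old-base.parquet", 1L, "v"};
        Object[] full = rewriteAllFields(rec.clone(), "new-base.parquet");
        Object[] cheap = updateFileNameOnly(rec.clone(), "new-base.parquet");
        // Both strategies produce the same row when the schema is unchanged.
        System.out.println(Arrays.equals(full, cheap)); // prints "true"
    }
}
```

When the schema has not evolved, as in the table described here, both paths yield identical rows, so the per-field work in the full rewrite is pure overhead multiplied by the number of records copied.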


          People

            Unassigned
            jingzhang Jing Zhang
            Danny Chen
            Votes: 0
            Watchers: 2


              Agile

                Completed Sprint:
                Sprint 2024-03-25 ended 26/Apr/24