Apache Hudi / HUDI-6962

Correct the behavior of bulk insert for NB-CC

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.0-beta1
    • Component/s: None

    Description

      How should the multi-writer case be handled when one of the jobs uses the bulk insert operation?
      1. Generated file group ID: Bulk insert should generate a fixed file group ID, because the other jobs use a fixed file group ID suffix instead of a random UUID suffix. The behavior needs to be consistent to prevent later writer jobs from writing records with the same primary key to different file groups.
      2. Transaction handling: Conflict resolution for bulk insert cannot be deferred to the compaction phase. Bulk insert writers flush data directly into base files, so if there are multiple bulk insert jobs, multiple base files might exist in the same bucket.
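
      The two points above can be illustrated with a minimal sketch. It is not Hudi's implementation: the helper names, the CRC32 hash, the bucket count, and the suffix string are all assumptions chosen for illustration. It shows why a fixed suffix makes every writer resolve a record key to the same file group ID, and how one might detect buckets that received base files from more than one bulk insert job.

```python
import uuid
import zlib
from collections import Counter

NUM_BUCKETS = 4  # hypothetical bucket count
FIXED_SUFFIX = "-0000-0000-0000-000000000000"  # hypothetical fixed suffix


def bucket_of(record_key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a record key to a bucket with a stable hash (CRC32 here, as an assumption)."""
    return zlib.crc32(record_key.encode("utf-8")) % num_buckets


def fixed_file_group_id(bucket: int) -> str:
    """Deterministic ID: every writer derives the same ID for the same bucket."""
    return f"{bucket:08d}{FIXED_SUFFIX}"


def random_file_group_id(bucket: int) -> str:
    """Random-suffix ID: two writers get different IDs for the same bucket."""
    return f"{bucket:08d}-{uuid.uuid4()}"


def buckets_with_conflicts(jobs):
    """Return the sorted buckets into which more than one bulk insert job wrote a base file.

    `jobs` is a list with one entry per job; each entry lists the buckets
    that job wrote base files into.
    """
    counts = Counter(bucket for job in jobs for bucket in set(job))
    return sorted(bucket for bucket, n in counts.items() if n > 1)


if __name__ == "__main__":
    bucket = bucket_of("user-42")
    # Same key, same bucket, same file group ID regardless of which job computes it.
    print(fixed_file_group_id(bucket) == fixed_file_group_id(bucket_of("user-42")))
    # With random suffixes, the same bucket maps to two different file groups.
    print(random_file_group_id(bucket) == random_file_group_id(bucket))
    # Two bulk insert jobs that both touched bucket 1 must be resolved at commit time.
    print(buckets_with_conflicts([[0, 1], [1, 2]]))
```

      In this sketch, an overlap reported by `buckets_with_conflicts` would have to be handled when the bulk insert commits, since both jobs have already written base files and there is no later compaction step to merge them.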

          People

            Assignee: Jing Zhang
            Reporter: Jing Zhang
            Votes: 0
            Watchers: 2
