HBASE-25281 (project: HBase)

Bulkload splits HFile too many times due to unreasonable split point


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0-alpha-1, 2.4.0
    • Component/s: tooling
    • Labels: None

    Description

      https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/tool/BulkLoadHFilesTool.java#L688

      If an HFile spans multiple regions, for example A, B, C, D, E, F (the start keys of these regions are in ascending order), we should split at the end key of region C rather than the end key of region A. This way the resulting .top and .bottom HFiles are roughly equal in size, which reduces the time complexity of splitting from O(n) to O(log n), where n is the number of regions the HFile spans. It also decreases the number of bulkLoad RPC calls to the region server and avoids write amplification during copyHFileHalf.
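
The difference between the two split points can be illustrated with a small simulation. This is only a sketch under simplifying assumptions, not the code in BulkLoadHFilesTool: it assumes every region holds one equal "unit" of the HFile's data, that each split rewrites the whole file being split (as both halves are copied), and that splitting continues until every piece fits in a single region. The class name `SplitPointSketch`, the `Cost` type, and the `simulate` method are hypothetical names introduced for illustration.

```java
public class SplitPointSketch {

    /** Outcome of a simulated split: recursion depth and units of data copied. */
    public static final class Cost {
        public final int depth;
        public final long unitsCopied;

        public Cost(int depth, long unitsCopied) {
            this.depth = depth;
            this.unitsCopied = unitsCopied;
        }
    }

    /**
     * Simulates splitting a file that covers regions lo..hi (inclusive indexes)
     * until every piece lies within one region.
     *
     * @param splitAtMiddle true to split at the end key of the middle region,
     *                      false to split at the end key of the first region
     */
    public static Cost simulate(int lo, int hi, boolean splitAtMiddle) {
        if (lo == hi) {
            return new Cost(0, 0); // already fits in one region, nothing to split
        }
        // Index of the region whose end key is used as the split point.
        int splitRegion = splitAtMiddle ? lo + (hi - lo) / 2 : lo;
        // Assumption: both halves are rewritten, so the whole span is copied once.
        long copiedHere = hi - lo + 1;
        Cost bottom = simulate(lo, splitRegion, splitAtMiddle);
        Cost top = simulate(splitRegion + 1, hi, splitAtMiddle);
        return new Cost(
            1 + Math.max(bottom.depth, top.depth),
            copiedHere + bottom.unitsCopied + top.unitsCopied);
    }

    public static void main(String[] args) {
        int regions = 6; // regions A..F from the description
        Cost first = simulate(0, regions - 1, false);
        Cost middle = simulate(0, regions - 1, true);
        System.out.println("first-region split:  depth=" + first.depth
            + ", unitsCopied=" + first.unitsCopied);
        System.out.println("middle-region split: depth=" + middle.depth
            + ", unitsCopied=" + middle.unitsCopied);
    }
}
```

In this model, splitting at the first region's end key produces a recursion depth that grows linearly with the number of regions, while splitting at the middle region's end key keeps the depth logarithmic and copies less data overall. With six regions the middle index is 2, which corresponds to region C in the example above.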


          People

            Assignee: Yulin Niu (niuyulin)
            Reporter: Yulin Niu (niuyulin)
            Votes: 0
            Watchers: 3

