HBase / HBASE-21642

CopyTable by reading snapshot and bulkloading will save a lot of time.


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0-alpha-1, 2.2.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      In our HBase clusters, some users need to merge the data of two different tables into one. Currently, CopyTable scans the source table and puts the mutations into the destination table.
      Although CopyTable with bulkload is much faster than CopyTable with scan and put, it still takes a long time to scan the source table. Worse, scanning the live table hurts the cluster's availability: the scan consumes a lot of RegionServer (RS) resources, including CPU, memory, GC stop-the-world pauses, RS handlers, disk I/O, and network I/O. All of these affect availability.

      So in our clusters, we try to run every scanning job against a snapshot instead of the live table. This at least isolates the CPU, memory, and GC resources of the online RS from the scanning job. What's more, scanning a snapshot is much faster than scanning through the RS, and it is more stable.

      So here I'll make the CopyTable tool support snapshot scanning.
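
To make the difference between the two modes concrete, here is a minimal sketch of a helper that assembles the CopyTable command line for either a live-table copy or a snapshot copy. The flag names `--snapshot`, `--bulkload`, and `--new.name` are assumptions taken from the tool's usage text as I recall it, not from this issue, so check them against the usage output of your HBase release. The helper `copy_table_args` and the table and snapshot names are hypothetical.

```python
from typing import List

# Fully qualified name of the CopyTable MapReduce driver.
COPY_TABLE_CLASS = "org.apache.hadoop.hbase.mapreduce.CopyTable"


def copy_table_args(source: str, dest_table: str,
                    from_snapshot: bool = False,
                    bulkload: bool = False) -> List[str]:
    """Build an argv list for the CopyTable tool (hypothetical helper).

    `source` is a table name, or a snapshot name when from_snapshot=True.
    Flag names are assumed from the tool's usage text; verify them locally.
    """
    args = ["hbase", COPY_TABLE_CLASS, f"--new.name={dest_table}"]
    if from_snapshot:
        # Assumed flag: read the snapshot's files instead of scanning the RS.
        args.append("--snapshot")
    if bulkload:
        # Assumed flag: write HFiles and bulk load them instead of issuing puts.
        args.append("--bulkload")
    args.append(source)
    return args


if __name__ == "__main__":
    # Snapshot scan plus bulk load, the combination this issue argues for.
    print(" ".join(copy_table_args("sourceTableSnapshot", "destTable",
                                   from_snapshot=True, bulkload=True)))
```

The helper only builds the argument list and does not run anything; it could be passed to `subprocess.run` on a host where the `hbase` launcher is on the PATH.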

      Attachments

        1. HBASE-21642.v1.patch
          20 kB
          Zheng Hu


          People

            Assignee: openinx Zheng Hu
            Reporter: openinx Zheng Hu
            Votes: 0
            Watchers: 6

