HBase / HBASE-20226

Performance Improvement Taking Large Snapshots In Remote Filesystems


    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha-1, 2.3.0, 1.7.0
    • Fix Version/s: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0, 2.2.6
    • Component/s: snapshots
    • Labels:
    • Environment:

      HBase 1.4.0 running on an AWS EMR cluster with the hbase.rootdir set to point to a folder in S3 


      When taking a snapshot of any table, one of the last steps is to delete the region manifests, which have already been rolled up into a larger overall manifest and thus have redundant information.

      This proposal is to do the deletion in a thread pool bounded by hbase.snapshot.thread.pool.max. For large tables with many regions, the current single-threaded deletion takes longer than all the other snapshot tasks when the HBase data and the snapshot folder are both in a remote filesystem such as S3.
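
To illustrate the idea, here is a minimal sketch of bounded parallel deletion. It is not the attached patch: it uses java.nio.file against a local directory as a stand-in for the Hadoop FileSystem API, and the class name ParallelManifestCleanup, the method deleteAll, and the maxThreads parameter (standing in for the value of hbase.snapshot.thread.pool.max) are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of HBase: deletes redundant per-region
// manifest files concurrently instead of one at a time.
class ParallelManifestCleanup {

    /**
     * Deletes every path in {@code manifests} using at most {@code maxThreads}
     * worker threads and returns how many files were actually removed.
     * {@code maxThreads} plays the role of hbase.snapshot.thread.pool.max here.
     */
    public static int deleteAll(List<Path> manifests, int maxThreads)
            throws IOException, InterruptedException {
        if (manifests.isEmpty()) {
            return 0;
        }
        // Never start more threads than there are files, and always at least one.
        int threads = Math.max(1, Math.min(maxThreads, manifests.size()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one deletion task per manifest; on a remote store each
            // delete is a network round trip, so running them concurrently
            // hides most of the per-call latency.
            List<Future<Boolean>> pending = new ArrayList<>();
            for (Path manifest : manifests) {
                pending.add(pool.submit(() -> Files.deleteIfExists(manifest)));
            }
            // Wait for all tasks and surface the first failure, if any.
            int deleted = 0;
            for (Future<Boolean> result : pending) {
                try {
                    if (result.get()) {
                        deleted++;
                    }
                } catch (ExecutionException e) {
                    throw new IOException("manifest deletion failed", e.getCause());
                }
            }
            return deleted;
        } finally {
            pool.shutdown();
        }
    }
}
```

With a sequential loop the total time grows as (number of regions) x (latency of one delete call); with the pool it is divided by roughly the number of threads, which is why the gain should be most visible on high-latency stores such as S3.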

      I have a patch for this proposal almost ready and will submit it tomorrow for feedback, although I haven't had a chance to write any tests yet.


        1. HBASE-20226..01.patch
          5 kB
          Saad Mufti


            • Assignee:
              bharathv Bharath Vissapragada
              saadmufti Saad Mufti

