HBase / HBASE-20226

Performance Improvement Taking Large Snapshots In Remote Filesystems


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha-1, 2.3.0, 1.7.0
    • Fix Version/s: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0, 2.2.6
    • Component/s: snapshots
    • Environment: HBase 1.4.0 running on an AWS EMR cluster with hbase.rootdir set to point to a folder in S3

    Description

      When taking a snapshot of any table, one of the last steps is to delete the region manifests. These have already been rolled up into a larger overall manifest, so the information they contain is redundant.

      This proposal is to do the deletion in a thread pool bounded by hbase.snapshot.thread.pool.max. For large tables with many regions, the current single-threaded deletion takes longer than all the rest of the snapshot tasks when the HBase data and the snapshot folder are both in a remote filesystem such as S3.

      I have a patch for this proposal almost ready and will submit it tomorrow for feedback, although I haven't had a chance to write any tests yet.
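
      A minimal sketch of the idea, not the attached patch: it deletes a list of manifest files through a bounded thread pool instead of one at a time. To stay self-contained it uses java.nio.file on a local filesystem, whereas HBase itself goes through the Hadoop FileSystem API. The class name ParallelManifestCleanup, the method deleteAll, and the maxThreads parameter are hypothetical; in HBase the bound would presumably come from hbase.snapshot.thread.pool.max.

      ```java
      import java.io.IOException;
      import java.nio.file.Files;
      import java.nio.file.Path;
      import java.util.ArrayList;
      import java.util.List;
      import java.util.concurrent.Callable;
      import java.util.concurrent.ExecutionException;
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;
      import java.util.concurrent.Future;

      // Hypothetical illustration only; names do not come from the HBase code base.
      public class ParallelManifestCleanup {

          /**
           * Deletes every path in manifests using at most maxThreads worker threads.
           * Returns the number of files that existed and were deleted.
           */
          public static int deleteAll(List<Path> manifests, int maxThreads)
                  throws IOException, InterruptedException {
              if (maxThreads < 1) {
                  throw new IllegalArgumentException("maxThreads must be >= 1");
              }
              if (manifests.isEmpty()) {
                  return 0;
              }
              // Never start more threads than there are files to delete.
              ExecutorService pool =
                      Executors.newFixedThreadPool(Math.min(maxThreads, manifests.size()));
              try {
                  // Submit one delete task per manifest; on a remote store each
                  // delete is a slow round trip, so they overlap in the pool.
                  List<Future<Boolean>> pending = new ArrayList<>();
                  for (Path manifest : manifests) {
                      Callable<Boolean> task = () -> Files.deleteIfExists(manifest);
                      pending.add(pool.submit(task));
                  }
                  // Wait for all deletions and surface the first failure, if any.
                  int deleted = 0;
                  for (Future<Boolean> result : pending) {
                      try {
                          if (result.get()) {
                              deleted++;
                          }
                      } catch (ExecutionException e) {
                          throw new IOException("Failed to delete a region manifest", e.getCause());
                      }
                  }
                  return deleted;
              } finally {
                  pool.shutdownNow();
              }
          }
      }
      ```

      Collecting the Future results before returning keeps the behaviour of the serial version: the snapshot step does not finish until every manifest is gone, and a failed deletion still fails the step.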

      Attachments

        1. HBASE-20226..01.patch
          5 kB
          Saad Mufti


          People

            Assignee: Bharath Vissapragada (bharathv)
            Reporter: Saad Mufti (saadmufti)
            Votes: 0
            Watchers: 9

