HBASE-20226

Performance Improvement Taking Large Snapshots In Remote Filesystems

    Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.4.0
    • Fix Version/s: None
    • Component/s: snapshots
    • Labels:
      None
    • Environment:

      HBase 1.4.0 running on an AWS EMR cluster with the hbase.rootdir set to point to a folder in S3 

      Description

      When taking a snapshot of any table, one of the last steps is to delete the region manifests, which have already been rolled up into a larger overall manifest and are therefore redundant.

      This proposal is to do the deletion in a thread pool bounded by hbase.snapshot.thread.pool.max. For large tables with many regions, the current single-threaded deletion takes longer than the rest of the snapshot tasks when the HBase data and the snapshot folder are both in a remote filesystem such as S3.
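
      A minimal sketch of the idea, not the attached patch: each region manifest deletion is submitted to a fixed-size pool so that the per-file round trips to the remote store overlap instead of running back to back. The class name ParallelManifestCleanup and the method deleteAll are hypothetical, and java.nio.file stands in for the Hadoop FileSystem API that HBase actually uses. The maxThreads argument is assumed to carry the value read from hbase.snapshot.thread.pool.max.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper for illustration only; not part of HBase.
public class ParallelManifestCleanup {

    /**
     * Deletes the given region manifest files using at most maxThreads workers.
     * Returns how many files were actually removed.
     */
    public static int deleteAll(List<Path> regionManifests, int maxThreads)
            throws IOException, InterruptedException {
        if (regionManifests.isEmpty()) {
            return 0;
        }
        // Never start more threads than there are files to delete.
        int threads = Math.max(1, Math.min(maxThreads, regionManifests.size()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> pending = new ArrayList<>();
            for (Path manifest : regionManifests) {
                // Assumption: a local delete stands in for the remote filesystem call.
                Callable<Boolean> task = () -> Files.deleteIfExists(manifest);
                pending.add(pool.submit(task));
            }
            int deleted = 0;
            for (Future<Boolean> result : pending) {
                try {
                    if (result.get()) {
                        deleted++;
                    }
                } catch (ExecutionException e) {
                    // Surface the first failure to the caller as an IOException.
                    Throwable cause = e.getCause();
                    if (cause instanceof IOException) {
                        throw (IOException) cause;
                    }
                    throw new IOException("Region manifest deletion failed", cause);
                }
            }
            return deleted;
        } finally {
            // Cancels any work still queued if an earlier deletion failed.
            pool.shutdownNow();
        }
    }
}
```

      With a remote store, each delete is dominated by network latency rather than CPU, so a bounded pool should shorten this step roughly in proportion to the pool size, up to whatever request rate the store allows.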

      I have a patch for this proposal almost ready and will submit it tomorrow for feedback, although I haven't had a chance to write any tests yet.

        Attachments

        1. HBASE-20226..01.patch
          5 kB
          Saad Mufti


              People

              • Assignee:
                Unassigned
              • Reporter:
                Saad Mufti (saadmufti)
              • Votes:
                0
              • Watchers:
                6
