HBase / HBASE-20226

Performance Improvement Taking Large Snapshots In Remote Filesystems

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha-1, 2.3.0, 1.7.0
    • Fix Version/s: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0, 2.2.6
    • Component/s: snapshots
    • Environment: HBase 1.4.0 running on an AWS EMR cluster with hbase.rootdir set to point to a folder in S3

    Description

      When taking a snapshot of any table, one of the last steps is to delete the region manifests, which have already been rolled up into a larger overall manifest and are therefore redundant.

      This proposal is to do the deletion in a thread pool bounded by hbase.snapshot.thread.pool.max. For large tables with many regions, the current single-threaded deletion takes longer than all the other snapshot tasks combined when the HBase data and the snapshot folder are both in a remote filesystem like S3.
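
To illustrate the idea, here is a minimal sketch of bounded parallel deletion. It is not the attached patch: the class name ParallelManifestCleanup and the method deleteAll are hypothetical, and java.nio.file stands in for the Hadoop FileSystem API so the example runs on its own. The maxThreads argument plays the role of the value that would be read from hbase.snapshot.thread.pool.max.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; names and structure are assumptions,
// not the code from the HBASE-20226 patch.
class ParallelManifestCleanup {

    /**
     * Deletes every path in regionManifests using at most maxThreads workers
     * and returns the number of files that were actually removed.
     */
    static int deleteAll(List<Path> regionManifests, int maxThreads)
            throws IOException, InterruptedException {
        if (regionManifests.isEmpty()) {
            return 0;
        }
        // Never start more workers than there are files to delete.
        int threads = Math.max(1, Math.min(maxThreads, regionManifests.size()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one deletion task per manifest; the pool bounds concurrency.
            List<Future<Boolean>> pending = new ArrayList<>();
            for (Path manifest : regionManifests) {
                Callable<Boolean> task = () -> Files.deleteIfExists(manifest);
                pending.add(pool.submit(task));
            }
            // Wait for every task and surface the first failure as an IOException.
            int deleted = 0;
            for (Future<Boolean> result : pending) {
                try {
                    if (result.get()) {
                        deleted++;
                    }
                } catch (ExecutionException e) {
                    Throwable cause = e.getCause();
                    if (cause instanceof IOException) {
                        throw (IOException) cause;
                    }
                    throw new IOException("manifest deletion failed", cause);
                }
            }
            return deleted;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the snapshot working directory with per-region manifests.
        Path dir = Files.createTempDirectory("snapshot-working");
        List<Path> manifests = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            manifests.add(Files.createFile(dir.resolve("region-manifest." + i)));
        }
        int deleted = deleteAll(manifests, 8);
        System.out.println("deleted " + deleted + " region manifests");
        Files.delete(dir);
    }
}
```

The benefit comes from overlapping round trips: on a remote store each delete is dominated by request latency, so issuing several at once should shorten this step roughly in proportion to the pool size, up to whatever request rate the store allows.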

      I have a patch for this proposal almost ready and will submit it tomorrow for feedback, although I haven't had a chance to write any tests yet.

      Attachments

        1. HBASE-20226..01.patch
          5 kB
          Saad Mufti

            People

              Assignee: Bharath Vissapragada (bharathv)
              Reporter: Saad Mufti (saadmufti)
              Votes: 0
              Watchers: 9