[HBASE-20226] Performance Improvement Taking Large Snapshots In Remote Filesystems - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 3.0.0-alpha-1, 2.3.0, 1.7.0
Fix Version/s: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0, 2.2.6
Component/s: snapshots
Labels:
- perfomance
Environment:

HBase 1.4.0 running on an AWS EMR cluster with the hbase.rootdir set to point to a folder in S3

Description

When taking a snapshot of any table, one of the last steps is to delete the region manifests, which have already been rolled up into a larger overall manifest and thus have redundant information.

This proposal is to do the deletion in a thread pool bounded by hbase.snapshot.thread.pool.max. For large tables with many regions, the current single-threaded deletion takes longer than all the other snapshot tasks combined when the HBase data and the snapshot folder are both in a remote filesystem like S3.
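
To illustrate the idea, here is a minimal sketch of bounded parallel deletion. It is not the patch itself: the class name ParallelManifestCleanup and the method deleteAll are hypothetical, and it uses java.nio.file on local paths as a stand-in for the Hadoop FileSystem calls that the real snapshot code would make. The maxThreads argument plays the role of the value read from hbase.snapshot.thread.pool.max.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; not taken from the HBase code base.
public class ParallelManifestCleanup {

    /**
     * Deletes every path in the list using at most maxThreads workers.
     * Returns how many files were actually removed.
     */
    public static int deleteAll(List<Path> manifests, int maxThreads)
            throws IOException, InterruptedException {
        // Assumption: maxThreads stands in for hbase.snapshot.thread.pool.max.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, maxThreads));
        try {
            // Submit one deletion task per region manifest.
            List<Future<Boolean>> pending = new ArrayList<>();
            for (Path manifest : manifests) {
                Callable<Boolean> task = () -> Files.deleteIfExists(manifest);
                pending.add(pool.submit(task));
            }
            // Wait for every task and surface the first failure as an IOException.
            int deleted = 0;
            for (Future<Boolean> result : pending) {
                try {
                    if (result.get()) {
                        deleted++;
                    }
                } catch (ExecutionException e) {
                    throw new IOException("Region manifest deletion failed", e.getCause());
                }
            }
            return deleted;
        } finally {
            pool.shutdown();
        }
    }
}
```

With a remote store such as S3, each delete is dominated by request latency rather than local work, so issuing the deletes concurrently should cut the wall-clock time of this step roughly in proportion to the pool size, up to whatever request rate the store allows.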

I have a patch for this proposal almost ready and will submit it tomorrow for feedback, although I haven't had a chance to write any tests yet.