We have been working on the ability to do backups in HBase with different levels of protection. This is an umbrella task for all the backup-related changes. The kinds of changes are outlined below; separate issues will be created for them.
Roughly, here are a few flavors of backups, giving increasing levels of guarantees:
1. Per-CF (column family) backups
2. Multi-CF backups with row atomicity preserved
3. Multi-CF backups with row atomicity and point-in-time recovery
On the perf dimension, here is a list of improvements:
1. Copy the files - regular hadoop "cp"
2. Use fast copy - copy blocks and stitch them together, which saves top-of-rack bandwidth
3. Use fast copy with hard links - no file copy; it only does ext3-level linking
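To make the difference between a plain copy and hard linking concrete, here is a minimal sketch on a local filesystem. It is only an analogy for options 1 and 3 above, not the HDFS fast copy implementation; the function names and the `block_0001` file name are hypothetical.

```python
import os
import shutil
import tempfile


def backup_by_copy(src: str, dst: str) -> None:
    """Analogue of option 1: duplicate the bytes into a new file."""
    shutil.copyfile(src, dst)


def backup_by_hard_link(src: str, dst: str) -> None:
    """Analogue of option 3: add a second directory entry for the same data."""
    os.link(src, dst)


def shares_storage(a: str, b: str) -> bool:
    """True when both paths refer to the same inode on the same device."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)


def demo() -> dict:
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical stand-in for a data block on local disk.
        src = os.path.join(d, "block_0001")
        with open(src, "wb") as f:
            f.write(b"x" * 4096)

        copied = os.path.join(d, "copy_block_0001")
        linked = os.path.join(d, "link_block_0001")
        backup_by_copy(src, copied)
        backup_by_hard_link(src, linked)

        result = {
            "copy_shares_storage": shares_storage(src, copied),
            "link_shares_storage": shares_storage(src, linked),
            "link_count": os.stat(src).st_nlink,
        }

        # Deleting the original leaves the hard-linked backup readable.
        os.remove(src)
        with open(linked, "rb") as f:
            result["link_survives_delete"] = f.read() == b"x" * 4096
        return result


if __name__ == "__main__":
    print(demo())
```

The hard link moves no data, which is why it is the cheapest option, but it only works when source and backup live on the same filesystem.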
On the data durability side:
1. Ability to back up data onto the same racks as those running HBase
2. Intra-datacenter backup
3. Inter-datacenter backup
On the restore side:
1. Restore with a table name different from the backed-up table name
2. Restore a backed-up table when the HBase cluster is not running at restore time
3. Restore into a live and running cluster
On the operational side:
1. How to set up backups in a live cluster
2. Setting up intra-DC backups
3. Setting up cross-DC backups
4. Verifying that a backup is good
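One possible way to approach verification, sketched under assumptions: compare per-file checksums between a source directory and its backup. This issue does not specify a verification method, so the approach, the function names, and the use of SHA-256 are all hypothetical, and the sketch works on local directories rather than HDFS.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def manifest(root):
    """Map each file's path (relative to root) to its digest."""
    out = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            out[os.path.relpath(full, root)] = file_digest(full)
    return out


def verify_backup(source_root, backup_root):
    """Return (missing, corrupt): files absent from the backup, and files
    present in both trees whose contents differ."""
    src, bak = manifest(source_root), manifest(backup_root)
    missing = sorted(set(src) - set(bak))
    corrupt = sorted(p for p in src.keys() & bak.keys() if src[p] != bak[p])
    return missing, corrupt
```

On a live cluster the source keeps changing, so a check like this would more plausibly compare the backup against a manifest recorded at backup time than against the live files.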