This experimental feature lets you perform backup and restore operations, including incremental ones, on a set of HBase tables.
Key Features and Use Cases
A common practice for database backup and restore is to first take a full baseline backup, and
then periodically take incremental backups that capture the changes since the full baseline
backup. An HBase cluster can store a massive amount of data. Therefore we want to use full backups
in combination with incremental backups for HBase as well.
The following is a typical use case scenario for full and incremental backup:
● The user takes a full backup of a table or a set of tables in HBase.
● The user schedules periodic incremental backups to capture the changes since the full
backup, or since the last incremental backup.
● The user needs to restore table data to a past point in time.
● The full backup is restored to the table(s) or to different table name(s). Then the
incremental backups that are up to the desired point in time are applied on top of the full
backup.
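The restore step above can be sketched as a small selection routine. This is only an illustration of the idea, not the feature's implementation: the BackupImage record, its fields, and the restore_chain function are hypothetical names, and timestamps are assumed to be comparable integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupImage:
    backup_id: str   # hypothetical identifier of the backup image
    kind: str        # "full" or "incremental"
    timestamp: int   # completion time, e.g. epoch seconds (assumption)


def restore_chain(images, point_in_time):
    """Return the images to apply, in order, to reach point_in_time.

    The chain is the latest full backup taken at or before the point in
    time, followed by every incremental backup taken after that full
    backup and at or before the point in time.
    """
    eligible = sorted(
        (img for img in images if img.timestamp <= point_in_time),
        key=lambda img: img.timestamp,
    )
    fulls = [img for img in eligible if img.kind == "full"]
    if not fulls:
        raise ValueError("no full backup at or before the requested point in time")
    base = fulls[-1]
    incrementals = [
        img for img in eligible
        if img.kind == "incremental" and img.timestamp > base.timestamp
    ]
    return [base] + incrementals
```

Given one full backup followed by three incrementals, asking for a point in time between the second and third incremental returns the full backup and the first two incrementals.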
We aim to support the following key features and capabilities:
● Backup to DFS FileSystem across clusters and possibly to other storage media or
servers.
● Support backup and restore of a single table or a set of tables (full and incremental).
● Restore to different table names and to different clusters.
● Support adding tables to and removing tables from a backup set without interrupting the
incremental backup schedule.
● Support merging incremental backups into larger incremental backups that cover longer
periods, for easier storage and restore.
● Support scheduled backups.
● Unified command line interface for all the above.
To illustrate these key capabilities, the following are two more detailed use case examples.
Use case example 1:
1. User takes a full backup of a set of tables (e.g. table1 and table2) in HBase.
2. User takes incremental backups. The incremental backup will only track table1 and
table2.
3. User adds other tables (e.g. table3 and table4) to the backup set, and an implicit full
backup is executed during the add process.
4. User continues to take incremental backups. The incremental backup data now covers
table1, table2, table3 and table4.
5. User wants to restore table3 and table4 to a past PIT (point-in-time).
6. The full backup from step 3 is restored onto the HBase cluster. Then the incremental backups
taken after that full backup are applied on top of the full restore, up to the PIT.
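The bookkeeping in use case example 1 can be modeled with a toy class. This is a sketch under stated assumptions: BackupSet and its methods are hypothetical names, and it assumes the implicit full backup covers only the newly added tables, which the text above does not spell out.

```python
class BackupSet:
    """Toy model of a backup set that records which tables each backup covers."""

    def __init__(self, tables):
        self.tables = set(tables)
        self.history = []  # list of (kind, sorted tuple of table names)
        self._record("full", self.tables)  # initial full backup of the set

    def _record(self, kind, tables):
        self.history.append((kind, tuple(sorted(tables))))

    def add_tables(self, new_tables):
        added = set(new_tables) - self.tables
        if added:
            # Assumption: the implicit full backup covers only the new tables.
            self._record("full", added)
            self.tables |= added

    def incremental(self):
        # An incremental backup tracks every table currently in the set.
        self._record("incremental", self.tables)
```

Replaying steps 1 to 4 (full backup of table1 and table2, one incremental, adding table3 and table4, another incremental) leaves four history entries, with the last incremental covering all four tables.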
Use case example 2:
1. User takes a full backup of a set of tables in HBase.
2. User takes daily incremental backups.
3. User merges the daily incremental backups into weekly incremental backups.
4. User rolls up the weekly incremental backups into monthly incremental
backups.
5. User wants to restore the tables to a past PIT.
6. The full backup is restored onto the HBase cluster.
7. The monthly incremental backups before the desired PIT are applied.
8. The closest daily backups up to the PIT are applied.
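The roll-up and restore order in use case example 2 can be sketched as follows. This is an illustration only: each incremental backup is modeled as a hypothetical (start, end) time window measured in days since the full backup, and merge_incrementals and plan_restore are made-up names, not part of the HBase tooling.

```python
def merge_incrementals(windows):
    """Merge contiguous (start, end) incremental windows into one larger window."""
    if not windows:
        raise ValueError("nothing to merge")
    ordered = sorted(windows)
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if prev_end != next_start:
            raise ValueError("incremental windows are not contiguous")
    return (ordered[0][0], ordered[-1][1])


def plan_restore(full_time, merged, daily, pit):
    """List the incremental windows to apply, in order, after the full restore.

    Merged (e.g. monthly) windows that end at or before the PIT are applied
    first; daily windows then cover the remaining gap up to the PIT.
    """
    coarse = [w for w in sorted(merged) if w[0] >= full_time and w[1] <= pit]
    covered_until = coarse[-1][1] if coarse else full_time
    fine = [w for w in sorted(daily) if w[0] >= covered_until and w[1] <= pit]
    return coarse + fine
```

With 90 daily windows, two merged 30-day windows, and a PIT at day 63, the plan is the two merged windows followed by the daily windows for days 60 to 63.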
To create a full backup:
HBASE_DIR/bin/hbase backup create full <backup_root_path> [tables]
backup_root_path - path to the backup root directory (file://, hdfs:// or any other Hadoop-compatible path)
tables - comma-separated list of tables. If no tables are specified, all tables are backed up.
To create an incremental backup:
HBASE_DIR/bin/hbase backup create incremental <backup_root_path> [tables]
backup_root_path - path to the backup root directory (file://, hdfs:// or any other Hadoop-compatible path)
tables - comma-separated list of tables. If no tables are specified, all tables are backed up.
To restore table(s):
HBASE_DIR/bin/hbase backup restore <backup_root_path> <backup_id> [tables]
backup_root_path - path to the backup root directory (file://, hdfs:// or any other Hadoop-compatible path)
backup_id - the ID identifying the backup image.
tables - comma-separated list of tables.
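For scripting, the three command forms above can be assembled programmatically. The sketch below only builds argument lists that follow the syntax shown above and does not run anything. The function names are hypothetical helpers, and the installation directory, root path, and backup ID used in any call are placeholders to be replaced with real values.

```python
def backup_create_cmd(hbase_dir, kind, backup_root_path, tables=None):
    """Build the argument list for 'hbase backup create full|incremental'."""
    if kind not in ("full", "incremental"):
        raise ValueError("kind must be 'full' or 'incremental'")
    cmd = [f"{hbase_dir}/bin/hbase", "backup", "create", kind, backup_root_path]
    if tables:
        # Tables are passed as a single comma-separated argument.
        cmd.append(",".join(tables))
    return cmd


def backup_restore_cmd(hbase_dir, backup_root_path, backup_id, tables=None):
    """Build the argument list for 'hbase backup restore'."""
    cmd = [f"{hbase_dir}/bin/hbase", "backup", "restore", backup_root_path, backup_id]
    if tables:
        cmd.append(",".join(tables))
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a host where HBase is installed.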
FOR EXPERIENCED USERS ONLY:
To get the list of backup IDs, you will need to scan the hbase:backup table using the HBase shell or other means.