[HBASE-14030] HBase Backup/Restore Phase 1 - ASF JIRA

Details

Type: Umbrella
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: HBASE-7912
Component/s: None
Labels: None
Hadoop Flags: Reviewed
Release Note:

This experimental feature allows users to perform backup/restore operations, including incremental ones, on a set of HBase tables.

Key features and Use Cases

A common backup and restore practice for databases is to first take a full baseline backup, and
then periodically take incremental backups that capture the changes since the full baseline
backup (or since the previous incremental backup). An HBase cluster can store a massive amount of
data, so for HBase, too, we want to use full backups in combination with incremental backups.
The following is a typical use case scenario for full and incremental backup:

● The user takes a full backup of a table or a set of tables in HBase.
● The user schedules periodic incremental backups to capture the changes since the full
backup or since the last incremental backup.
● The user needs to restore table data to a past point in time.
● The full backup is restored to the table(s) or to different table name(s). Then the
incremental backups that are up to the desired point in time are applied on top of the full
backup.
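The restore scenario above (restore the full backup, then apply incremental backups up to the desired point in time) comes down to choosing a chain of backup images. The Python sketch below shows one way to choose it under simplifying assumptions; it is not the HBase implementation, and the `Backup` record, its fields and `restore_chain` are hypothetical names, with timestamps modeled as plain integers.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Backup:
    """Hypothetical metadata for one backup image."""
    backup_id: str
    kind: str    # "full" or "incremental"
    end_ts: int  # point in time up to which the image captures data


def restore_chain(backups: List[Backup], pit: int) -> List[Backup]:
    """Return the images to apply, in order, to restore data as of `pit`.

    Uses the most recent full backup taken at or before `pit`, followed by
    every later incremental backup whose end time is at or before `pit`.
    """
    ordered = sorted(backups, key=lambda b: b.end_ts)
    fulls = [b for b in ordered if b.kind == "full" and b.end_ts <= pit]
    if not fulls:
        raise ValueError("no full backup exists at or before the requested PIT")
    base = fulls[-1]
    incrementals = [
        b for b in ordered
        if b.kind == "incremental" and base.end_ts < b.end_ts <= pit
    ]
    return [base] + incrementals


history = [
    Backup("f1", "full", 100),
    Backup("i1", "incremental", 200),
    Backup("i2", "incremental", 300),
    Backup("i3", "incremental", 400),
]
print([b.backup_id for b in restore_chain(history, pit=350)])  # ['f1', 'i1', 'i2']
```

In this model the restored data reflects the last applied image, so a PIT that falls between two backups is effectively rounded down to the nearest backup boundary.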
We intend to support the following key features and capabilities:
● Backup to a distributed file system (DFS) across clusters, and possibly to other storage media or
servers.
● Backup and restore (full and incremental) of a single table or a set of tables.
● Restore to different table names and to different clusters.
● Adding and removing tables to and from a backup set without interrupting the
incremental backup schedule.
● Merging incremental backups into larger incremental backups that span longer periods, for
easier storage and restore.
● Scheduled backups.
● A unified command-line interface for all of the above.

To illustrate these key capabilities, the following are two more detailed use case examples.

Use case example 1:

1. User takes a full backup of a set of tables (e.g. table1 and table2) in HBase.
2. User takes incremental backups. The incremental backups track only table1 and
table2.
3. User adds other tables (e.g. table3 and table4) to the backup set, and an implicit full backup is
executed during the add process.
4. User continues to take incremental backups. The incremental backup data now covers
table1, table2, table3 and table4.
5. User wants to restore table3 and table4 to a past PIT (point in time).
6. The full backup from step 3 is restored onto the HBase cluster. Then the incremental backups taken
after that full backup are applied on top of the full restore, up to the PIT.
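Example 1 implies that the restore baseline is chosen per table: table1 and table2 start from the step 1 full backup, while table3 and table4 start from the implicit full backup taken in step 3. The Python sketch below illustrates that per-table selection. It is a hypothetical model, not HBase code: `Backup` and `chains_for_tables` are illustrative names, and it assumes (the example does not say) that the implicit full backup covers only the newly added tables.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Backup:
    """Hypothetical backup image record: which tables it covers and when."""
    backup_id: str
    kind: str                # "full" or "incremental"
    tables: FrozenSet[str]
    end_ts: int


def chains_for_tables(backups: List[Backup], tables: List[str],
                      pit: int) -> Dict[str, List[str]]:
    """For each requested table, list the backup IDs to apply up to `pit`.

    A table's chain starts at the latest full backup containing that table
    (e.g. the implicit full backup taken when it joined the backup set) and
    continues with later incremental backups that also contain it.
    """
    ordered = sorted(backups, key=lambda b: b.end_ts)
    chains: Dict[str, List[str]] = {}
    for table in tables:
        relevant = [b for b in ordered if table in b.tables and b.end_ts <= pit]
        fulls = [b for b in relevant if b.kind == "full"]
        if not fulls:
            raise ValueError(f"no full backup of {table} at or before the PIT")
        base = fulls[-1]
        chains[table] = [base.backup_id] + [
            b.backup_id for b in relevant
            if b.kind == "incremental" and b.end_ts > base.end_ts
        ]
    return chains


all_four = frozenset({"table1", "table2", "table3", "table4"})
history = [
    Backup("full-1", "full", frozenset({"table1", "table2"}), 10),        # step 1
    Backup("inc-1", "incremental", frozenset({"table1", "table2"}), 20),  # step 2
    Backup("full-2", "full", frozenset({"table3", "table4"}), 30),        # step 3
    Backup("inc-2", "incremental", all_four, 40),                         # step 4
    Backup("inc-3", "incremental", all_four, 50),
]
print(chains_for_tables(history, ["table3", "table4"], pit=45))
# {'table3': ['full-2', 'inc-2'], 'table4': ['full-2', 'inc-2']}
```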

Use case example 2:

1. User takes a full backup of a set of tables in HBase.
2. User takes daily incremental backups.
3. User merges the daily incremental backups into weekly incremental backups.
4. User combines (rolls up) the weekly incremental backups into monthly incremental
backups.
5. User wants to restore the tables to a past PIT.
6. The full backup is restored onto the HBase cluster.
7. The monthly incremental backups before the desired PIT are applied.
8. The closest daily incremental backups leading up to the PIT are applied.
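The roll-up in example 2 works only if merging contiguous incremental backups gives the same result as applying them one by one. The Python sketch below illustrates that property under a deliberately simplified model: each table is a mapping from row key to a single value, ignoring HBase column families, cell versions and timestamps. `Incremental`, `merge` and `apply_incremental` are hypothetical names, not HBase APIs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Incremental:
    """Hypothetical incremental image: changes captured in (start_ts, end_ts]."""
    start_ts: int
    end_ts: int
    changes: Dict[str, Optional[str]]  # row key -> new value; None marks a delete


def merge(incrementals: List[Incremental]) -> Incremental:
    """Roll contiguous incrementals (e.g. dailies) into one (e.g. a weekly).

    When the same row changes in several images, the later change wins.
    """
    if not incrementals:
        raise ValueError("nothing to merge")
    ordered = sorted(incrementals, key=lambda i: i.start_ts)
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev.end_ts != nxt.start_ts:
            raise ValueError("incremental backups to merge must be contiguous")
    merged: Dict[str, Optional[str]] = {}
    for inc in ordered:
        merged.update(inc.changes)  # later image overwrites earlier changes
    return Incremental(ordered[0].start_ts, ordered[-1].end_ts, merged)


def apply_incremental(table: Dict[str, str], inc: Incremental) -> Dict[str, str]:
    """Apply one incremental image on top of a restored table state."""
    result = dict(table)
    for row, value in inc.changes.items():
        if value is None:
            result.pop(row, None)
        else:
            result[row] = value
    return result


restored_full = {"r3": "x"}
day1 = Incremental(0, 1, {"r1": "a", "r2": "b"})
day2 = Incremental(1, 2, {"r1": "c", "r3": None})
week = merge([day1, day2])
one_by_one = apply_incremental(apply_incremental(restored_full, day1), day2)
print(apply_incremental(restored_full, week) == one_by_one)  # True
```

Because a merged image is equivalent to its parts in this model, steps 7 and 8 can apply a few large images instead of every daily one.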

To create a full backup:

HBASE_DIR/bin/hbase backup create full <backup_root_path> [tables]

backup_root_path - path to backup root directory (file://, hdfs:// or any other Hadoop-compatible path)
tables - list of tables, comma-separated. If no tables are specified, all tables are backed up.

To create an incremental backup:

HBASE_DIR/bin/hbase backup create incremental <backup_root_path> [tables]

backup_root_path - path to backup root directory (file://, hdfs:// or any other Hadoop-compatible path)
tables - list of tables, comma-separated. If no tables are specified, all tables are backed up.

To restore table(s):

HBASE_DIR/bin/hbase backup restore <backup_root_path> <backup_id> [tables]

backup_root_path - path to backup root directory (file://, hdfs:// or any other Hadoop-compatible path)
backup_id - ID of the backup image to restore.
tables - list of tables, comma-separated.
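When driving the CLI from a script, it can help to build the argument lists in one place. The Python helpers below only assemble argv lists that follow the usage lines above; they do not run anything. The helper names are hypothetical, the install directory and `hdfs://namenode:8020/backups` root are made-up examples, and `example-backup-id` is a placeholder because the format of real backup IDs is not described here.

```python
import posixpath
from typing import List, Optional, Sequence


def _hbase_cli(hbase_dir: str) -> str:
    # HBASE_DIR/bin/hbase, as in the usage lines above.
    return posixpath.join(hbase_dir, "bin", "hbase")


def _tables_arg(tables: Optional[Sequence[str]]) -> List[str]:
    # The optional table list is passed as one comma-separated argument.
    if not tables:
        return []
    if any("," in t for t in tables):
        raise ValueError("table names must not contain commas")
    return [",".join(tables)]


def backup_create_args(hbase_dir: str, kind: str, backup_root_path: str,
                       tables: Optional[Sequence[str]] = None) -> List[str]:
    """argv for: hbase backup create <full|incremental> <backup_root_path> [tables]."""
    if kind not in ("full", "incremental"):
        raise ValueError("kind must be 'full' or 'incremental'")
    # With no tables, the usage text says all tables are backed up.
    return [_hbase_cli(hbase_dir), "backup", "create", kind,
            backup_root_path] + _tables_arg(tables)


def backup_restore_args(hbase_dir: str, backup_root_path: str, backup_id: str,
                        tables: Optional[Sequence[str]] = None) -> List[str]:
    """argv for: hbase backup restore <backup_root_path> <backup_id> [tables]."""
    if not backup_id:
        raise ValueError("backup_id is required")
    return [_hbase_cli(hbase_dir), "backup", "restore", backup_root_path,
            backup_id] + _tables_arg(tables)


print(backup_create_args("/opt/hbase", "full", "hdfs://namenode:8020/backups",
                         ["table1", "table2"]))
# ['/opt/hbase/bin/hbase', 'backup', 'create', 'full', 'hdfs://namenode:8020/backups', 'table1,table2']
```

To execute a command, pass the resulting list to `subprocess.run(args, check=True)`.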

FOR EXPERIENCED USERS only:

To get the list of backup IDs, scan the hbase:backup table using the HBase shell or other means.


Description

This is the umbrella ticket for Backup/Restore Phase 1. See the HBASE-7912 design doc for the description of this phase.

Attachments


File	Date	Size	Author
hbase-14030_v36.patch	29/Feb/16 23:27	712 kB	Enis Soztutar
HBASE-14030.v38.patch	07/Mar/16 19:39	712 kB	Ted Yu
HBASE-14030.v39.patch	11/Mar/16 00:12	712 kB	Ted Yu
HBASE-14030.v40.patch	14/Mar/16 18:39	715 kB	Ted Yu
HBASE-14030.v41.patch	16/Mar/16 17:34	717 kB	Ted Yu
HBASE-14030.v42.patch	16/Mar/16 21:07	716 kB	Ted Yu
HBASE-14030.v43.patch	22/Mar/16 02:24	716 kB	Ted Yu
HBASE-14030-v0.patch	07/Jul/15 20:23	296 kB	Vladimir Rodionov
HBASE-14030-v1.patch	13/Jul/15 23:45	313 kB	Vladimir Rodionov
HBASE-14030-v10.patch	24/Sep/15 00:44	356 kB	Vladimir Rodionov
HBASE-14030-v11.patch	24/Sep/15 20:02	357 kB	Vladimir Rodionov
HBASE-14030-v12.patch	25/Sep/15 18:43	359 kB	Vladimir Rodionov
HBASE-14030-v13.patch	14/Oct/15 21:18	361 kB	Vladimir Rodionov
HBASE-14030-v14.patch	15/Oct/15 23:32	361 kB	Vladimir Rodionov
HBASE-14030-v15.patch	03/Nov/15 21:43	361 kB	Vladimir Rodionov
HBASE-14030-v17.patch	20/Nov/15 01:15	363 kB	Vladimir Rodionov
HBASE-14030-v18.patch	20/Nov/15 07:15	361 kB	Vladimir Rodionov
HBASE-14030-v2.patch	14/Jul/15 01:13	327 kB	Vladimir Rodionov
HBASE-14030-v20.patch	15/Dec/15 23:49	404 kB	Vladimir Rodionov
HBASE-14030-v21.patch	21/Dec/15 20:53	429 kB	Vladimir Rodionov
HBASE-14030-v22.patch	22/Dec/15 01:45	418 kB	Vladimir Rodionov
HBASE-14030-v23.patch	22/Dec/15 21:34	416 kB	Vladimir Rodionov
HBASE-14030-v24.patch	23/Dec/15 02:13	416 kB	Vladimir Rodionov
HBASE-14030-v25.patch	29/Dec/15 00:20	373 kB	Vladimir Rodionov
HBASE-14030-v26.patch	29/Dec/15 22:04	373 kB	Vladimir Rodionov
HBASE-14030-v27.patch	07/Jan/16 23:09	438 kB	Vladimir Rodionov
HBASE-14030-v28.patch	08/Jan/16 05:10	438 kB	Vladimir Rodionov
HBASE-14030-v3.patch	14/Jul/15 22:19	336 kB	Vladimir Rodionov
HBASE-14030-v30.patch	19/Jan/16 18:32	581 kB	Vladimir Rodionov
HBASE-14030-v35.patch	16/Feb/16 21:12	826 kB	Vladimir Rodionov
HBASE-14030-v37.patch	04/Mar/16 03:49	712 kB	Vladimir Rodionov
HBASE-14030-v4.patch	17/Jul/15 22:24	362 kB	Vladimir Rodionov
HBASE-14030-v5.patch	20/Jul/15 23:20	361 kB	Vladimir Rodionov
HBASE-14030-v6.patch	21/Jul/15 18:37	361 kB	Vladimir Rodionov
HBASE-14030-v7.patch	22/Sep/15 21:39	363 kB	Vladimir Rodionov
HBASE-14030-v8.patch	22/Sep/15 22:49	363 kB	Vladimir Rodionov

Issue Links

incorporates

HBASE-14037	Deletion of a table from backup set results in RTE during next backup	Closed
HBASE-14038	Incremental backup list set is ignored during backup	Closed
HBASE-14039	BackupHandler.deleteSnapshot MUST use HBase Snapshot API	Closed
HBASE-14040	Small refactoring in BackupHandler	Closed
HBASE-15411	Rewrite backup with Procedure V2 - phase 1	Closed
HBASE-14031	HBase Backup/Restore Phase 1: Abstract DistCp in incremental backup	Closed
HBASE-14032	HBase Backup/Restore Phase 1: Abstract SnapshotCopy (full backup)	Closed
HBASE-14033	HBase Backup/Restore Phase1: Abstract WALPlayer (incremental restore)	Closed
HBASE-14034	HBase Backup/Restore Phase 1: Abstract Coordination manager (Zk) operations	Closed
HBASE-14035	HBase Backup/Restore Phase 1: hbase:backup - backup system table	Closed
HBASE-14036	HBase Backup/Restore Phase 1: Custom WAL archive cleaner	Closed

is part of

HBASE-7912	HBase Backup/Restore Based on HBase Snapshot	Closed

is required by

HBASE-14414	HBase Backup/Restore Phase 3	Closed
HBASE-14123	HBase Backup/Restore Phase 2	Closed


Sub-Tasks

1.	Use protobuf for serialization/deserialization.		Closed	Vladimir Rodionov
2.	Selection of WAL files eligible for incremental backup is broken		Closed	Vladimir Rodionov
