Details
- Type: Task
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
The system must be fault tolerant:
- Backup operations MUST be atomic (no partial completion state in the backup system table)
- The process must detect any type of failure that can result in data loss (partial backup or partial restore)
- In case of a failure, the system table state must be properly restored and cleaned up
- An additional utility must be implemented to repair the backup system table and clean up the corresponding file system data
Backup
General fault tolerance (FT) framework implementation
Before the actual backup operation starts, a snapshot of the backup system table is taken and the system table is updated with the ACTIVE_SNAPSHOT flag. The flag is removed upon backup completion.
In case of any server-side failure, the client catches the error or exception and handles it as follows:
- Cleans up the backup destination (removes partial backup data)
- Cleans up any temporary data
- Deletes any active snapshots of the tables being backed up (tables are snapshotted during a full backup)
- Restores the backup system table from its snapshot
- Deletes the backup system table snapshot (the snapshot name is read from the backup system table beforehand)
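The flow above can be illustrated with a minimal in-memory sketch. It is not the HBase implementation: the class InMemoryBackupSystem, the function run_backup, the "sessions" key, and the snapshot naming scheme are all hypothetical, and plain Python containers stand in for the backup system table, the destination, and the snapshots. Deleting the system table snapshot after a successful backup is also an assumption; the description only states that the flag is removed.

```python
import copy


class InMemoryBackupSystem:
    """Hypothetical stand-in for the backup system table and related storage."""

    def __init__(self):
        self.system_table = {"sessions": []}  # backup system table contents
        self.system_snapshots = {}            # snapshot name -> saved table copy
        self.destination = {}                 # backup id -> files at the destination
        self.temp_files = set()               # temporary working data
        self.table_snapshots = set()          # snapshots of tables being backed up


def run_backup(system, backup_id, do_backup):
    """Run do_backup(system, backup_id) so the system table is all-or-nothing."""
    snapshot_name = "system_snapshot_" + backup_id  # hypothetical naming scheme
    # Snapshot the system table first, then mark the session as active.
    system.system_snapshots[snapshot_name] = copy.deepcopy(system.system_table)
    system.system_table["ACTIVE_SNAPSHOT"] = snapshot_name
    try:
        do_backup(system, backup_id)
        system.system_table["sessions"].append(backup_id)
    except Exception:
        # Read the snapshot name from the system table before restoring it.
        name = system.system_table["ACTIVE_SNAPSHOT"]
        system.destination.pop(backup_id, None)  # remove partial backup data
        system.temp_files.clear()                # remove temporary data
        system.table_snapshots.clear()           # drop table snapshots
        system.system_table = system.system_snapshots[name]  # restore the table
        del system.system_snapshots[name]        # delete the system snapshot
        raise
    # Success: remove the flag (and, by assumption, the snapshot as well).
    del system.system_table["ACTIVE_SNAPSHOT"]
    del system.system_snapshots[snapshot_name]
```

Because the snapshot is taken before the flag is written, the restored table no longer carries ACTIVE_SNAPSHOT, so a failed session leaves no trace in the system table.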
In case of any client-side failure:
Before any backup or restore operation runs, the backup system table is checked for the ACTIVE_SNAPSHOT flag. If the flag is present, the operation aborts with a message that the backup repair tool (see below) must be run.
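The pre-flight check can be sketched as a small guard. This is an illustration only: the dictionary model of the system table, the exception RepairRequiredError, and the function ensure_no_active_session are hypothetical names, and the message text is not taken from the actual tool.

```python
class RepairRequiredError(RuntimeError):
    """Hypothetical error raised when a previous session did not finish cleanly."""


def ensure_no_active_session(system_table):
    """Abort if a previous backup left the ACTIVE_SNAPSHOT flag behind."""
    if "ACTIVE_SNAPSHOT" in system_table:
        raise RepairRequiredError(
            "A previous backup session did not complete; "
            "run the backup repair tool before retrying."
        )
```

A caller would invoke the guard as the first step of every backup or restore, before touching the destination or taking any snapshot.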
Backup repair tool
A command-line tool, backup repair, executes the following steps:
- Reads the information about the last failed backup session
- Cleans up the backup destination (removes partial backup data)
- Cleans up any temporary data
- Deletes any active snapshots of the tables being backed up (tables are snapshotted during a full backup)
- Restores the backup system table from its snapshot
- Deletes the backup system table snapshot (the snapshot name is read from the backup system table beforehand)
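The repair steps can be sketched against a simple in-memory model. Everything here is an assumption made for illustration: the state dictionary layout, the "failed_session" record, and the function repair_backup are hypothetical and do not mirror the real tool's storage format.

```python
import copy


def repair_backup(state):
    """Roll back a failed backup session; return False if nothing needs repair."""
    table = state["system_table"]
    snapshot_name = table.get("ACTIVE_SNAPSHOT")
    if snapshot_name is None:
        return False  # no failed session recorded

    # 1. Read the information about the last failed backup session.
    session = table["failed_session"]  # hypothetical record layout
    backup_id = session["backup_id"]

    # 2. Clean up the backup destination (partial backup data).
    state["destination"].pop(backup_id, None)

    # 3. Clean up temporary data.
    state["temp_data"].pop(backup_id, None)

    # 4. Delete active snapshots of the tables that were being backed up.
    for table_name in session["tables"]:
        state["table_snapshots"].pop(table_name, None)

    # 5. Restore the backup system table from its snapshot.
    state["system_table"] = copy.deepcopy(state["system_snapshots"][snapshot_name])

    # 6. Delete the backup system table snapshot.
    del state["system_snapshots"][snapshot_name]
    return True
```

In this sketch a second invocation finds no flag and returns False, so running the repair twice is harmless.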
Detection of a partial loss of data
Full backup
Export snapshot operation.
File counts and sizes are checked before and after the DistCp run.
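The count-and-size check can be sketched on a local file system. This is a simplified illustration, not the HBase or DistCp code: the functions file_manifest and find_copy_problems are hypothetical, and a local directory tree stands in for the source and destination of the copy.

```python
import os


def file_manifest(root):
    """Map each file's path relative to root to its size in bytes."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            manifest[os.path.relpath(path, root)] = os.path.getsize(path)
    return manifest


def find_copy_problems(source_manifest, dest_root):
    """Compare a manifest taken before the copy with the destination afterwards."""
    dest_manifest = file_manifest(dest_root)
    problems = []
    for rel_path, size in sorted(source_manifest.items()):
        if rel_path not in dest_manifest:
            problems.append("missing: " + rel_path)
        elif dest_manifest[rel_path] != size:
            problems.append("size mismatch: " + rel_path)
    return problems
```

Taking the manifest before the copy starts and comparing it afterwards catches both missing files and truncated files; an empty result means the counts and sizes agree. Matching sizes do not prove the contents are identical, so this is a cheap check for partial copies rather than a full integrity check.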
Incremental backup
During conversion of WALs to HFiles, when a WAL file is moved from the active to the archive directory. Code is in place to handle this situation.
During the DistCp run (same as above).
Restore
This operation does not modify the backup system table and is idempotent. No special fault-tolerance handling is required.
Attachments
Issue Links
- is part of: HBASE-14414 HBase Backup/Restore Phase 3 (Closed)
- requires: HBASE-16465 Disable region splits and merges, balancer during full backup (Open)