Details
- Improvement
- Status: Resolved
- Major
- Resolution: Fixed
- 3.0.0-alpha-1
- None
- Reviewed
* Remove dependence on storing WAL filenames for backup in backup:system meta table
Description
Context:
Currently, references to WAL files are stored in the `backup:system` meta table:
wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258
wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175
wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1
wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258
wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280
wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1
Also, every backup (incremental and full) performs a log roll just before the backup is taken and stores the timestamp at which the log roll was performed, per region server per backup, in the following format:
rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-2:16020 column=meta:rs-log-ts, timestamp=1622887363301, value=\x00\x00\x01y\xDB\x81ar
rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-3:16020 column=meta:rs-log-ts, timestamp=1622887363294, value=\x00\x00\x01y\xDB\x81aP
rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-1:16020 column=meta:rs-log-ts, timestamp=1622887363275, value=\x00\x00\x01y\xDB\x81\x85
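The `rs-log-ts` values are printed in the shell's escaped binary form, so the timestamp is not directly readable. The following is a minimal sketch of how such a value could be decoded, assuming it is an 8-byte big-endian long holding epoch milliseconds (the layout that HBase's `Bytes.toBytes(long)` produces). The class `RsLogTsDecoder` and its methods are hypothetical helpers written for this illustration, not part of HBase.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.time.Instant;

// Hypothetical helper, not an HBase class.
class RsLogTsDecoder {

    /** Converts a shell-style string (printable characters plus \xHH escapes) to raw bytes. */
    static byte[] parseEscaped(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 3 < s.length() && s.charAt(i + 1) == 'x') {
                // \xHH: two hex digits encode one byte
                out.write(Integer.parseInt(s.substring(i + 2, i + 4), 16));
                i += 3;
            } else {
                // Printable character: its ASCII code is the byte value
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    /** Decodes an escaped rs-log-ts value, assuming an 8-byte big-endian long. */
    static long decode(String shellValue) {
        byte[] raw = parseEscaped(shellValue);
        if (raw.length != Long.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes, got " + raw.length);
        }
        return ByteBuffer.wrap(raw).getLong(); // ByteBuffer is big-endian by default
    }

    public static void main(String[] args) {
        // Value of the first rslogts row shown above
        long ts = decode("\\x00\\x00\\x01y\\xDB\\x81ar");
        System.out.println(ts + " -> " + Instant.ofEpochMilli(ts));
    }
}
```

Under that assumption the first row decodes to 1622885359986, which is in the same range as the cell timestamps in the listing.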
The WAL references stored in `backup:system` are used in two cases.
Use Case 1.
Existing Design: `BackupLogCleaner` uses these references to clean up WALs that have already been backed up.
New Design:
Since the log roll timestamp is stored per region server as part of each backup, we can check all previous successful backups and identify which logs to retain and which to clean up, as follows:
- Identify the latest successful backup performed for each table.
- For each backup identified above, identify the oldest log roll timestamp per region server per table.
- `BackupLogCleaner` can remove all WALs that are older than the oldest log roll timestamp recorded for any backed-up table.
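The three steps above can be sketched roughly as follows. This is an illustration of the described logic, not the actual `BackupLogCleaner` code: the class `WalRetentionSketch`, its method names, and the map-based inputs are hypothetical, and the choice to retain WALs from region servers that have no recorded timestamp is a conservative assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the HBase implementation.
class WalRetentionSketch {

    /**
     * Input: table name -> (region server "host:port" -> log roll timestamp),
     * taken from the latest successful backup of each table.
     * Output: for each region server, the oldest log roll timestamp across all tables.
     */
    static Map<String, Long> oldestRollPerServer(Map<String, Map<String, Long>> latestBackupPerTable) {
        Map<String, Long> oldest = new HashMap<>();
        for (Map<String, Long> perServer : latestBackupPerTable.values()) {
            perServer.forEach((server, ts) -> oldest.merge(server, ts, Long::min));
        }
        return oldest;
    }

    /** A WAL may be deleted only if it is older than the boundary for its region server. */
    static boolean canDelete(String server, long walTimestamp, Map<String, Long> oldest) {
        Long boundary = oldest.get(server);
        // Assumption: with no recorded boundary, keep the WAL to be safe.
        return boundary != null && walTimestamp < boundary;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Long>> latest = Map.of(
            "table1", Map.of("preprod-dn-1:16020", 2000L, "preprod-dn-2:16020", 3000L),
            "table2", Map.of("preprod-dn-1:16020", 1000L));
        Map<String, Long> oldest = oldestRollPerServer(latest);
        System.out.println(canDelete("preprod-dn-1:16020", 500L, oldest));  // older than every backup
        System.out.println(canDelete("preprod-dn-1:16020", 1500L, oldest)); // still needed by table2
    }
}
```

Taking the minimum across tables is what keeps a WAL alive while at least one table's next incremental backup may still need it.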
Use Case 2.
Existing Design: During an incremental backup, the system table is checked for duplicate WALs, that is, WALs that have already been backed up and would be backed up again.
New Design:
- Incremental backup already identifies which WALs to back up using the `rslogts:` entries mentioned above.
- Additionally, it checks the `wals:` entries to ensure that no log is backed up a second time. This check is redundant and provides no extra benefit.
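To illustrate why the extra `wals:` check adds nothing, here is a hedged sketch of timestamp-based selection. It assumes that a WAL file name has the form `<host>%2C<port>%2C<startcode>.<timestamp>` (as in the `wals:` rows above) and that an incremental backup picks WALs whose timestamp falls after the previous log roll and up to the new one; the exact boundary handling in HBase may differ. The class `IncrementalWalSelectionSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the HBase implementation.
class IncrementalWalSelectionSketch {

    /** Extracts "host:port" from a WAL file name such as host%2Cport%2Cstartcode.timestamp. */
    static String serverOf(String walName) {
        String decoded = walName.replace("%2C", ",");
        String[] parts = decoded.substring(0, decoded.lastIndexOf('.')).split(",");
        return parts[0] + ":" + parts[1];
    }

    /** Extracts the trailing timestamp from a WAL file name. */
    static long timestampOf(String walName) {
        return Long.parseLong(walName.substring(walName.lastIndexOf('.') + 1));
    }

    /** Selects WALs in the window (previous roll, new roll] for their region server. */
    static List<String> select(List<String> walNames,
                               Map<String, Long> previousRoll,
                               Map<String, Long> newRoll) {
        List<String> selected = new ArrayList<>();
        for (String wal : walNames) {
            String server = serverOf(wal);
            long ts = timestampOf(wal);
            // Assumption: servers without a recorded timestamp get an open-ended window.
            long lower = previousRoll.getOrDefault(server, Long.MIN_VALUE);
            long upper = newRoll.getOrDefault(server, Long.MAX_VALUE);
            if (ts > lower && ts <= upper) {
                selected.add(wal);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> wals = List.of(
            "preprod-dn-1%2C16020%2C1614844389000.1621996160175",
            "preprod-dn-1%2C16020%2C1614844389000.1621999760280");
        String rs = "preprod-dn-1:16020";
        // First incremental backup, then a second one that starts where the first ended
        System.out.println(select(wals, Map.of(rs, 1621996160175L), Map.of(rs, 1622000000000L)));
        System.out.println(select(wals, Map.of(rs, 1622000000000L), Map.of(rs, 1622003358258L)));
    }
}
```

Because each backup's lower bound is the previous backup's upper bound, the windows do not overlap, so under these assumptions a WAL cannot be selected twice even without consulting `wals:`.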
Issue Links
- is a child of HBASE-25784 Support for Parallel Backups enabling multi tenancy with rsgroups (In Progress)
- links to