Details
- Type: Sub-task
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
After enabling the CompactionServer feature in our cluster, we found that HFiles referenced by a snapshot went missing.
The relevant log fragment is approximately as follows:
File does not exist: /.../.tmp/data/default/......../b72cab4efb074defb1bd9acd9087891f
File does not exist: /....../archive/data/default/........../b72cab4efb074defb1bd9acd9087891f
File does not exist: /....../data/default/......./b72cab4efb074defb1bd9acd9087891f
The HDFS audit logs show that the HFile 'b72cab4efb074defb1bd9acd9087891f' was moved from the data directory to the archive directory by the CompactionServer, renamed within the archive directory with a timestamp suffix by the RegionServer, and eventually deleted by the HMaster:
2022-07-13,00:50:01,727 INFO FSNamesystem.audit: cmd=rename operator=CompactionServer src=/...../data/default/...../b72cab4efb074defb1bd9acd9087891f dst=/....../archive/data/default/....../b72cab4efb074defb1bd9acd9087891f
2022-07-13,00:51:23,802 INFO FSNamesystem.audit: cmd=rename operator=RegionServer src=/....../archive/data/default/....../b72cab4efb074defb1bd9acd9087891f dst=/....../archive/data/default/....../b72cab4efb074defb1bd9acd9087891f.1657644683801
2022-07-13,01:51:57,823 INFO FSNamesystem.audit: cmd=delete operator=HMaster src=/....../archive/data/default/....../b72cab4efb074defb1bd9acd9087891f.1657644683801
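To reproduce this kind of trace for another HFile, a small script along the following lines may help. It is only a sketch: the regular expression matches the simplified audit lines quoted in this report (real HDFS audit logs carry additional fields), and the names AUDIT_RE, SAMPLE_LINES, and trace_hfile are made up for illustration. The sample lines are copied verbatim from the fragment above, elided paths included.

```python
import re

# Assumption: matches the simplified audit lines quoted in this report.
# Real HDFS audit logs contain more fields, so adjust the pattern as needed.
AUDIT_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2},\d{2}:\d{2}:\d{2},\d{3}) INFO FSNamesystem\.audit: "
    r"cmd=(?P<cmd>\S+) operator=(?P<operator>\S+) "
    r"src=(?P<src>\S+)(?: dst=(?P<dst>\S+))?"
)

# Sample lines copied from the report (paths are elided in the original).
SAMPLE_LINES = [
    "2022-07-13,00:50:01,727 INFO FSNamesystem.audit: cmd=rename operator=CompactionServer "
    "src=/...../data/default/...../b72cab4efb074defb1bd9acd9087891f "
    "dst=/....../archive/data/default/....../b72cab4efb074defb1bd9acd9087891f",
    "2022-07-13,00:51:23,802 INFO FSNamesystem.audit: cmd=rename operator=RegionServer "
    "src=/....../archive/data/default/....../b72cab4efb074defb1bd9acd9087891f "
    "dst=/....../archive/data/default/....../b72cab4efb074defb1bd9acd9087891f.1657644683801",
    "2022-07-13,01:51:57,823 INFO FSNamesystem.audit: cmd=delete operator=HMaster "
    "src=/....../archive/data/default/....../b72cab4efb074defb1bd9acd9087891f.1657644683801",
]


def trace_hfile(lines, hfile_name):
    """Return (timestamp, operator, cmd, src, dst) for each audit entry touching hfile_name."""
    events = []
    for line in lines:
        match = AUDIT_RE.search(line)
        if match is None:
            continue  # not an audit line in the expected format
        src, dst = match.group("src"), match.group("dst")
        if hfile_name in src or (dst is not None and hfile_name in dst):
            events.append(
                (match.group("ts"), match.group("operator"), match.group("cmd"), src, dst)
            )
    # Timestamps in this format sort chronologically as plain strings.
    return sorted(events, key=lambda event: event[0])


if __name__ == "__main__":
    for ts, operator, cmd, src, dst in trace_hfile(
        SAMPLE_LINES, "b72cab4efb074defb1bd9acd9087891f"
    ):
        print(ts, operator, cmd, src, "->", dst)
```

Listing the operations in time order makes it easy to see which process touched the file last before it disappeared.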
Based on HBASE-26722 and HBASE-22163, our understanding is as follows. If region A is still open on RS1 and a second server RS2 (in this case, the CompactionServer) opens the same region, RS2 may move the region's HFiles to the archive directory. When RS1 later closes the region, it archives the same files again. As the audit log shows, this second archive renames the already-archived HFile with a timestamp suffix, and the HMaster then deletes the renamed file from the archive directory. As a result, the snapshot loses the HFile it references.
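The sequence can be replayed with a toy in-memory model. This is a sketch under stated assumptions, not HBase code: ToyFs, archive, clean, and replay_double_archive are hypothetical names, the paths are invented, and the model assumes that the archiver backs up an existing archive file under a timestamp-suffixed name before moving the source (as the audit log suggests) and that the cleaner deletes archive files that no snapshot references by path.

```python
class ToyFs:
    """In-memory stand-in for HDFS: just a set of existing paths."""

    def __init__(self, paths):
        self.paths = set(paths)

    def rename(self, src, dst):
        """Move src to dst; return False if src does not exist."""
        if src not in self.paths:
            return False
        self.paths.remove(src)
        self.paths.add(dst)
        return True


def archive(fs, data_path, archive_path, now_ms):
    """Hypothetical archiver: back up an existing archive file, then move the source."""
    if archive_path in fs.paths:
        # Assumed behavior: the existing archived copy gets a timestamp suffix.
        fs.rename(archive_path, f"{archive_path}.{now_ms}")
    return fs.rename(data_path, archive_path)


def clean(fs, archive_dir, referenced):
    """Hypothetical cleaner: delete archive files that no snapshot references."""
    for path in sorted(fs.paths):
        if path.startswith(archive_dir) and path not in referenced:
            fs.paths.remove(path)


def replay_double_archive():
    """Replay the two archive calls and the cleanup; return what is left."""
    data_path = "/data/regionA/cf/hfile1"        # hypothetical path
    archive_path = "/archive/regionA/cf/hfile1"  # hypothetical path
    snapshot_refs = {archive_path}               # the snapshot expects this file

    fs = ToyFs({data_path})
    first_ok = archive(fs, data_path, archive_path, now_ms=1)   # CompactionServer
    second_ok = archive(fs, data_path, archive_path, now_ms=2)  # RS1 on region close
    after_second = set(fs.paths)
    clean(fs, "/archive/", snapshot_refs)                       # HMaster cleanup
    return first_ok, second_ok, after_second, set(fs.paths)


if __name__ == "__main__":
    print(replay_double_archive())
```

In this model the second archive call finds no source file to move but has already renamed the archived copy, so the path the snapshot refers to no longer exists and the cleaner removes the only remaining copy.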
Issue Links
- relates to
  - HBASE-22163 Should not archive the compacted store files when region warmup (Resolved)
  - HBASE-26722 Snapshot is corrupted due to interaction between move, warmupRegion, compaction, and HFileArchiver (Resolved)