Status: Open
Resolution: Unresolved
ExportSnapshot jobs that run longer than the destination cluster's hbase.master.hfilecleaner.ttl value are failing with "Can't find hfile: <hfile> in the real or archive folders". The HFiles copied into the archive folder are being deleted on the destination cluster by the SnapshotHFileCleaner cleaner.
- ExportSnapshot copies the archived HFiles to the destination cluster's archive folders.
- The manifest of an in-progress ExportSnapshot stays in /hbase/.hbase-snapshot/.tmp until the export completes.
- The SnapshotHFileCleaner flow ignores the /hbase/.hbase-snapshot/.tmp directory when it looks for snapshot reference files:
private void refreshCache() throws IOException {
  // just list the snapshot directory directly, do not check the modification time for the root
  // snapshot directory, as some file system implementations do not modify the parent directory's
  // modTime when there are new sub items, for example, S3.
  FileStatus[] snapshotDirs = FSUtils.listStatus(fs, snapshotDir,
    p -> !p.getName().equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME));
- Because SnapshotHFileCleaner misses the in-progress snapshot's references, TimeToLiveHFileCleaner marks the HFiles of the in-progress ExportSnapshot as deletable once they are older than hbase.master.hfilecleaner.ttl, that is, once they were copied more than hbase.master.hfilecleaner.ttl ago.
- This is causing ExportSnapshot to fail at the verification stage.
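The sequence above can be illustrated with a small, self-contained model. This is only a sketch of the reasoning, not HBase code: the class CleanerRaceSketch, the method isDeletable, the file name "hfile-1" and the 5-minute TTL are all hypothetical, and the two real cleaners are collapsed into a single check for brevity.

```java
import java.util.Set;

public class CleanerRaceSketch {
    /**
     * Simplified model of the combined cleaner decision (an assumption, not the
     * real HBase implementation): a file may be deleted when no visible snapshot
     * references it AND it is older than the TTL.
     */
    static boolean isDeletable(String hfile, long fileModTimeMs, long nowMs, long ttlMs,
                               Set<String> referencedByVisibleSnapshots) {
        if (referencedByVisibleSnapshots.contains(hfile)) {
            return false; // protected by a snapshot reference
        }
        return nowMs - fileModTimeMs > ttlMs; // otherwise only the TTL protects it
    }

    public static void main(String[] args) {
        long ttlMs = 5 * 60 * 1000L;      // assumed TTL of 5 minutes
        long copiedAtMs = 0L;             // time the export copied the file
        long twoMinutes = 2 * 60 * 1000L;
        long tenMinutes = 10 * 60 * 1000L;

        // The in-progress manifest lives under .tmp, so the cleaner sees no references.
        Set<String> visibleRefs = Set.of();

        // Export still running, file younger than the TTL: kept.
        System.out.println(isDeletable("hfile-1", copiedAtMs, twoMinutes, ttlMs, visibleRefs)); // false

        // Export still running, file older than the TTL: deleted, so verification fails later.
        System.out.println(isDeletable("hfile-1", copiedAtMs, tenMinutes, ttlMs, visibleRefs)); // true

        // If the in-progress references were visible, the same file would be kept.
        System.out.println(isDeletable("hfile-1", copiedAtMs, tenMinutes, ttlMs, Set.of("hfile-1"))); // false
    }
}
```

In this model the failure depends only on whether the export runs longer than the TTL, which matches the behaviour described above.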
Workaround: on the destination cluster, increase hbase.master.hfilecleaner.ttl to a value greater than the ExportSnapshot job's run time.
I think this issue needs to be fixed in SnapshotHFileCleaner flow so that long-running ExportSnapshot jobs can succeed.
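One possible direction, sketched here only to make the idea concrete, is for the snapshot directory listing to also consider in-progress snapshot directories under .tmp. The example uses plain java.nio.file on a local temporary directory rather than the HBase FSUtils API. The class SnapshotDirListingSketch, the method listSnapshotDirs and the flag includeInProgress are hypothetical names, and whether this is the right fix for SnapshotHFileCleaner is an open question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

public class SnapshotDirListingSketch {
    static final String TMP_DIR_NAME = ".tmp"; // mirrors the .tmp directory named in this issue

    /** Lists completed snapshot dirs, optionally adding in-progress ones found under .tmp. */
    static List<Path> listSnapshotDirs(Path snapshotRoot, boolean includeInProgress)
            throws IOException {
        List<Path> result = new ArrayList<>();
        // Completed snapshots: every directory directly under the root except .tmp.
        try (Stream<Path> children = Files.list(snapshotRoot)) {
            children.filter(Files::isDirectory)
                    .filter(p -> !p.getFileName().toString().equals(TMP_DIR_NAME))
                    .forEach(result::add);
        }
        // In-progress snapshots: directories under .tmp, only when requested.
        Path tmp = snapshotRoot.resolve(TMP_DIR_NAME);
        if (includeInProgress && Files.isDirectory(tmp)) {
            try (Stream<Path> children = Files.list(tmp)) {
                children.filter(Files::isDirectory).forEach(result::add);
            }
        }
        Collections.sort(result); // deterministic order for callers
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("hbase-snapshot-sketch");
        Files.createDirectories(root.resolve("completed-snapshot"));
        Files.createDirectories(root.resolve(TMP_DIR_NAME).resolve("export-in-progress"));

        System.out.println(listSnapshotDirs(root, false).size()); // 1
        System.out.println(listSnapshotDirs(root, true).size());  // 2
    }
}
```

With includeInProgress set to false the listing behaves like the filter quoted in this issue and the in-progress export is invisible. With it set to true the in-progress directory is returned as well, so its references could be taken into account.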