  1. HBase
  2. HBASE-27404

Long-running ExportSnapshot fails with "Can't find hfile" exception



    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: snapshots
    • Labels: None


      ExportSnapshot jobs that run longer than the destination cluster's hbase.master.hfilecleaner.ttl value fail with "Can't find hfile: <hfile> in the real or archive folders". The HFiles already copied into the archive folder are deleted on the destination cluster by the SnapshotHFileCleaner cleaner flow.


      1. ExportSnapshot copies archived HFiles to the destination cluster's archive folder.
      2. The manifest of an in-progress ExportSnapshot stays in /hbase/.hbase-snapshot/.tmp until the export completes.
      3. The SnapshotHFileCleaner flow ignores the /hbase/.hbase-snapshot/.tmp directory when it looks for snapshot reference files:
      private void refreshCache() throws IOException {
        // just list the snapshot directory directly, do not check the modification time for the root
        // snapshot directory, as some file system implementations do not modify the parent directory's
        // modTime when there are new sub items, for example, S3.
        FileStatus[] snapshotDirs = FSUtils.listStatus(fs, snapshotDir,
          p -> !p.getName().equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)); 
      4. Because SnapshotHFileCleaner misses the in-progress snapshot's references, TimeToLiveHFileCleaner marks for deletion any HFile belonging to the in-progress ExportSnapshot that was copied more than hbase.master.hfilecleaner.ttl ago.
      5. This causes ExportSnapshot to fail at the verification stage.
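
      The interaction described in the steps above can be modelled with a small, self-contained sketch. It is not HBase code: the class CleanerSketch, the method deletable, and the includeInProgress flag are hypothetical names used only for illustration. The sketch assumes a file is removed only when it is both older than the TTL and not referenced by any snapshot the cleaner knows about.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Simplified model of the cleaner decision; NOT the HBase implementation.
public class CleanerSketch {

    /**
     * Returns the archived HFiles that a cleaner pass would delete.
     * archiveAgeMs maps an HFile name to its age in milliseconds.
     * includeInProgress is a hypothetical switch: when false, references
     * held by snapshots under .tmp are ignored, as in the reported flow.
     */
    static List<String> deletable(Map<String, Long> archiveAgeMs,
                                  Set<String> completedRefs,
                                  Set<String> inProgressRefs,
                                  long ttlMs,
                                  boolean includeInProgress) {
        Set<String> referenced = new HashSet<>(completedRefs);
        if (includeInProgress) {
            referenced.addAll(inProgressRefs);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : archiveAgeMs.entrySet()) {
            boolean expired = e.getValue() > ttlMs;                   // TTL check
            boolean unreferenced = !referenced.contains(e.getKey());  // snapshot check
            if (expired && unreferenced) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> ages = new TreeMap<>();
        ages.put("hfile-a", 600_000L); // copied 10 minutes ago
        ages.put("hfile-b", 60_000L);  // copied 1 minute ago
        Set<String> completed = Set.of();
        Set<String> inProgress = Set.of("hfile-a", "hfile-b");
        long ttlMs = 300_000L;         // 5 minutes

        // .tmp ignored: the older file of the running export is deleted.
        System.out.println(deletable(ages, completed, inProgress, ttlMs, false)); // [hfile-a]
        // .tmp honoured: nothing belonging to the running export is deleted.
        System.out.println(deletable(ages, completed, inProgress, ttlMs, true));  // []
    }
}
```

      In this model the export loses hfile-a as soon as it has been in the archive longer than the TTL, which matches the failure at the verification stage.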



      Workaround: on the destination cluster, increase the hbase.master.hfilecleaner.ttl value to more than the expected ExportSnapshot job run time.
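
      As a rough aid for picking that value, the sketch below derives a TTL from an expected export duration plus a safety margin. It assumes the property is expressed in milliseconds (worth confirming against the documentation for the HBase version in use); the class TtlSketch, the method ttlMillisFor, the six-hour duration, and the 1.5 factor are all illustrative assumptions.

```java
import java.time.Duration;

// Helper for choosing a TTL value; the numbers are illustrative only.
public class TtlSketch {

    /** Expected export duration scaled by a safety factor, in milliseconds. */
    static long ttlMillisFor(Duration expectedExport, double safetyFactor) {
        return (long) Math.ceil(expectedExport.toMillis() * safetyFactor);
    }

    public static void main(String[] args) {
        // Assumed: the export takes about 6 hours; keep a 50% margin.
        long ttl = ttlMillisFor(Duration.ofHours(6), 1.5);
        System.out.println("hbase.master.hfilecleaner.ttl=" + ttl);
        // prints hbase.master.hfilecleaner.ttl=32400000
    }
}
```

      A larger TTL also keeps every other archived HFile around longer, so the margin trades export safety against archive disk usage on the destination cluster.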


      I think this issue needs to be fixed in SnapshotHFileCleaner flow so that long-running ExportSnapshot jobs can succeed.




            Assignee: Unassigned
            Reporter: HemaKumar (hema.sunnapu)