HBase / HBASE-27825

Store file archiving is not sufficiently indirected through the store file tracker

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • 2.5.4
    • None
    • None
    • None

    Description

      GCRegionProcedure fails cleaning up old parents after splits. We time out “renaming” files into the archive. On S3, a rename operation is a whole file copy operation. It can take a long time to copy a large hfile.

      [PEWorker-21] backup.HFileArchiver: Failed to archive FileablePath, s3a://[...]
      java.net.SocketTimeoutException: copyFile(data/default/cluster_test/[...], 
      archive/data/default/cluster_test/[...]) on data/default/cluster_test/[...]: 
      com.amazonaws.SdkClientException: Unable to execute HTTP request: Read timed out
      

      Once we fail to “rename” the files into the archive we continue to fail because renames on S3 are not atomic. They are an object copy operation which is neither atomic nor automatically rolled back. The incomplete object remains present. The GCRegionProcedure can never complete successfully.

      org.apache.hadoop.fs.FileAlreadyExistsException: Failed to rename s3a://[...] to s3a://[...]; 
      destination file exists
      
      org.apache.hadoop.hbase.backup.FailedArchiveException: Failed to archive/delete all the files for region:
      ddaf1fb41197254483dcfd1d63e869d0 into s3a://[...]. 
      Something is probably awry on the filesystem.
      

      Short term mitigations:

      In HFileArchiver#resolveAndArchiveFile, if moveAndClose of the current file fails, attempt to delete the incomplete archive side file.
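
      The cleanup-on-failure idea could look roughly like the sketch below. It is not the HFileArchiver code: it uses java.nio.file in place of the Hadoop FileSystem API so that it runs on its own, and the class ArchiveWithCleanup and the Mover interface are hypothetical names introduced for illustration. It also assumes that a failed move leaves the source file intact, which is what makes dropping the partial destination safe.

      ```java
      import java.io.IOException;
      import java.nio.file.Files;
      import java.nio.file.Path;

      /** Illustrative only; a local-filesystem stand-in, not HBase's HFileArchiver. */
      final class ArchiveWithCleanup {

        /** Hypothetical stand-in for the move step (moveAndClose in the issue text). */
        interface Mover {
          void move(Path source, Path archiveTarget) throws IOException;
        }

        /**
         * Tries to move source to archiveTarget. If the move fails, removes any
         * partial destination so a later retry does not fail with
         * "destination file exists". Returns true only if the move succeeded.
         */
        static boolean archive(Path source, Path archiveTarget, Mover mover) {
          try {
            mover.move(source, archiveTarget);
            return true;
          } catch (IOException moveFailure) {
            try {
              // Assumption: the source is still intact, so the partial copy can be dropped.
              Files.deleteIfExists(archiveTarget);
            } catch (IOException cleanupFailure) {
              // Keep the original failure as the primary cause.
              moveFailure.addSuppressed(cleanupFailure);
            }
            return false;
          }
        }
      }
      ```

      With the partial destination removed, the next attempt starts from a clean state instead of failing on the leftover object.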

      Also set the recommended default read timeout for S3A to a larger value.

      Long term:

      When the file based store file tracker is enabled, the archived files for a store should no longer be moved to a separate path from the live files in the store. Instead, whether a file is archived should be a status bit maintained in the tracker manifest.
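
      As a rough illustration of the status bit idea, the sketch below models a manifest in which archiving flips a flag instead of moving a file. Everything here is hypothetical: TrackerManifestSketch and its methods are invented for this example and do not reflect the manifest format the file based store file tracker actually uses.

      ```java
      import java.util.ArrayList;
      import java.util.LinkedHashMap;
      import java.util.List;
      import java.util.Map;

      /** Hypothetical in-memory manifest; not the real tracker's on-disk format. */
      final class TrackerManifestSketch {
        // File name -> archived flag. Files stay where they are; only the flag changes.
        private final Map<String, Boolean> archivedByFile = new LinkedHashMap<>();

        void addLiveFile(String name) {
          archivedByFile.put(name, Boolean.FALSE);
        }

        /** Archiving becomes a metadata update rather than an object copy. */
        void markArchived(String name) {
          // replace() returns null when the file is not tracked at all.
          if (archivedByFile.replace(name, Boolean.TRUE) == null) {
            throw new IllegalArgumentException("untracked file: " + name);
          }
        }

        List<String> liveFiles() {
          return filesWithFlag(false);
        }

        List<String> archivedFiles() {
          return filesWithFlag(true);
        }

        private List<String> filesWithFlag(boolean archived) {
          List<String> result = new ArrayList<>();
          for (Map.Entry<String, Boolean> e : archivedByFile.entrySet()) {
            if (e.getValue() == archived) {
              result.add(e.getKey());
            }
          }
          return result;
        }
      }
      ```

      Under this model no data is copied when a file is archived, so the S3 rename timeout and the leftover partial object described above would not arise for that step.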

            People

              Assignee: Unassigned
              Reporter: Andrew Kyle Purtell (apurtell)