HBASE-28103: HBase backup repair stuck after failed delete due to missing S3 credentials


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: backup&restore

    Description

      I was experimenting with what happens if a user executes `hbase backup delete` without providing S3 credentials.

      I started with a backup present in an S3 bucket.

      hbase backup history
      
      {ID=backup_1695226626227,Type=FULL,Tables={foo:bar},State=COMPLETE,Start time=Wed Sep 20 16:17:09 UTC 2023,End time=Wed Sep 20 16:17:42 UTC 2023,Progress=100%}
      

      I tried to delete this backup without providing S3 credentials; it failed, as expected.

      hbase backup delete -l backup_1695226626227
      
      
      23/09/20 16:18:46 ERROR org.apache.hadoop.hbase.backup.impl.BackupAdminImpl: Delete operation failed, please run backup repair utility to restore backup system integrity
      java.nio.file.AccessDeniedException: s3a://backuprestore-experiments/hbase/backup_1695226626227: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
          at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:215)
          at org.apache.hadoop.fs.s3a.Invoker.onceInTheFuture(Invoker.java:190)
          at org.apache.hadoop.fs.s3a.Listing$ObjectListingIterator.next(Listing.java:651)
          at org.apache.hadoop.fs.s3a.Listing$FileStatusListingIterator.requestNextBatch(Listing.java:430)
          at org.apache.hadoop.fs.s3a.Listing$FileStatusListingIterator.<init>(Listing.java:372)
          at org.apache.hadoop.fs.s3a.Listing.createFileStatusListingIterator(Listing.java:143)
          at org.apache.hadoop.fs.s3a.Listing.getFileStatusesAssumingNonEmptyDir(Listing.java:264)
          at org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:3369)
          at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$null$22(S3AFileSystem.java:3346)
          at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:122)
          at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listStatus$23(S3AFileSystem.java:3345)
          at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547)
          at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528)
          at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:449)
          at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2480)
          at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2499)
          at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:3344)
          at org.apache.hadoop.hbase.backup.util.BackupUtils.listStatus(BackupUtils.java:522)
          at org.apache.hadoop.hbase.backup.util.BackupUtils.cleanupHLogDir(BackupUtils.java:430)
          at org.apache.hadoop.hbase.backup.util.BackupUtils.cleanupBackupData(BackupUtils.java:411)
          at org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.deleteBackup(BackupAdminImpl.java:229)
          at org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.deleteBackups(BackupAdminImpl.java:142)
          at org.apache.hadoop.hbase.backup.impl.BackupCommands$DeleteCommand.executeDeleteListOfBackups(BackupCommands.java:627)
          at org.apache.hadoop.hbase.backup.impl.BackupCommands$DeleteCommand.execute(BackupCommands.java:578)
          at org.apache.hadoop.hbase.backup.BackupDriver.parseAndRun(BackupDriver.java:134)
          at org.apache.hadoop.hbase.backup.BackupDriver.doWork(BackupDriver.java:169)
          at org.apache.hadoop.hbase.backup.BackupDriver.run(BackupDriver.java:199)
          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
          at org.apache.hadoop.hbase.backup.BackupDriver.main(BackupDriver.java:177)
      Caused by: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
          at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:216)
          at com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1269)
          at com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(AmazonHttpClient.java:845)
          at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:794)
          at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
          at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
          at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
          at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
          at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
          at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
          at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5456)
          at com.amazonaws.services.s3.AmazonS3Client.getBucketRegionViaHeadRequest(AmazonS3Client.java:6432)
          at com.amazonaws.services.s3.AmazonS3Client.fetchRegionFromCache(AmazonS3Client.java:6404)
          at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5441)
          at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5403)
          at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5397)
          at com.amazonaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:971)
          at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listObjects$12(S3AFileSystem.java:2715)
          at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547)
          at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528)
          at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:468)
          at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:431)
          at org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:2706)
          at org.apache.hadoop.fs.s3a.S3AFileSystem$ListingOperationCallbacksImpl.lambda$listObjectsAsync$0(S3AFileSystem.java:2342)
          at org.apache.hadoop.fs.s3a.impl.CallableSupplier.get(CallableSupplier.java:87)
          at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
          at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
          at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
          at java.base/java.lang.Thread.run(Thread.java:829)
      Caused by: com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
          at com.amazonaws.auth.EnvironmentVariableCredentialsProvider.getCredentials(EnvironmentVariableCredentialsProvider.java:49)
          at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:177)
          ... 28 more
      Delete command FAILED. Please run backup repair tool to restore backup system integrity
      23/09/20 16:18:46 ERROR org.apache.hadoop.hbase.backup.BackupDriver: Error running command-line tool
      java.nio.file.AccessDeniedException: s3a://backuprestore-experiments/hbase/backup_1695226626227: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
          at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:215)
          at org.apache.hadoop.fs.s3a.Invoker.onceInTheFuture(Invoker.java:190)
          at org.apache.hadoop.fs.s3a.Listing$ObjectListingIterator.next(Listing.java:651)
          at org.apache.hadoop.fs.s3a.Listing$FileStatusListingIterator.requestNextBatch(Listing.java:430)
          at org.apache.hadoop.fs.s3a.Listing$FileStatusListingIterator.<init>(Listing.java:372)
          at org.apache.hadoop.fs.s3a.Listing.createFileStatusListingIterator(Listing.java:143)
          at org.apache.hadoop.fs.s3a.Listing.getFileStatusesAssumingNonEmptyDir(Listing.java:264)
          at org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:3369)
          at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$null$22(S3AFileSystem.java:3346)
          at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:122)
          at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listStatus$23(S3AFileSystem.java:3345)
          at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547)
          at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528)
          at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:449)
          at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2480)
          at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2499)
          at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:3344)
          at org.apache.hadoop.hbase.backup.util.BackupUtils.listStatus(BackupUtils.java:522)
          at org.apache.hadoop.hbase.backup.util.BackupUtils.cleanupHLogDir(BackupUtils.java:430)
          at org.apache.hadoop.hbase.backup.util.BackupUtils.cleanupBackupData(BackupUtils.java:411)
          at org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.deleteBackup(BackupAdminImpl.java:229)
          at org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.deleteBackups(BackupAdminImpl.java:142)
          at org.apache.hadoop.hbase.backup.impl.BackupCommands$DeleteCommand.executeDeleteListOfBackups(BackupCommands.java:627)
          at org.apache.hadoop.hbase.backup.impl.BackupCommands$DeleteCommand.execute(BackupCommands.java:578)
          at org.apache.hadoop.hbase.backup.BackupDriver.parseAndRun(BackupDriver.java:134)
          at org.apache.hadoop.hbase.backup.BackupDriver.doWork(BackupDriver.java:169)
          at org.apache.hadoop.hbase.backup.BackupDriver.run(BackupDriver.java:199)
          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
          at org.apache.hadoop.hbase.backup.BackupDriver.main(BackupDriver.java:177)
      Caused by: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
          at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:216)
          at com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1269)
          at com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(AmazonHttpClient.java:845)
          at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:794)
          at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
          at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
          at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
          at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
          at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
          at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
          at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5456)
          at com.amazonaws.services.s3.AmazonS3Client.getBucketRegionViaHeadRequest(AmazonS3Client.java:6432)
          at com.amazonaws.services.s3.AmazonS3Client.fetchRegionFromCache(AmazonS3Client.java:6404)
          at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5441)
          at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5403)
          at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5397)
          at com.amazonaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:971)
          at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listObjects$12(S3AFileSystem.java:2715)
          at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547)
          at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528)
          at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:468)
          at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:431)
          at org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:2706)
          at org.apache.hadoop.fs.s3a.S3AFileSystem$ListingOperationCallbacksImpl.lambda$listObjectsAsync$0(S3AFileSystem.java:2342)
          at org.apache.hadoop.fs.s3a.impl.CallableSupplier.get(CallableSupplier.java:87)
          at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
          at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
          at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
          at java.base/java.lang.Thread.run(Thread.java:829)
      Caused by: com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
          at com.amazonaws.auth.EnvironmentVariableCredentialsProvider.getCredentials(EnvironmentVariableCredentialsProvider.java:49)
          at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:177)
          ... 28 more
      

      At this point, I cannot start a new backup because the failed delete command is still recorded:

      hbase backup \
        -libjars /opt/hadoop/share/hadoop/tools/lib/hadoop-aws-3.3.6-1-lily.jar,/opt/hadoop/share/hadoop/tools/lib/aws-java-sdk-bundle-1.12.367.jar \
        -Dfs.s3a.access.key=... \
        -Dfs.s3a.secret.key=... \
        -Dfs.s3a.session.token=... \
        create incremental s3a://backuprestore-experiments/hbase -t foo:bar
      
      Found failed backup DELETE coommand. 
      Backup system recovery is required.
      23/09/20 16:31:16 ERROR org.apache.hadoop.hbase.backup.BackupDriver: Error running command-line tool
      java.io.IOException: Failed backup DELETE found, aborted command execution
          at org.apache.hadoop.hbase.backup.impl.BackupCommands$Command.execute(BackupCommands.java:167)
          at org.apache.hadoop.hbase.backup.impl.BackupCommands$CreateCommand.execute(BackupCommands.java:309)
          at org.apache.hadoop.hbase.backup.BackupDriver.parseAndRun(BackupDriver.java:134)
          at org.apache.hadoop.hbase.backup.BackupDriver.doWork(BackupDriver.java:169)
          at org.apache.hadoop.hbase.backup.BackupDriver.run(BackupDriver.java:199)
          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
          at org.apache.hadoop.hbase.backup.BackupDriver.main(BackupDriver.java:177)
      
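      For context, the abort comes from a preflight check that every backup command appears to run: if the backup system table still records a pending delete, the command refuses to execute. Below is a paraphrased sketch of that guard, based on the BackupCommands$Command.execute frame in the trace above (the BackupSystemTable call is taken from its public API; this is a sketch, not the verbatim HBase source):

      import java.io.IOException;

      import org.apache.hadoop.hbase.backup.impl.BackupSystemTable;
      import org.apache.hadoop.hbase.client.Connection;

      // Paraphrased guard; not the verbatim HBase source.
      final class PreflightCheck {
        static void failIfDeletePending(Connection conn) throws IOException {
          try (BackupSystemTable sysTable = new BackupSystemTable(conn)) {
            String[] pending = sysTable.getListOfBackupIdsFromDeleteOperation();
            if (pending != null && pending.length > 0) {
              // Matches the tool output quoted above (typo included).
              System.err.println("Found failed backup DELETE coommand. ");
              System.err.println("Backup system recovery is required.");
              throw new IOException("Failed backup DELETE found, aborted command execution");
            }
          }
        }
      }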

      However, the suggested backup repair is unable to complete:

      hbase backup repair
      
      REPAIR status: no failed sessions found. Checking failed delete backup operation ...
      Found failed DELETE operation for: backup_1695226626227
      Running DELETE again ...
      23/09/20 16:34:13 WARN org.apache.hadoop.hbase.backup.impl.BackupSystemTable: Could not restore backup system table. Snapshot snapshot_backup_system does not exists.
      23/09/20 16:34:13 ERROR org.apache.hadoop.hbase.backup.BackupDriver: Error running command-line tool
      java.io.IOException: There is no active backup exclusive operation
          at org.apache.hadoop.hbase.backup.impl.BackupSystemTable.finishBackupExclusiveOperation(BackupSystemTable.java:645)
          at org.apache.hadoop.hbase.backup.impl.BackupCommands$RepairCommand.repairFailedBackupDeletionIfAny(BackupCommands.java:721)
          at org.apache.hadoop.hbase.backup.impl.BackupCommands$RepairCommand.execute(BackupCommands.java:681)
          at org.apache.hadoop.hbase.backup.BackupDriver.parseAndRun(BackupDriver.java:134)
          at org.apache.hadoop.hbase.backup.BackupDriver.doWork(BackupDriver.java:169)
          at org.apache.hadoop.hbase.backup.BackupDriver.run(BackupDriver.java:199)
          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
          at org.apache.hadoop.hbase.backup.BackupDriver.main(BackupDriver.java:177)
      

      The core issue seems to be the assumption that a "backup exclusive operation" is still registered for each failed delete command.
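      From the output and stack trace, the repair path looks roughly like the following (a paraphrase of RepairCommand.repairFailedBackupDeletionIfAny reconstructed from the frames above, not the actual source): it re-runs the delete and then unconditionally calls finishBackupExclusiveOperation(), which throws when no exclusive-operation marker exists.

      import java.io.IOException;

      import org.apache.hadoop.hbase.backup.impl.BackupAdminImpl;
      import org.apache.hadoop.hbase.backup.impl.BackupSystemTable;
      import org.apache.hadoop.hbase.client.Connection;

      // Rough paraphrase of RepairCommand.repairFailedBackupDeletionIfAny;
      // reconstructed from the output and stack traces above, not verbatim source.
      final class RepairSketch {
        static void repairFailedDeleteIfAny(Connection conn, BackupSystemTable sysTable)
            throws IOException {
          String[] backupIds = sysTable.getListOfBackupIdsFromDeleteOperation();
          if (backupIds == null || backupIds.length == 0) {
            return; // no failed delete recorded, nothing to repair
          }
          System.out.println("Found failed DELETE operation for: " + String.join(",", backupIds));
          System.out.println("Running DELETE again ...");
          // Source of the WARN above when snapshot_backup_system is absent.
          BackupSystemTable.restoreFromSnapshot(conn);
          try (BackupAdminImpl admin = new BackupAdminImpl(conn)) {
            admin.deleteBackups(backupIds);
          }
          // The brittle step: this assumes an exclusive-operation marker is still
          // registered, and throws "There is no active backup exclusive operation"
          // when it is not, which is exactly what the repair run above hit.
          sysTable.finishBackupExclusiveOperation();
        }
      }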

      It would also be a good feature to allow the repair command to simply discard the pending delete, though I guess in some cases that may not result in a reliable state if data was already partially deleted.

      I guess the workaround in this case would be to delete the failed delete record from the backup system table?
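      If so, a one-off snippet along these lines might clear that record (an untested sketch; I am assuming BackupSystemTable.finishDeleteOperation() removes the pending-delete record, by analogy with its startDeleteOperation() counterpart):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.backup.impl.BackupSystemTable;
      import org.apache.hadoop.hbase.client.Connection;
      import org.apache.hadoop.hbase.client.ConnectionFactory;

      /** Hypothetical one-off cleanup: drop the pending-delete marker that blocks new backups. */
      public final class ClearFailedDelete {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          try (Connection conn = ConnectionFactory.createConnection(conf);
              BackupSystemTable sysTable = new BackupSystemTable(conn)) {
            String[] pending = sysTable.getListOfBackupIdsFromDeleteOperation();
            if (pending != null && pending.length > 0) {
              // Caution: this only clears the bookkeeping; any backup data that
              // was already partially deleted on S3 still needs manual cleanup.
              sysTable.finishDeleteOperation();
            }
          }
        }
      }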

       

      People

          Assignee: Unassigned
          Reporter: Dieter De Paepe (dieterdp_ng)
          Votes: 0
          Watchers: 3
