Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.9.0, 3.0.0-alpha4
    • Component/s: fs/s3
    • Labels:
      None
    • Target Version/s:

      Description

      Reviewing the code, s3a has the problem raised in HADOOP-6688: the deletion of a child entry during a recursive directory delete is propagated as an exception, rather than treated as a detail that an idempotent operation should simply ignore.

      The exception should be caught and, if it is a file-not-found problem, logged rather than propagated.
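      A minimal sketch of that intended behaviour, assuming a plain FileSystem recursive-delete loop; the class and method names here are hypothetical and this is not the actual S3AFileSystem code:

      import java.io.FileNotFoundException;
      import java.io.IOException;
      import org.apache.hadoop.fs.FileStatus;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.slf4j.Logger;
      import org.slf4j.LoggerFactory;

      class RecursiveDeleteSketch {
        private static final Logger LOG = LoggerFactory.getLogger(RecursiveDeleteSketch.class);

        // Delete each child of a directory, treating "already gone" as success.
        static void deleteChildren(FileSystem fs, Path dir) throws IOException {
          for (FileStatus child : fs.listStatus(dir)) {
            try {
              fs.delete(child.getPath(), true);
            } catch (FileNotFoundException e) {
              // Another client deleted the entry first; the desired state is reached,
              // so log and continue instead of propagating the exception.
              LOG.debug("{} already deleted, ignoring", child.getPath(), e);
            }
          }
        }
      }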

      1. HADOOP-11572-branch-2-003.patch
        8 kB
        Steve Loughran
      2. HADOOP-11572-branch-2-002.patch
        9 kB
        Steve Loughran
      3. HADOOP-11572-001.patch
        3 kB
        Steve Loughran

        Issue Links

          Activity

          stevel@apache.org Steve Loughran added a comment -

          Sourabh Goyal, thanks. Makes me think it's a scale/throttling problem.

          The fact that a repeated request worked makes me think we could try to address this by recognising the internal error as something recoverable through retries.

          sourabh912 Sourabh Goyal added a comment -

          Steve Loughran: We hit this issue recently in production. We applied the patch from this JIRA and got the following stack trace:

          17/07/15 12:40:24 ERROR s3a.S3AFileSystem: Partial failure of delete, 1 errors
          com.amazonaws.services.s3.model.MultiObjectDeleteException: One or more objects could not be deleted (Service: null; Status Code: 200; Error Code: null; Request ID: xxxxxxx), S3 Extended Request ID: xxxxxxxxxxx
          at com.amazonaws.services.s3.AmazonS3Client.deleteObjects(AmazonS3Client.java:1785)
          at org.apache.hadoop.fs.s3a.S3AFileSystem.deleteObjects(S3AFileSystem.java:775)
          at org.apache.hadoop.fs.s3a.S3AFileSystem.removeKeys(S3AFileSystem.java:750)
          at org.apache.hadoop.fs.s3a.S3AFileSystem.rename(S3AFileSystem.java:709)
          at org.apache.hadoop.fs.shell.MoveCommands$Rename.processPath(MoveCommands.java:110)
          at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:248)
          at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:328)
          at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:300)
          at org.apache.hadoop.fs.shell.CommandWithDestination.processPathArgument(CommandWithDestination.java:243)
          at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:282)
          at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:266)
          at org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:220)
          at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:211)
          at org.apache.hadoop.fs.shell.Command.run(Command.java:175)
          at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
          at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
          17/07/15 12:40:24 ERROR s3a.S3AFileSystem: sourabhg/files5/part-10472-5f8b30f4-dc37-4419-9a7e-c1642ff5f0a1.parquet: "InternalError" - We encountered an internal error. Please try again.
          

          This is consistently reproducible if we try to delete ~30K or more files.
          As we can see, AWS returns an InternalError. To fix this, we retried the failed multi-delete request and it worked.
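          For illustration, a hedged sketch of that workaround, retrying only the keys the service rejected; it uses the error list the SDK exposes on MultiObjectDeleteException and is not the code of the committed patch:

          import java.util.ArrayList;
          import java.util.List;
          import com.amazonaws.services.s3.AmazonS3;
          import com.amazonaws.services.s3.model.DeleteObjectsRequest;
          import com.amazonaws.services.s3.model.MultiObjectDeleteException;

          class RetryFailedMultiDeleteSketch {
            // Issue a bulk delete; on partial failure, retry once with only the rejected keys.
            static void deleteWithOneRetry(AmazonS3 s3, String bucket,
                List<DeleteObjectsRequest.KeyVersion> keys) {
              try {
                s3.deleteObjects(new DeleteObjectsRequest(bucket).withKeys(keys));
              } catch (MultiObjectDeleteException e) {
                List<DeleteObjectsRequest.KeyVersion> retry = new ArrayList<>();
                for (MultiObjectDeleteException.DeleteError error : e.getErrors()) {
                  if ("InternalError".equals(error.getCode())) {
                    retry.add(new DeleteObjectsRequest.KeyVersion(error.getKey()));
                  }
                }
                if (!retry.isEmpty()) {
                  // A single retry of just the failed keys; a production fix would
                  // bound the number of attempts and back off between them.
                  s3.deleteObjects(new DeleteObjectsRequest(bucket).withKeys(retry));
                }
              }
            }
          }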

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11750 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11750/)
          HADOOP-11572. s3a delete() operation fails during a concurrent delete of (stevel: rev ba70225cf6a1e7dc756f4991881de04f525ff088)

          • (edit) hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
          • (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
          • (edit) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AFailureHandling.java
          stevel@apache.org Steve Loughran added a comment -

          I'm closing this as "fixed", but we haven't really fixed it; we've just added logging of what's going on. As we collect more diagnostics from this, we can start to come up with some recovery logic that distinguishes the failure types.

          liuml07 Mingliang Liu added a comment -

          Ah, that makes sense. I'm +1 on this. Thanks.

          fabbri Aaron Fabbri added a comment -

          The v3 patch looks good to me as well. Mingliang Liu, I believe that test gets the exception because it points at the read-only bucket s3a://landsat-pds/scene_list.gz.

          fabbri Aaron Fabbri added a comment - edited

          Edit: removing comment; I reviewed the old v1 patch.

          liuml07 Mingliang Liu added a comment -

          Steve, this patch looks good to me. One quick question: how does testMultiObjectDeleteNoPermissions throw the expected MultiObjectDeleteException in the test?

          stevel@apache.org Steve Loughran added a comment -

          Any reviewers for this?

          stevel@apache.org Steve Loughran added a comment -

          Patch 003: drops the import.

          Retested the test suite against S3 Frankfurt; didn't run any of the other tests.

          stevel@apache.org Steve Loughran added a comment -

          Somehow got a reference to an S3FileSystem import while coding the test; will fix.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 21s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 6m 40s branch-2 passed
          +1 compile 0m 17s branch-2 passed with JDK v1.8.0_121
          +1 compile 0m 19s branch-2 passed with JDK v1.7.0_121
          +1 checkstyle 0m 15s branch-2 passed
          +1 mvnsite 0m 27s branch-2 passed
          +1 mvneclipse 0m 14s branch-2 passed
          +1 findbugs 0m 35s branch-2 passed
          +1 javadoc 0m 13s branch-2 passed with JDK v1.8.0_121
          +1 javadoc 0m 15s branch-2 passed with JDK v1.7.0_121
          +1 mvninstall 0m 18s the patch passed
          +1 compile 0m 13s the patch passed with JDK v1.8.0_121
          -1 javac 0m 13s hadoop-tools_hadoop-aws-jdk1.8.0_121 with JDK v1.8.0_121 generated 1 new + 23 unchanged - 0 fixed = 24 total (was 23)
          +1 compile 0m 16s the patch passed with JDK v1.7.0_121
          -1 javac 0m 16s hadoop-tools_hadoop-aws-jdk1.7.0_121 with JDK v1.7.0_121 generated 1 new + 29 unchanged - 0 fixed = 30 total (was 29)
          -0 checkstyle 0m 12s hadoop-tools/hadoop-aws: The patch generated 2 new + 3 unchanged - 0 fixed = 5 total (was 3)
          +1 mvnsite 0m 24s the patch passed
          +1 mvneclipse 0m 12s the patch passed
          -1 whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
          +1 findbugs 0m 43s the patch passed
          +1 javadoc 0m 10s the patch passed with JDK v1.8.0_121
          +1 javadoc 0m 13s the patch passed with JDK v1.7.0_121
          +1 unit 0m 21s hadoop-aws in the patch passed with JDK v1.7.0_121.
          +1 asflicense 0m 17s The patch does not generate ASF License warnings.
          15m 10s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:b59b8b7
          JIRA Issue HADOOP-11572
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12853970/HADOOP-11572-branch-2-002.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux d5561ab231a5 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2 / 171b186
          Default Java 1.7.0_121
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_121 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_121
          findbugs v3.0.0
          javac https://builds.apache.org/job/PreCommit-HADOOP-Build/11688/artifact/patchprocess/diff-compile-javac-hadoop-tools_hadoop-aws-jdk1.8.0_121.txt
          javac https://builds.apache.org/job/PreCommit-HADOOP-Build/11688/artifact/patchprocess/diff-compile-javac-hadoop-tools_hadoop-aws-jdk1.7.0_121.txt
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/11688/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-aws.txt
          whitespace https://builds.apache.org/job/PreCommit-HADOOP-Build/11688/artifact/patchprocess/whitespace-eol.txt
          JDK v1.7.0_121 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/11688/testReport/
          modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/11688/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          stevel@apache.org Steve Loughran added a comment -

          Testing: S3 Frankfurt.

          Because I can't replicate the problem for missing files, I'm not attempting to be clever, just improving the logging and tests for the current behaviour. The troubleshooting docs suggest actions, including "look at this JIRA". I'll change the description text to act as a landing pad for visitors.

          stevel@apache.org Steve Loughran added a comment -

          Patch 002

          • log the failing objects & error messages
          • document the issue
          • tests to show that this doesn't surface if an object is missing, but will if you lack permissions
          stevel@apache.org Steve Loughran added a comment -

          The MultiObjectDeleteException doesn't take a string as part of its constructor, and I don't want to play subclass games with it. Instead I'll log more before the rethrow.

          This is what will be logged at ERROR when the problem arises: a full stack trace and a list of every failed file plus its cause:

          2017-02-22 14:43:15,243 [JUnit-testMultiObjectDeleteNoPermissions] DEBUG s3a.S3AFileSystem (S3AStorageStatistics.java:incrementCounter(60)) - object_delete_requests += 1  ->  1
          2017-02-22 14:43:15,430 [JUnit-testMultiObjectDeleteNoPermissions] ERROR s3a.S3AFileSystem (S3AFileSystem.java:deleteObjects(1022)) - Partial failure of delete, 1 errors
          com.amazonaws.services.s3.model.MultiObjectDeleteException: One or more objects could not be deleted (Service: null; Status Code: 200; Error Code: null; Request ID: 4F986C40502C1D94), S3 Extended Request ID: Ikdez7AG8gabOP8DQTFRwWy8gb7GmcvlMciUa/lAooBu9MOMMtvyBHcfm/Ls5C+sBEsnHuu10vU=
          	at com.amazonaws.services.s3.AmazonS3Client.deleteObjects(AmazonS3Client.java:2096)
          	at org.apache.hadoop.fs.s3a.S3AFileSystem.deleteObjects(S3AFileSystem.java:1018)
          	at org.apache.hadoop.fs.s3a.S3AFileSystem.removeKeys(S3AFileSystem.java:1247)
          	at org.apache.hadoop.fs.s3a.ITestS3AFailureHandling.removeKeys(ITestS3AFailureHandling.java:159)
          	at org.apache.hadoop.fs.s3a.ITestS3AFailureHandling.testMultiObjectDeleteNoPermissions(ITestS3AFailureHandling.java:180)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          	at java.lang.reflect.Method.invoke(Method.java:498)
          	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
          	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
          	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
          	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
          	at org.junit.internal.runners.statements.ExpectException.evaluate(ExpectException.java:19)
          	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
          	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
          	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
          	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
          2017-02-22 14:43:15,433 [JUnit-testMultiObjectDeleteNoPermissions] ERROR s3a.S3AFileSystem (S3AFileSystem.java:deleteObjects(1024)) - scene_list.gz: "AccessDenied" - Access Denied
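          Roughly how that per-object error logging can be produced before the rethrow; a sketch only, not the exact patch code:

          import com.amazonaws.services.s3.AmazonS3;
          import com.amazonaws.services.s3.model.DeleteObjectsRequest;
          import com.amazonaws.services.s3.model.MultiObjectDeleteException;
          import org.slf4j.Logger;
          import org.slf4j.LoggerFactory;

          class LogMultiDeleteErrorsSketch {
            private static final Logger LOG = LoggerFactory.getLogger(LogMultiDeleteErrorsSketch.class);

            // Perform the bulk delete, logging every rejected key and its cause before rethrowing.
            static void deleteObjects(AmazonS3 s3, DeleteObjectsRequest request) {
              try {
                s3.deleteObjects(request);
              } catch (MultiObjectDeleteException e) {
                LOG.error("Partial failure of delete, {} errors", e.getErrors().size(), e);
                for (MultiObjectDeleteException.DeleteError error : e.getErrors()) {
                  LOG.error("{}: \"{}\" - {}", error.getKey(), error.getCode(), error.getMessage());
                }
                throw e;
              }
            }
          }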
          
          stevel@apache.org Steve Loughran added a comment -

          New test shows that you don't get any exception if the file isn't there. You will, however, see a failure if you don't have permissions to delete a path.

          What I'm going to do is catch and rethrow the exception with the details of the first specific failure included. That way, at least you get some hint of what's up. We can then ship that and see what surfaces in the future. Also: add a section to the troubleshooting docs with a pointer to this issue saying "go to this issue and follow the instructions". Then we modify the instructions to say, initially, "add your stack"; once we have it fixed, we can change the instructions.

          stevel@apache.org Steve Loughran added a comment -

          I now propose scanning the failed objects to see whether they still exist:

          1. if the HEAD check fails: ignore.
          2. if any object still exists, the rejection is more serious, as it could be a permissions problem or something similar: fail.

          Make no attempt to retry; simply decide whether to ignore or reject.
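          A hedged sketch of that classification, probing each rejected key with a HEAD (getObjectMetadata) request; illustrative only, since the committed patch did not implement it:

          import com.amazonaws.services.s3.AmazonS3;
          import com.amazonaws.services.s3.model.AmazonS3Exception;
          import com.amazonaws.services.s3.model.MultiObjectDeleteException;

          class ClassifyDeleteFailureSketch {
            // Returns true if any rejected key still exists, i.e. the failure should be escalated.
            static boolean anyRejectedKeyStillExists(AmazonS3 s3, String bucket,
                MultiObjectDeleteException e) {
              for (MultiObjectDeleteException.DeleteError error : e.getErrors()) {
                try {
                  s3.getObjectMetadata(bucket, error.getKey());  // HEAD request
                  return true;  // object still present: permissions or another real problem
                } catch (AmazonS3Exception headFailure) {
                  // HEAD failed (typically 404): the object is already gone, so ignore this entry.
                }
              }
              return false;
            }
          }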

          stevel@apache.org Steve Loughran added a comment -

          Patch 001

          This is just a bit of code I started to write in HADOOP-13028; I've pulled it out as it was getting too complex to work through within that patch.

          This code

          1. catches the multi-delete exception,
          2. extracts the list of keys that failed,
          3. and retries them as part of a one-by-one deletion.

          I've realised this isn't quite the right approach, as it assumes that the failures are transient and that individual retries may work. Furthermore, there's no handling of failures in the one-by-one path.

          A core way that a key delete can fail is for the file to be already deleted.

          Retrying doesn't help, and anyway we don't need to retry, as the system is already in the desired state.

          What is needed, then, is:

          1. the failure cause of a single delete, or of each element in a multi-delete, must be examined.
          2. if this is a not-found failure: upgrade it to success.
          3. if it is some other failure, we need to consider what to do. Something like a permissions failure probably merits escalating. What other failures can arise?

          It's critical we add tests for this, as that's the only way to understand what AWS S3 will really send back.

          My draft test plan here is:

          1. make that removeKeys call protected/package-private
          2. subclass S3A with one which overrides removeKeys() and, prior to calling the superclass, deletes one or more of the keys from S3. This will trigger failures.
          3. make sure the removeKeys call succeeds
          4. test with the FS.rename() and FS.delete() operations for both multi-key and single-key removal options.
            Alongside this: try to delete from a read-only bucket, like the Amazon Landsat data.

          Like I said: more complex

          stevel@apache.org Steve Loughran added a comment -

          Abhishek: have you had a chance to look at this?

          abmodi Abhishek Modi added a comment -

          I would like to work on it. I have already gone through the code and will update the status soon. I have AWS credentials set up, so I can run tests too.

          vinodkv Vinod Kumar Vavilapalli added a comment -

          Removing Fix-version. Please use Target-version for the intended release and let committers set the fix-version at commit time.


            People

            • Assignee: stevel@apache.org Steve Loughran
            • Reporter: stevel@apache.org Steve Loughran
            • Votes: 1
            • Watchers: 15

              Dates

              • Created:
                Updated:
                Resolved:
