Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.2.0, 3.3.0
    • Component/s: fs/s3
    • Labels: None

    Description

      When S3Guard can't update the DynamoDB table and so throws an IOException after the retry limit is exceeded, the error is not at all informative. Improve the logging & exception.
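
      A minimal sketch of the kind of failure-path change intended here, assuming a hypothetical retryBackoff-style helper; the names, signature and message wording are illustrative, not the actual patch:

          import java.io.IOException;
          import org.slf4j.Logger;
          import org.slf4j.LoggerFactory;

          /** Hypothetical illustration only; not the actual HADOOP-15349 patch. */
          final class RetryBackoffSketch {
            private static final Logger LOG =
                LoggerFactory.getLogger(RetryBackoffSketch.class);

            static void retryBackoff(int retryCount, int maxRetries,
                long elapsedMillis, int unprocessedItems) throws IOException {
              if (retryCount >= maxRetries) {
                // Put the retry count, elapsed time and remaining work into the
                // message so the failure is diagnosable from the exception alone.
                throw new IOException(String.format(
                    "Max retries exceeded (%d) for DynamoDB after %d ms;"
                        + " %d unprocessed write(s) remain. This may be because"
                        + " the write capacity of the table is set too low.",
                    maxRetries, elapsedMillis, unprocessedItems));
              }
              LOG.warn("DynamoDB batch write incomplete, retry {}/{}, {} item(s) unprocessed",
                  retryCount, maxRetries, unprocessedItems);
            }
          }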

      Attachments

        1. failure.log
          58 kB
          Steve Loughran
        2. HADOOP-15349.001.patch
          1 kB
          Gabor Bota

        Issue Links

          Activity

            stevel@apache.org Steve Loughran added a comment -

            Stack

            2018-03-28 04:22:17,375 [s3-committer-pool-2] ERROR s3a.S3AFileSystem (S3AFileSystem.java:finishedWrite(2730)) - S3Guard: Error updating MetadataStore for write to cloud-integration/DELAY_LISTING_ME/S3ACommitBulkDataSuite/bulkdata/output/landsat/parquet/parted-1/year=2016/month=6/part-00000-24152aa2-c86d-49d2-98d4-820dc37a6df1-local-1522235507089.c000.snappy.parquet:
            java.io.IOException: Max retries exceeded (9) for DynamoDB
            	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.retryBackoff(DynamoDBMetadataStore.java:657)
            	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.processBatchWriteRequest(DynamoDBMetadataStore.java:636)
            	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.put(DynamoDBMetadataStore.java:695)
            	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.put(DynamoDBMetadataStore.java:685)
            	at org.apache.hadoop.fs.s3a.s3guard.S3Guard.putAndReturn(S3Guard.java:149)
            	at org.apache.hadoop.fs.s3a.S3AFileSystem.finishedWrite(S3AFileSystem.java:2727)
            	at org.apache.hadoop.fs.s3a.WriteOperationHelper.lambda$finalizeMultipartUpload$1(WriteOperationHelper.java:234)
            	at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
            	at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:260)
            	at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:317)
            	at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:256)
            	at org.apache.hadoop.fs.s3a.WriteOperationHelper.finalizeMultipartUpload(WriteOperationHelper.java:222)
            	at org.apache.hadoop.fs.s3a.WriteOperationHelper.completeMPUwithRetries(WriteOperationHelper.java:267)
            	at org.apache.hadoop.fs.s3a.commit.CommitOperations.innerCommit(CommitOperations.java:179)
            	at org.apache.hadoop.fs.s3a.commit.CommitOperations.commit(CommitOperations.java:151)
            	at org.apache.hadoop.fs.s3a.commit.CommitOperations.commitOrFail(CommitOperations.java:134)
            	at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.lambda$commitPendingUploads$3(AbstractS3ACommitter.java:451)
            	at org.apache.hadoop.fs.s3a.commit.Tasks$Builder$1.run(Tasks.java:254)
            	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            
            stevel@apache.org Steve Loughran added a comment -

            Attached full log

            1. No meaningful details in the exception; it should say "DDB calls not completing", maybe include some history.
            2. We should compare the before & after counts of results processed. If the outstanding count is decreasing, then it's OK to keep retrying, as things have slowed down, not failed (see the sketch at the end of this comment).
            3. + review the timeout defaults & include details in the exception, e.g. "after 20s".

            This happened during job commit, which is pretty sensitive. The job did complete successfully, because it's wrapped in retry code too. But I think we could have handled it better at the lower levels, as not all apps will be retrying so much.

            +fabbri gabor.bota
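
            A hedged sketch of the progress-aware retry idea from point 2; every name here (processWithProgressAwareRetry, the submitBatch callback, the limits) is hypothetical, not actual S3Guard code:

              import java.io.IOException;
              import java.util.List;
              import java.util.function.Function;

              /** Hypothetical sketch of progress-aware retrying; not actual S3Guard code. */
              final class ProgressAwareRetry {

                /**
                 * Keep resubmitting while progress is being made: a round only counts
                 * against the retry limit when the number of unprocessed items did not shrink.
                 */
                static <T> void processWithProgressAwareRetry(
                    List<T> pending,
                    Function<List<T>, List<T>> submitBatch,   // returns the unprocessed items
                    int maxNoProgressRetries) throws IOException, InterruptedException {
                  int noProgressRetries = 0;
                  while (!pending.isEmpty()) {
                    int before = pending.size();
                    pending = submitBatch.apply(pending);
                    if (pending.isEmpty()) {
                      return;                               // everything written
                    }
                    if (pending.size() < before) {
                      noProgressRetries = 0;                // progress made: reset the budget
                    } else if (++noProgressRetries > maxNoProgressRetries) {
                      throw new IOException("DDB calls not completing: "
                          + pending.size() + " item(s) still unprocessed after "
                          + maxNoProgressRetries + " retries without progress");
                    }
                    Thread.sleep(100L << Math.min(noProgressRetries, 6)); // capped backoff
                  }
                }
              }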

            stevel@apache.org Steve Loughran added a comment -

            +metrics. The S3Guard retries are not being added to stats. The bucket had a DDB table with read=write=10 units when it overloaded

            2018-03-28 04:29:34,295 [ScalaTest-main-running-S3ACommitBulkDataSuite] INFO  s3.S3AOperations (Logging.scala:logInfo(54)) - Metrics:
              S3guard_metadatastore_put_path_latency50thPercentileLatency = 0
              S3guard_metadatastore_put_path_latency75thPercentileLatency = 0
              S3guard_metadatastore_put_path_latency90thPercentileLatency = 0
              S3guard_metadatastore_put_path_latency95thPercentileLatency = 0
              S3guard_metadatastore_put_path_latency99thPercentileLatency = 0
              S3guard_metadatastore_put_path_latencyNumOps = 0
              S3guard_metadatastore_throttle_rate50thPercentileFrequency (Hz) = 0
              S3guard_metadatastore_throttle_rate75thPercentileFrequency (Hz) = 0
              S3guard_metadatastore_throttle_rate90thPercentileFrequency (Hz) = 0
              S3guard_metadatastore_throttle_rate95thPercentileFrequency (Hz) = 0
              S3guard_metadatastore_throttle_rate99thPercentileFrequency (Hz) = 0
              S3guard_metadatastore_throttle_rateNumEvents = 0
              committer_bytes_committed = 12594213
              committer_bytes_uploaded = 12594213
              committer_commits_aborted = 0
              committer_commits_completed = 138
              committer_commits_created = 136
              committer_commits_failed = 0
              committer_commits_reverted = 0
              committer_jobs_completed = 29
              committer_jobs_failed = 0
              committer_magic_files_created = 2
              committer_tasks_completed = 34
              committer_tasks_failed = 0
              directories_created = 25
              directories_deleted = 0
              fake_directories_deleted = 1788
              files_copied = 6
              files_copied_bytes = 8473
              files_created = 39
              files_deleted = 28
              ignored_errors = 92
              object_continue_list_requests = 0
              object_copy_requests = 0
              object_delete_requests = 227
              object_list_requests = 431
              object_metadata_requests = 592
              object_multipart_aborted = 0
              object_put_bytes = 12738607
              object_put_bytes_pending = 0
              object_put_requests = 204
              object_put_requests_active = 0
              object_put_requests_completed = 204
              op_copy_from_local_file = 0
              op_create = 39
              op_create_non_recursive = 0
              op_delete = 77
              op_exists = 177
              op_get_file_checksum = 0
              op_get_file_status = 1232
              op_glob_status = 8
              op_is_directory = 22
              op_is_file = 0
              op_list_files = 13
              op_list_located_status = 6
              op_list_status = 110
              op_mkdirs = 10
              op_open = 407
              op_rename = 6
              s3guard_metadatastore_initialization = 1
              s3guard_metadatastore_put_path_request = 203
              s3guard_metadatastore_retry = 0
              s3guard_metadatastore_throttled = 0
              store_io_throttled = 0
              stream_aborted = 0
              stream_backward_seek_operations = 145
              stream_bytes_backwards_on_seek = 8272842
              stream_bytes_discarded_in_abort = 0
              stream_bytes_read = 9199577
              stream_bytes_read_in_close = 783876
              stream_bytes_skipped_on_seek = 187038
              stream_close_operations = 549
              stream_closed = 549
              stream_forward_seek_operations = 72
              stream_opened = 549
              stream_read_exceptions = 0
              stream_read_fully_operations = 448
              stream_read_operations = 13129
              stream_read_operations_incomplete = 932
              stream_seek_operations = 217
              stream_write_block_uploads = 2
              stream_write_block_uploads_aborted = 0
              stream_write_block_uploads_active = 0
              stream_write_block_uploads_committed = 0
              stream_write_block_uploads_data_pending = 0
              stream_write_block_uploads_pending = 37
              stream_write_failures = 0
              stream_write_total_data = 153642
              stream_write_total_time = 738
            

            (ps: suspect that uploads_pending is a false stat & it's really the uncommitted uploads being counted)
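
            On the missing stats: a hedged sketch of the kind of counting intended, using a hypothetical counter holder rather than the real S3AInstrumentation API; the metric names are the ones from the dump above:

              import java.util.Map;
              import java.util.concurrent.ConcurrentHashMap;
              import java.util.concurrent.atomic.AtomicLong;

              /** Hypothetical counter holder; not the real S3AInstrumentation API. */
              final class MetadataStoreCounters {
                private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

                void increment(String name) {
                  counters.computeIfAbsent(name, k -> new AtomicLong()).incrementAndGet();
                }

                long get(String name) {
                  AtomicLong c = counters.get(name);
                  return c == null ? 0 : c.get();
                }
              }

              // Called from the backoff path, something like
              //   counters.increment("s3guard_metadatastore_retry");
              //   counters.increment("s3guard_metadatastore_throttled");  // when throttled
              // would make the dump above show non-zero values after an overload.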

            stevel@apache.org Steve Loughran added a comment -

            There's ~50 files being committed, each in their own thread from the commit pool; I assume the DDB repo is being overloaded just from one single process doing task commit. We should be backing off more, especially given that failing on a write could potentially leave the store inconsistent with the FS (renames, etc.).
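
            As a rough sketch of "backing off more", assuming Hadoop's org.apache.hadoop.io.retry.RetryPolicies is what drives the batch-write retries; the limits shown are illustrative numbers, not the values S3Guard actually ships with:

              import java.util.concurrent.TimeUnit;
              import org.apache.hadoop.io.retry.RetryPolicies;
              import org.apache.hadoop.io.retry.RetryPolicy;

              /** Illustrative only: a more generous backoff for batched DDB writes. */
              final class BackoffSketch {
                // 15 attempts with a 100 ms base are illustrative, not S3Guard's defaults.
                static final RetryPolicy BATCH_WRITE_RETRY_POLICY =
                    RetryPolicies.exponentialBackoffRetry(15, 100, TimeUnit.MILLISECONDS);
              }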

            gabor.bota Gabor Bota added a comment -

            Hi stevel@apache.org,
            I'm not able to find the test you're running in upstream/trunk. Could you send me the test you are running, or is it there somewhere and I'm just not able to find it?

            genericqa genericqa added a comment -
            -1 overall



            Vote Subsystem Runtime Comment
            0 reexec 0m 17s Docker mode activated.
                  Prechecks
            +1 @author 0m 0s The patch does not contain any @author tags.
            -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
                  trunk Compile Tests
            +1 mvninstall 26m 18s trunk passed
            +1 compile 0m 28s trunk passed
            +1 checkstyle 0m 13s trunk passed
            +1 mvnsite 0m 32s trunk passed
            +1 shadedclient 11m 49s branch has no errors when building and testing our client artifacts.
            +1 findbugs 0m 39s trunk passed
            +1 javadoc 0m 22s trunk passed
                  Patch Compile Tests
            +1 mvninstall 0m 32s the patch passed
            +1 compile 0m 25s the patch passed
            +1 javac 0m 25s the patch passed
            +1 checkstyle 0m 9s the patch passed
            +1 mvnsite 0m 28s the patch passed
            +1 whitespace 0m 0s The patch has no whitespace issues.
            +1 shadedclient 12m 29s patch has no errors when building and testing our client artifacts.
            +1 findbugs 0m 44s the patch passed
            +1 javadoc 0m 20s the patch passed
                  Other Tests
            +1 unit 4m 23s hadoop-aws in the patch passed.
            +1 asflicense 0m 27s The patch does not generate ASF License warnings.
            60m 56s



            Subsystem Report/Notes
            Docker Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd
            JIRA Issue HADOOP-15349
            JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12931032/HADOOP-15349.001.patch
            Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
            uname Linux df44e6f5dfb4 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
            Build tool maven
            Personality /testptch/patchprocess/precommit/personality/provided.sh
            git revision trunk / 7f1d3d0
            maven version: Apache Maven 3.3.9
            Default Java 1.8.0_171
            findbugs v3.1.0-RC1
            Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/14873/testReport/
            Max. process+thread count 303 (vs. ulimit of 10000)
            modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
            Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/14873/console
            Powered by Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org

            This message was automatically generated.

            stevel@apache.org Steve Loughran added a comment -

            Sorry, missed your comment. I'd just set capacity to something like 1 or 2 and had the normal tests fail
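
            For reference, a hedged sketch of what that looks like in test configuration, using the S3Guard DynamoDB capacity keys; treat it as illustrative, and as far as I recall these keys only take effect when S3Guard creates the table itself:

              import org.apache.hadoop.conf.Configuration;

              /** Illustrative: provision a deliberately tiny table so throttling is easy to hit. */
              final class LowCapacityTestConf {
                static Configuration lowCapacityConf() {
                  Configuration conf = new Configuration();
                  // 1-2 units of read/write capacity make the committer's batch
                  // writes overload the table during an otherwise normal test run.
                  conf.setInt("fs.s3a.s3guard.ddb.table.capacity.read", 1);
                  conf.setInt("fs.s3a.s3guard.ddb.table.capacity.write", 1);
                  return conf;
                }
              }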

            stevel@apache.org Steve Loughran added a comment -

            patch seems good though

            gabor.bota Gabor Bota added a comment -

            is it enough?

            mackrorysd Sean Mackrory added a comment -

            +1 to the change. It would be nice if we could confirm that the I/O thresholds really are the reason for unprocessed items. I don't know what else would cause that - JavaDocs don't mention anything. We can retrieve the capacity used in a given attempt from the BatchWriteItemResult, but getting the capacity configured doesn't seem to be exposed in the API (and we can't assume the one configured for new tables in Hadoop is the same as the current table).

            It might be nice to do a bit of testing with some large batch sizes and see if we can at least document a recommended minimum that seems to reliably not exhaust the exponential back-off's buffer time. Can you file a follow-up JIRA for that and I'll commit this patch?
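
            A hedged sketch of pulling the consumed capacity out of a batch write with the AWS SDK v1 model API; whether S3Guard's write path exposes the raw BatchWriteItemResult in a spot where this is convenient is an assumption:

              import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
              import com.amazonaws.services.dynamodbv2.model.BatchWriteItemRequest;
              import com.amazonaws.services.dynamodbv2.model.BatchWriteItemResult;
              import com.amazonaws.services.dynamodbv2.model.ConsumedCapacity;
              import com.amazonaws.services.dynamodbv2.model.ReturnConsumedCapacity;
              import org.slf4j.Logger;
              import org.slf4j.LoggerFactory;

              /** Illustrative only: log the capacity consumed by one batch write. */
              final class ConsumedCapacitySketch {
                private static final Logger LOG =
                    LoggerFactory.getLogger(ConsumedCapacitySketch.class);

                static BatchWriteItemResult writeAndLog(AmazonDynamoDB ddb,
                    BatchWriteItemRequest request) {
                  // Ask DynamoDB to report how many capacity units the batch used.
                  request.setReturnConsumedCapacity(ReturnConsumedCapacity.TOTAL);
                  BatchWriteItemResult result = ddb.batchWriteItem(request);
                  if (result.getConsumedCapacity() != null) {
                    for (ConsumedCapacity cc : result.getConsumedCapacity()) {
                      LOG.info("Table {} consumed {} capacity units",
                          cc.getTableName(), cc.getCapacityUnits());
                    }
                  }
                  LOG.info("Tables with unprocessed items: {}",
                      result.getUnprocessedItems().size());
                  return result;
                }
              }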

            gabor.bota Gabor Bota added a comment -

            I've just created HADOOP-15604, so this can be closed.

            mackrorysd Sean Mackrory added a comment -

            Thanks, gabor.bota. I committed the change to trunk, so resolving.

            stevel@apache.org Steve Loughran added a comment -

            I'm pleased to say I can now trigger DDB overloads, and the new message is being printed

            [ERROR] testFakeDirectoryDeletion(org.apache.hadoop.fs.s3a.ITestS3AFileOperationCost)  Time elapsed: 32.643 s  <<< ERROR!
            java.io.IOException: Max retries exceeded (5) for DynamoDB. This may be because write threshold of DynamoDB is set too low.
            	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.retryBackoff(DynamoDBMetadataStore.java:693)
            	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.processBatchWriteRequest(DynamoDBMetadataStore.java:672)
            	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.lambda$move$4(DynamoDBMetadataStore.java:625)
            	at org.apache.hadoop.fs.s3a.Invoker.lambda$once$0(Invoker.java:127)
            	at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
            	at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:125)
            	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.move(DynamoDBMetadataStore.java:624)
            	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerRename(S3AFileSystem.java:1072)
            	at org.apache.hadoop.fs.s3a.S3AFileSystem.rename(S3AFileSystem.java:862)
            	at org.apache.hadoop.fs.s3a.ITestS3AFileOperationCost.testFakeDirectoryDeletion(ITestS3AFileOperationCost.java:299)
            	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            	at java.lang.reflect.Method.invoke(Method.java:498)
            	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
            	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
            	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
            	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
            	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
            	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
            	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
            	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
            
            sunilg Sunil G added a comment -

            I added 3.2.0 and 3.3.0 to the Fix version. Please add if I missed any versions.


            People

              Assignee:
              gabor.bota Gabor Bota
              Reporter:
              stevel@apache.org Steve Loughran
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: