Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.2.0, 3.3.0
    • Component/s: fs/s3
    • Labels: None

    Description

      When S3Guard can't update the DynamoDB table and so throws an IOException after the retry limit is exceeded, the error is not at all informative. Improve the logging & exception.
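
      A minimal sketch of the kind of failure-path change intended here, assuming a hypothetical retryBackoff-style helper; the names, signature and message wording are illustrative, not the actual patch:

          import java.io.IOException;
          import org.slf4j.Logger;
          import org.slf4j.LoggerFactory;

          /** Hypothetical illustration only; not the actual HADOOP-15349 patch. */
          final class RetryBackoffSketch {
            private static final Logger LOG =
                LoggerFactory.getLogger(RetryBackoffSketch.class);

            static void retryBackoff(int retryCount, int maxRetries,
                long elapsedMillis, int unprocessedItems) throws IOException {
              if (retryCount >= maxRetries) {
                // Put the retry count, elapsed time and remaining work into the
                // message so the failure is diagnosable from the exception alone.
                throw new IOException(String.format(
                    "Max retries exceeded (%d) for DynamoDB after %d ms;"
                        + " %d unprocessed write(s) remain. This may be because"
                        + " the write capacity of the table is set too low.",
                    maxRetries, elapsedMillis, unprocessedItems));
              }
              LOG.warn("DynamoDB batch write incomplete, retry {}/{}, {} item(s) unprocessed",
                  retryCount, maxRetries, unprocessedItems);
            }
          }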

      Attachments

        1. failure.log
          58 kB
          Steve Loughran
        2. HADOOP-15349.001.patch
          1 kB
          Gabor Bota

        Issue Links

          Activity

            stevel@apache.org Steve Loughran added a comment -

            Stack

            2018-03-28 04:22:17,375 [s3-committer-pool-2] ERROR s3a.S3AFileSystem (S3AFileSystem.java:finishedWrite(2730)) - S3Guard: Error updating MetadataStore for write to cloud-integration/DELAY_LISTING_ME/S3ACommitBulkDataSuite/bulkdata/output/landsat/parquet/parted-1/year=2016/month=6/part-00000-24152aa2-c86d-49d2-98d4-820dc37a6df1-local-1522235507089.c000.snappy.parquet:
            java.io.IOException: Max retries exceeded (9) for DynamoDB
            	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.retryBackoff(DynamoDBMetadataStore.java:657)
            	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.processBatchWriteRequest(DynamoDBMetadataStore.java:636)
            	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.put(DynamoDBMetadataStore.java:695)
            	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.put(DynamoDBMetadataStore.java:685)
            	at org.apache.hadoop.fs.s3a.s3guard.S3Guard.putAndReturn(S3Guard.java:149)
            	at org.apache.hadoop.fs.s3a.S3AFileSystem.finishedWrite(S3AFileSystem.java:2727)
            	at org.apache.hadoop.fs.s3a.WriteOperationHelper.lambda$finalizeMultipartUpload$1(WriteOperationHelper.java:234)
            	at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
            	at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:260)
            	at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:317)
            	at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:256)
            	at org.apache.hadoop.fs.s3a.WriteOperationHelper.finalizeMultipartUpload(WriteOperationHelper.java:222)
            	at org.apache.hadoop.fs.s3a.WriteOperationHelper.completeMPUwithRetries(WriteOperationHelper.java:267)
            	at org.apache.hadoop.fs.s3a.commit.CommitOperations.innerCommit(CommitOperations.java:179)
            	at org.apache.hadoop.fs.s3a.commit.CommitOperations.commit(CommitOperations.java:151)
            	at org.apache.hadoop.fs.s3a.commit.CommitOperations.commitOrFail(CommitOperations.java:134)
            	at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.lambda$commitPendingUploads$3(AbstractS3ACommitter.java:451)
            	at org.apache.hadoop.fs.s3a.commit.Tasks$Builder$1.run(Tasks.java:254)
            	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            
            stevel@apache.org Steve Loughran added a comment -

            Attached full log

            1. No meaningful details in the exception; it should say "DDB calls not completing", maybe include some history.
            2. We should compare the before & after counts of results processed. If the outstanding count is decreasing, then it's OK to keep retrying, as things have slowed down, not failed (see the sketch at the end of this comment).
            3. + review the timeout defaults & include details in the exception, e.g. "after 20s".

            This happened during job commit, which is pretty sensitive. The job did complete successfully, because it's wrapped in retry code too. But I think we could have handled it better at the lower levels, as not all apps will be retrying so much.

            +fabbri gabor.bota
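
            A hedged sketch of the progress-aware retry idea from point 2; every name here (processWithProgressAwareRetry, the submitBatch callback, the limits) is hypothetical, not actual S3Guard code:

              import java.io.IOException;
              import java.util.List;
              import java.util.function.Function;

              /** Hypothetical sketch of progress-aware retrying; not actual S3Guard code. */
              final class ProgressAwareRetry {

                /**
                 * Keep resubmitting while progress is being made: a round only counts
                 * against the retry limit when the number of unprocessed items did not shrink.
                 */
                static <T> void processWithProgressAwareRetry(
                    List<T> pending,
                    Function<List<T>, List<T>> submitBatch,   // returns the unprocessed items
                    int maxNoProgressRetries) throws IOException, InterruptedException {
                  int noProgressRetries = 0;
                  while (!pending.isEmpty()) {
                    int before = pending.size();
                    pending = submitBatch.apply(pending);
                    if (pending.isEmpty()) {
                      return;                               // everything written
                    }
                    if (pending.size() < before) {
                      noProgressRetries = 0;                // progress made: reset the budget
                    } else if (++noProgressRetries > maxNoProgressRetries) {
                      throw new IOException("DDB calls not completing: "
                          + pending.size() + " item(s) still unprocessed after "
                          + maxNoProgressRetries + " retries without progress");
                    }
                    Thread.sleep(100L << Math.min(noProgressRetries, 6)); // capped backoff
                  }
                }
              }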

            stevel@apache.org Steve Loughran added a comment -

            +metrics. The S3Guard retries are not being added to stats. The bucket had a DDB table with read=write=10 units when it overloaded

            2018-03-28 04:29:34,295 [ScalaTest-main-running-S3ACommitBulkDataSuite] INFO  s3.S3AOperations (Logging.scala:logInfo(54)) - Metrics:
              S3guard_metadatastore_put_path_latency50thPercentileLatency = 0
              S3guard_metadatastore_put_path_latency75thPercentileLatency = 0
              S3guard_metadatastore_put_path_latency90thPercentileLatency = 0
              S3guard_metadatastore_put_path_latency95thPercentileLatency = 0
              S3guard_metadatastore_put_path_latency99thPercentileLatency = 0
              S3guard_metadatastore_put_path_latencyNumOps = 0
              S3guard_metadatastore_throttle_rate50thPercentileFrequency (Hz) = 0
              S3guard_metadatastore_throttle_rate75thPercentileFrequency (Hz) = 0
              S3guard_metadatastore_throttle_rate90thPercentileFrequency (Hz) = 0
              S3guard_metadatastore_throttle_rate95thPercentileFrequency (Hz) = 0
              S3guard_metadatastore_throttle_rate99thPercentileFrequency (Hz) = 0
              S3guard_metadatastore_throttle_rateNumEvents = 0
              committer_bytes_committed = 12594213
              committer_bytes_uploaded = 12594213
              committer_commits_aborted = 0
              committer_commits_completed = 138
              committer_commits_created = 136
              committer_commits_failed = 0
              committer_commits_reverted = 0
              committer_jobs_completed = 29
              committer_jobs_failed = 0
              committer_magic_files_created = 2
              committer_tasks_completed = 34
              committer_tasks_failed = 0
              directories_created = 25
              directories_deleted = 0
              fake_directories_deleted = 1788
              files_copied = 6
              files_copied_bytes = 8473
              files_created = 39
              files_deleted = 28
              ignored_errors = 92
              object_continue_list_requests = 0
              object_copy_requests = 0
              object_delete_requests = 227
              object_list_requests = 431
              object_metadata_requests = 592
              object_multipart_aborted = 0
              object_put_bytes = 12738607
              object_put_bytes_pending = 0
              object_put_requests = 204
              object_put_requests_active = 0
              object_put_requests_completed = 204
              op_copy_from_local_file = 0
              op_create = 39
              op_create_non_recursive = 0
              op_delete = 77
              op_exists = 177
              op_get_file_checksum = 0
              op_get_file_status = 1232
              op_glob_status = 8
              op_is_directory = 22
              op_is_file = 0
              op_list_files = 13
              op_list_located_status = 6
              op_list_status = 110
              op_mkdirs = 10
              op_open = 407
              op_rename = 6
              s3guard_metadatastore_initialization = 1
              s3guard_metadatastore_put_path_request = 203
              s3guard_metadatastore_retry = 0
              s3guard_metadatastore_throttled = 0
              store_io_throttled = 0
              stream_aborted = 0
              stream_backward_seek_operations = 145
              stream_bytes_backwards_on_seek = 8272842
              stream_bytes_discarded_in_abort = 0
              stream_bytes_read = 9199577
              stream_bytes_read_in_close = 783876
              stream_bytes_skipped_on_seek = 187038
              stream_close_operations = 549
              stream_closed = 549
              stream_forward_seek_operations = 72
              stream_opened = 549
              stream_read_exceptions = 0
              stream_read_fully_operations = 448
              stream_read_operations = 13129
              stream_read_operations_incomplete = 932
              stream_seek_operations = 217
              stream_write_block_uploads = 2
              stream_write_block_uploads_aborted = 0
              stream_write_block_uploads_active = 0
              stream_write_block_uploads_committed = 0
              stream_write_block_uploads_data_pending = 0
              stream_write_block_uploads_pending = 37
              stream_write_failures = 0
              stream_write_total_data = 153642
              stream_write_total_time = 738
            

            (ps: suspect that uploads_pending is a false stat & it's really the uncommitted uploads being counted)
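
            On the missing stats: a hedged sketch of the kind of counting intended, using a hypothetical counter holder rather than the real S3AInstrumentation API; the metric names are the ones from the dump above:

              import java.util.Map;
              import java.util.concurrent.ConcurrentHashMap;
              import java.util.concurrent.atomic.AtomicLong;

              /** Hypothetical counter holder; not the real S3AInstrumentation API. */
              final class MetadataStoreCounters {
                private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

                void increment(String name) {
                  counters.computeIfAbsent(name, k -> new AtomicLong()).incrementAndGet();
                }

                long get(String name) {
                  AtomicLong c = counters.get(name);
                  return c == null ? 0 : c.get();
                }
              }

              // Called from the backoff path, something like
              //   counters.increment("s3guard_metadatastore_retry");
              //   counters.increment("s3guard_metadatastore_throttled");  // when throttled
              // would make the dump above show non-zero values after an overload.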

            stevel@apache.org Steve Loughran added a comment -

            There's ~50 files being committed, each in their own thread from the commit pool; I assume the DDB repo is being overloaded just from one single process doing task commit. We should be backing off more, especially given that failing on a write could potentially leave the store inconsistent with the FS (renames, etc.).
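
            As a rough sketch of "backing off more", assuming Hadoop's org.apache.hadoop.io.retry.RetryPolicies is what drives the batch-write retries; the limits shown are illustrative numbers, not the values S3Guard actually ships with:

              import java.util.concurrent.TimeUnit;
              import org.apache.hadoop.io.retry.RetryPolicies;
              import org.apache.hadoop.io.retry.RetryPolicy;

              /** Illustrative only: a more generous backoff for batched DDB writes. */
              final class BackoffSketch {
                // 15 attempts with a 100 ms base are illustrative, not S3Guard's defaults.
                static final RetryPolicy BATCH_WRITE_RETRY_POLICY =
                    RetryPolicies.exponentialBackoffRetry(15, 100, TimeUnit.MILLISECONDS);
              }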

            gabor.bota Gabor Bota added a comment -

            Hi stevel@apache.org,
            I'm not able to find the test you're running in upstream/trunk. Could you send me the test you are running, or is it there somewhere and I'm just not able to find it?

            genericqa genericqa added a comment -
            -1 overall



            Vote Subsystem Runtime Comment
            0 reexec 0m 17s Docker mode activated.
                  Prechecks
            +1 @author 0m 0s The patch does not contain any @author tags.
            -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
                  trunk Compile Tests
            +1 mvninstall 26m 18s trunk passed
            +1 compile 0m 28s trunk passed
            +1 checkstyle 0m 13s trunk passed
            +1 mvnsite 0m 32s trunk passed
            +1 shadedclient 11m 49s branch has no errors when building and testing our client artifacts.
            +1 findbugs 0m 39s trunk passed
            +1 javadoc 0m 22s trunk passed
                  Patch Compile Tests
            +1 mvninstall 0m 32s the patch passed
            +1 compile 0m 25s the patch passed
            +1 javac 0m 25s the patch passed
            +1 checkstyle 0m 9s the patch passed
            +1 mvnsite 0m 28s the patch passed
            +1 whitespace 0m 0s The patch has no whitespace issues.
            +1 shadedclient 12m 29s patch has no errors when building and testing our client artifacts.
            +1 findbugs 0m 44s the patch passed
            +1 javadoc 0m 20s the patch passed
                  Other Tests
            +1 unit 4m 23s hadoop-aws in the patch passed.
            +1 asflicense 0m 27s The patch does not generate ASF License warnings.
            60m 56s



            Subsystem Report/Notes
            Docker Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd
            JIRA Issue HADOOP-15349
            JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12931032/HADOOP-15349.001.patch
            Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
            uname Linux df44e6f5dfb4 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
            Build tool maven
            Personality /testptch/patchprocess/precommit/personality/provided.sh
            git revision trunk / 7f1d3d0
            maven version: Apache Maven 3.3.9
            Default Java 1.8.0_171
            findbugs v3.1.0-RC1
            Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/14873/testReport/
            Max. process+thread count 303 (vs. ulimit of 10000)
            modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
            Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/14873/console
            Powered by Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org

            This message was automatically generated.

            stevel@apache.org Steve Loughran added a comment -

            Sorry, missed your comment. I'd just set capacity to something like 1 or 2 and had the normal tests fail
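
            For reference, a hedged sketch of what that looks like in test configuration, using the S3Guard DynamoDB capacity keys; treat it as illustrative, and as far as I recall these keys only take effect when S3Guard creates the table itself:

              import org.apache.hadoop.conf.Configuration;

              /** Illustrative: provision a deliberately tiny table so throttling is easy to hit. */
              final class LowCapacityTestConf {
                static Configuration lowCapacityConf() {
                  Configuration conf = new Configuration();
                  // 1-2 units of read/write capacity make the committer's batch
                  // writes overload the table during an otherwise normal test run.
                  conf.setInt("fs.s3a.s3guard.ddb.table.capacity.read", 1);
                  conf.setInt("fs.s3a.s3guard.ddb.table.capacity.write", 1);
                  return conf;
                }
              }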

            stevel@apache.org Steve Loughran added a comment -

            patch seems good though

            gabor.bota Gabor Bota added a comment -

            is it enough?

            mackrorysd Sean Mackrory added a comment -

            +1 to the change. It would be nice if we could confirm that the I/O thresholds really are the reason for unprocessed items. I don't know what else would cause that - JavaDocs don't mention anything. We can retrieve the capacity used in a given attempt from the BatchWriteItemResult, but getting the capacity configured doesn't seem to be exposed in the API (and we can't assume the one configured for new tables in Hadoop is the same as the current table).

            It might be nice to do a bit of testing with some large batch sizes and see if we can at least document a recommended minimum that seems to reliably not exhaust the exponential back-off's buffer time. Can you file a follow-up JIRA for that and I'll commit this patch?
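
            A hedged sketch of pulling the consumed capacity out of a batch write with the AWS SDK v1 model API; whether S3Guard's write path exposes the raw BatchWriteItemResult in a spot where this is convenient is an assumption:

              import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
              import com.amazonaws.services.dynamodbv2.model.BatchWriteItemRequest;
              import com.amazonaws.services.dynamodbv2.model.BatchWriteItemResult;
              import com.amazonaws.services.dynamodbv2.model.ConsumedCapacity;
              import com.amazonaws.services.dynamodbv2.model.ReturnConsumedCapacity;
              import org.slf4j.Logger;
              import org.slf4j.LoggerFactory;

              /** Illustrative only: log the capacity consumed by one batch write. */
              final class ConsumedCapacitySketch {
                private static final Logger LOG =
                    LoggerFactory.getLogger(ConsumedCapacitySketch.class);

                static BatchWriteItemResult writeAndLog(AmazonDynamoDB ddb,
                    BatchWriteItemRequest request) {
                  // Ask DynamoDB to report how many capacity units the batch used.
                  request.setReturnConsumedCapacity(ReturnConsumedCapacity.TOTAL);
                  BatchWriteItemResult result = ddb.batchWriteItem(request);
                  if (result.getConsumedCapacity() != null) {
                    for (ConsumedCapacity cc : result.getConsumedCapacity()) {
                      LOG.info("Table {} consumed {} capacity units",
                          cc.getTableName(), cc.getCapacityUnits());
                    }
                  }
                  LOG.info("Tables with unprocessed items: {}",
                      result.getUnprocessedItems().size());
                  return result;
                }
              }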

            gabor.bota Gabor Bota added a comment -

            I've just created HADOOP-15604, so this can be closed.

            mackrorysd Sean Mackrory added a comment -

            Thanks, gabor.bota. I committed the change to trunk, so resolving.

            stevel@apache.org Steve Loughran added a comment -

            I'm pleased to say I can now trigger DDB overloads, and the new message is being printed

            [ERROR] testFakeDirectoryDeletion(org.apache.hadoop.fs.s3a.ITestS3AFileOperationCost)  Time elapsed: 32.643 s  <<< ERROR!
            java.io.IOException: Max retries exceeded (5) for DynamoDB. This may be because write threshold of DynamoDB is set too low.
            	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.retryBackoff(DynamoDBMetadataStore.java:693)
            	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.processBatchWriteRequest(DynamoDBMetadataStore.java:672)
            	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.lambda$move$4(DynamoDBMetadataStore.java:625)
            	at org.apache.hadoop.fs.s3a.Invoker.lambda$once$0(Invoker.java:127)
            	at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
            	at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:125)
            	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.move(DynamoDBMetadataStore.java:624)
            	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerRename(S3AFileSystem.java:1072)
            	at org.apache.hadoop.fs.s3a.S3AFileSystem.rename(S3AFileSystem.java:862)
            	at org.apache.hadoop.fs.s3a.ITestS3AFileOperationCost.testFakeDirectoryDeletion(ITestS3AFileOperationCost.java:299)
            	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            	at java.lang.reflect.Method.invoke(Method.java:498)
            	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
            	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
            	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
            	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
            	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
            	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
            	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
            	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
            
            sunilg Sunil G added a comment -

            I added 3.2.0 and 3.3.0 to the Fix version. Please add if I missed any versions.


            People

              Assignee:
              gabor.bota Gabor Bota
              Reporter:
              stevel@apache.org Steve Loughran
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: