[HADOOP-14303] Review retry logic on all S3 SDK calls, implement where needed - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 2.8.0
Fix Version/s: None
Component/s: fs/s3
Labels: None

Description

AWS S3, IAM, KMS, DynamoDB (DDB), etc. all throttle callers. The S3A code needs to handle this without failing: if it slows down its requests, it can recover.

1. Look at all the places where we call S3 via the AWS SDK and make sure we retry with a backoff-and-jitter policy, ideally a unified one. This must be more systematic than the case-by-case, problem-by-problem strategy we are implicitly using now.
2. Many of the AWS S3 SDK calls do implement retry (e.g. PUT/multipart PUT), but we need to check the other parts of the process: login, initiating/completing multipart uploads (MPU), ...
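The unified backoff-and-jitter policy in item 1 can be sketched as one wrapper that every SDK call goes through. This is a minimal Java sketch, not the S3A implementation: the class `BackoffRetrier`, its parameters, and the choice of "full jitter" (sleep a uniformly random time between zero and an exponentially growing, capped ceiling) are assumptions for illustration. The decision of whether a failure is retryable is delegated to a predicate, so throttling can be kept apart from unrecoverable errors.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Predicate;

/**
 * Hypothetical unified retry helper (illustrative only, not an S3A class).
 * The sleep before retry n is uniform in [0, min(maxDelayMs, baseDelayMs * 2^n)].
 */
final class BackoffRetrier {
  private final int maxAttempts;
  private final long baseDelayMs;
  private final long maxDelayMs;
  private final Predicate<Exception> isRetryable;

  BackoffRetrier(int maxAttempts, long baseDelayMs, long maxDelayMs,
                 Predicate<Exception> isRetryable) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    this.maxAttempts = maxAttempts;
    this.baseDelayMs = baseDelayMs;
    this.maxDelayMs = maxDelayMs;
    this.isRetryable = isRetryable;
  }

  /** Upper bound of the sleep before retry number {@code attempt} (0-based). */
  long delayCeilingMs(int attempt) {
    // Limit the shift so the exponential growth cannot overflow a long.
    long exponential = baseDelayMs << Math.min(attempt, 30);
    return Math.min(maxDelayMs, exponential);
  }

  /** Runs {@code op}, retrying retryable failures with capped, jittered backoff. */
  <T> T call(String operation, Callable<T> op) throws IOException {
    for (int attempt = 0; ; attempt++) {
      try {
        return op.call();
      } catch (Exception e) {
        boolean lastAttempt = attempt + 1 >= maxAttempts;
        if (lastAttempt || !isRetryable.test(e)) {
          if (e instanceof IOException) {
            throw (IOException) e;
          }
          throw new IOException(operation + " failed after "
              + (attempt + 1) + " attempt(s)", e);
        }
        // Full jitter: a random sleep spreads out retries from many clients.
        long sleepMs = ThreadLocalRandom.current().nextLong(delayCeilingMs(attempt) + 1);
        try {
          Thread.sleep(sleepMs);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new InterruptedIOException(operation + " interrupted during backoff");
        }
      }
    }
  }
}
```

A call site would then look like `retrier.call("getObjectMetadata", () -> s3.getObjectMetadata(bucket, key))`, where `s3` is an AWS SDK client. Randomizing the sleep, rather than using a fixed exponential schedule, keeps many throttled clients from retrying in lockstep against the same service.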

~~HADOOP-13811~~ Failed to sanitize XML document destined for handler class
~~HADOOP-13664~~ S3AInputStream to use a retry policy on read failures

This is all hard to test. A key need is to be able to differentiate recoverable throttling and network failures from unrecoverable problems such as authentication failures or network misconfiguration (e.g. a bad endpoint).
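One way to make that distinction explicit is a single classifier that the retry logic consults. In this hypothetical sketch, `ServiceFailure` stands in for an SDK exception carrying an HTTP status code, and the mapping itself is an assumption: 503 (S3's "Slow Down") is treated as throttling, as HADOOP-14381 proposes; 500 and socket timeouts as transient; `UnknownHostException` (a bad endpoint) as fail-fast, in line with HADOOP-13205; everything else, including 401/403 auth failures, as unrecoverable.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

/** Hypothetical outcome of classifying a failure. */
enum FailureKind { THROTTLED, TRANSIENT_NETWORK, UNRECOVERABLE }

/** Hypothetical stand-in for an SDK service exception with an HTTP status code. */
class ServiceFailure extends IOException {
  final int statusCode;

  ServiceFailure(int statusCode, String message) {
    super(message);
    this.statusCode = statusCode;
  }
}

/** Illustrative mapping only; a real one would inspect the SDK's own exceptions. */
final class FailureClassifier {
  private FailureClassifier() {}

  static FailureKind classify(Exception e) {
    // Bad endpoint or network configuration: retrying cannot help, so fail fast.
    if (e instanceof UnknownHostException) {
      return FailureKind.UNRECOVERABLE;
    }
    // Timeouts are assumed to be transient network problems.
    if (e instanceof SocketTimeoutException) {
      return FailureKind.TRANSIENT_NETWORK;
    }
    if (e instanceof ServiceFailure) {
      switch (((ServiceFailure) e).statusCode) {
        case 503: // S3 "Slow Down": the caller is being throttled
          return FailureKind.THROTTLED;
        case 500: // internal server error: assumed transient
          return FailureKind.TRANSIENT_NETWORK;
        default:  // e.g. 400, 401, 403, 404: request or auth problems
          return FailureKind.UNRECOVERABLE;
      }
    }
    return FailureKind.UNRECOVERABLE;
  }

  /** Convenience predicate for a retry policy. */
  static boolean isRetryable(Exception e) {
    return classify(e) != FailureKind.UNRECOVERABLE;
  }
}
```

Keeping the mapping in one place means every SDK call gets the same answer to "retry or fail?", instead of each call site deciding case by case.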

This may be the opportunity to add a fault-injecting subclass of the Amazon S3 client which can be configured in integration tests (ITests) to fail at specific points. Ryan Blue's mock S3 client does this in ~~HADOOP-13786~~, but it is a 100% mock. I'm thinking of something with similar fault raising, but sitting in front of the real S3 client used by S3A.
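A fault-injecting layer of the kind described above could look like the following. To keep the sketch self-contained it wraps a hypothetical two-method `StoreClient` interface rather than the real AWS SDK client, and the injected failure is a hypothetical `InjectedThrottleException`; a real version would subclass or wrap the SDK client and raise the SDK's own exception types. A test configures how many upcoming calls to a named operation should fail; after that, calls pass through to the real store.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical minimal client interface standing in for the AWS S3 client. */
interface StoreClient {
  long getObjectLength(String key) throws IOException;

  void putObject(String key, byte[] data) throws IOException;
}

/** Hypothetical throttling failure raised by the injector. */
class InjectedThrottleException extends IOException {
  InjectedThrottleException(String operation) {
    super("injected throttle on " + operation);
  }
}

/** Fails the next N calls of chosen operations, then delegates to the real client. */
final class FaultInjectingClient implements StoreClient {
  private final StoreClient delegate;
  private final Map<String, AtomicInteger> pendingFaults = new ConcurrentHashMap<>();

  FaultInjectingClient(StoreClient delegate) {
    this.delegate = delegate;
  }

  /** Make the next {@code count} calls to {@code operation} fail. */
  FaultInjectingClient failNext(String operation, int count) {
    pendingFaults.put(operation, new AtomicInteger(count));
    return this;
  }

  private void maybeFail(String operation) throws IOException {
    AtomicInteger left = pendingFaults.get(operation);
    // Atomically consume one pending fault, never going below zero.
    if (left != null && left.getAndUpdate(n -> n > 0 ? n - 1 : n) > 0) {
      throw new InjectedThrottleException(operation);
    }
  }

  @Override
  public long getObjectLength(String key) throws IOException {
    maybeFail("getObjectLength");
    return delegate.getObjectLength(key);
  }

  @Override
  public void putObject(String key, byte[] data) throws IOException {
    maybeFail("putObject");
    delegate.putObject(key, data);
  }
}
```

Because the wrapper sits in front of a working client, an integration test can check that the retry logic recovers from a burst of throttling and that the data still reaches the real store.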

Issue Links

incorporates

HADOOP-13059 S3a over-reacts to potentially transient network problems in its init() logic

Resolved

is duplicated by

HADOOP-13786 Add S3A committers for zero-rename commits to S3 endpoints

Resolved

is part of

HADOOP-13786 Add S3A committers for zero-rename commits to S3 endpoints

Resolved

relates to

HADOOP-11572 s3a delete() operation fails during a concurrent delete of child entries

Resolved

HADOOP-13205 S3A to support custom retry policies; failfast on unknown host

Resolved

HADOOP-13811 s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

Resolved

HADOOP-14381 S3AUtils.translateException to map 503 reponse to => throttling failure

Resolved

HADOOP-13664 S3AInputStream to use a retry policy on read failures

Resolved

HADOOP-13786 Add S3A committers for zero-rename commits to S3 endpoints

Resolved

People

Assignee: Steve Loughran

Reporter: Steve Loughran

Votes: 0

Watchers: 8

Dates

Created: 13/Apr/17 09:20

Updated: 22/Mar/18 04:44

Resolved: 22/Mar/18 04:44