Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0-beta1
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels: None
    • Target Version/s: None

Description

      Following the S3AFileSystem integration patch in HADOOP-13651, we need to add retry logic.

      In HADOOP-13651, I added TODO comments in most of the places where retry loops are needed, including the cases below (a sketch of the common loop shape follows the list):

      • open(path). If the MetadataStore reflects a recent create or move of the file at path, but we fail to read it from S3, retry.
      • delete(path). If deleteObject() on S3 fails, but the MetadataStore shows the file exists, retry.
      • rename(src, dest). If the source path is not yet visible in S3, retry.
      • listFiles(). Skip for now; it is not currently implemented in S3Guard. I will create a separate JIRA for this, as it will likely require interface changes (e.g. a prefix or subtree scan).
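      As a rough illustration of the loop shape these TODOs share (everything below is a hypothetical sketch, not code from any patch): attempt the operation, and while the failure is one the MetadataStore says should heal, back off and retry until a configurable deadline expires.

{code:java}
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/**
 * Illustrative sketch only; class and method names here are invented,
 * not part of the actual patch.
 */
public final class ConsistencyRetry {

  /**
   * Run {@code op}, retrying with exponential backoff while
   * {@code retriable} says the failure is expected to heal (e.g. a 404
   * from S3 when the MetadataStore says the path exists), up to
   * {@code maxDurationMillis} in total.
   */
  public static <T> T retry(Callable<T> op,
                            Predicate<Exception> retriable,
                            long maxDurationMillis) throws IOException {
    long deadline = System.currentTimeMillis() + maxDurationMillis;
    long delay = 100; // initial backoff; doubled on each attempt
    while (true) {
      try {
        return op.call();
      } catch (Exception e) {
        if (!retriable.test(e) || System.currentTimeMillis() >= deadline) {
          // Give up: surface an error rather than block indefinitely.
          throw e instanceof IOException
              ? (IOException) e : new IOException(e);
        }
        try {
          Thread.sleep(delay);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new InterruptedIOException("interrupted while retrying");
        }
        delay = Math.min(delay * 2, 5_000); // cap the backoff
      }
    }
  }
}
{code}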

      We may miss some cases initially, and we should do failure injection testing to make sure we're covered. The failure injection tests can be a separate JIRA, to make this easier to review.

      We also need basic configuration parameters around the retry policy. There should be a way to specify a maximum retry duration, as some applications would prefer to receive an error eventually rather than wait indefinitely. We should also keep statistics on how often inconsistency is detected and a retry loop is entered.
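      As a sketch of the knobs this implies (the property name below is a placeholder, not a settled key): a total-retry-duration setting read through the normal Configuration mechanics, plus a counter bumped whenever a retry loop is entered.

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.hadoop.conf.Configuration;

/** Hypothetical holder for the proposed retry settings and statistics. */
class S3GuardRetrySettings {
  /** Placeholder key; the final name would be decided in review. */
  static final String MAX_RETRY_DURATION_KEY =
      "fs.s3a.s3guard.consistency.retry.duration";

  final long maxRetryMillis;
  /** Incremented each time inconsistency is detected and we retry. */
  final AtomicLong inconsistencyDetected = new AtomicLong();

  S3GuardRetrySettings(Configuration conf) {
    // getTimeDuration() accepts suffixed durations such as "30s" or "5m".
    maxRetryMillis = conf.getTimeDuration(MAX_RETRY_DURATION_KEY,
        TimeUnit.MINUTES.toMillis(1), TimeUnit.MILLISECONDS);
  }
}
{code}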

Activity

          stevel@apache.org Steve Loughran added a comment -

          I've been doing some closure-based wrapping of FS operations in the moved WriteOperationsHelper of HADOOP-13786, in AwsLambda. I'd like to put more general AWS retry logic in place here: imagine creating an instance of a Hadoop retry policy which could be passed in; the DDB calls would just use a different policy. A sketch of the idea is below.
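          As an illustrative sketch of that wrapper (the class here is invented; only the org.apache.hadoop.io.retry types are real):

{code:java}
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

/** Hypothetical closure-based wrapper with an injected retry policy. */
final class RetryingInvoker {
  private final RetryPolicy policy;

  RetryingInvoker(RetryPolicy policy) {
    this.policy = policy;
  }

  /** Run the closure, consulting the policy after each failure. */
  <T> T invoke(String opName, Callable<T> operation) throws IOException {
    int retries = 0;
    while (true) {
      try {
        return operation.call();
      } catch (Exception e) {
        final RetryPolicy.RetryAction action;
        try {
          action = policy.shouldRetry(e, retries++, 0, true);
        } catch (Exception policyFailure) {
          throw new IOException(opName + ": " + policyFailure, e);
        }
        if (action.action != RetryPolicy.RetryAction.RetryDecision.RETRY) {
          throw e instanceof IOException
              ? (IOException) e : new IOException(opName + " failed", e);
        }
        try {
          TimeUnit.MILLISECONDS.sleep(action.delayMillis);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IOException(opName + " interrupted", ie);
        }
      }
    }
  }
}
{code}

          One invoker per service then gives the DDB calls their own policy, e.g. {{new RetryingInvoker(RetryPolicies.exponentialBackoffRetry(7, 100, TimeUnit.MILLISECONDS))}}.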

          Linking to HADOOP-14012, which proposes that translateException handle AWS exceptions better. I concur, but it's broader than just DDB, as STS/KMS exceptions get translated badly too: we're mapping from HTTP status code to file IO errors, when the other services may really be indicating different issues. There is enough detail inside the exceptions for more specific parsing: something like a separate translator for the AWS exceptions of each specific service, giving each the ability to interpret its own exceptions.

          stevel@apache.org Steve Loughran added a comment -

          We need to implement retry logic in all AWS calls which bypass the transfer manager, so that transient failures (503/throttling, connection timeouts) get retried. The core code is in the HADOOP-13786 branch; it just needs rolling out to the existing methods, plus policies to deal with S3Guard failures: when to fail, when to retry. And, for DDB, when to fall back to the blobstore, which is a different recovery strategy from the rest; a sketch of that fallback follows.
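          As a hypothetical shape for that DDB fallback (names invented; builds on the RetryingInvoker sketch above): retry throttling and timeouts first, and only if the metadata store stays unavailable, answer from S3 itself.

{code:java}
import java.io.IOException;
import java.util.concurrent.Callable;

/** Illustrative only: names invented, not from any branch. */
final class MetadataReadWithFallback {
  static <T> T read(RetryingInvoker ddbInvoker,
                    Callable<T> ddbOp,
                    Callable<T> s3Fallback) throws IOException {
    try {
      // First line of defence: retry throttling/timeouts per policy.
      return ddbInvoker.invoke("ddb-read", ddbOp);
    } catch (IOException ddbFailure) {
      // Different recovery strategy from the S3 calls: degrade to the
      // (possibly inconsistent) blobstore view instead of failing.
      try {
        return s3Fallback.call();
      } catch (Exception fallbackFailure) {
        fallbackFailure.addSuppressed(ddbFailure);
        throw fallbackFailure instanceof IOException
            ? (IOException) fallbackFailure
            : new IOException("S3 fallback failed", fallbackFailure);
      }
    }
  }
}
{code}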

          stevel@apache.org Steve Loughran added a comment -

          Managed to break tests when working with a bucket whose DDB table was precreated with {{hadoop s3guard init -write 20 -read 20}} and five parallel test cases:

          
          Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 230.809 sec <<< FAILURE! - in org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardToolDynamoDB
          testPruneCommandCLI(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardToolDynamoDB)  Time elapsed: 174.084 sec  <<< ERROR!
          com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: 9NTSEF5S8M3EI7MUN0EV2ERKE3VV4KQNSO5AEMVJF66Q9ASUAAJG)
          	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1588)
          	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1258)
          	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1030)
          	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:742)
          	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:716)
          	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
          	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
          	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
          	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
          	at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:2089)
          	at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:2065)
          	at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.executeBatchWriteItem(AmazonDynamoDBClient.java:575)
          	at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.batchWriteItem(AmazonDynamoDBClient.java:551)
          	at com.amazonaws.services.dynamodbv2.document.internal.BatchWriteItemImpl.doBatchWriteItem(BatchWriteItemImpl.java:111)
          	at com.amazonaws.services.dynamodbv2.document.internal.BatchWriteItemImpl.batchWriteItemUnprocessed(BatchWriteItemImpl.java:64)
          	at com.amazonaws.services.dynamodbv2.document.DynamoDB.batchWriteItemUnprocessed(DynamoDB.java:189)
          	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.processBatchWriteRequest(DynamoDBMetadataStore.java:580)
          	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.prune(DynamoDBMetadataStore.java:761)
          	at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool$Prune.run(S3GuardTool.java:938)
          	at org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.exec(AbstractS3GuardToolTestBase.java:277)
          	at org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.exec(AbstractS3GuardToolTestBase.java:255)
          	at org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.testPruneCommand(AbstractS3GuardToolTestBase.java:194)
          	at org.apache.hadoop.fs.s3a.s3guard.AbstractS3GuardToolTestBase.testPruneCommandCLI(AbstractS3GuardToolTestBase.java:206)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          	at java.lang.reflect.Method.invoke(Method.java:498)
          	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
          	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
          	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
          	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
          	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
          	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
          	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
          	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
          
          Tests run: 62, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 388.583 sec - in org.apache.hadoop.fs.s3a.fileContext.ITestS3AFileContextMainOperations
          
          Results :
          
          
          stevel@apache.org Steve Loughran added a comment - edited

          Note that the DDB throttling exception comes back as HTTP 400; you can't use REST-style status-code logic here.

          Also, in waitForTableActive(), failures are remapped to IllegalArgumentException.

          And that stack trace comes from code which is meant to be handling throttling. Conclusion: either the logic is wrong or it's giving up too early.
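          That implies classifying on the AWS error code string rather than the HTTP status. As an illustrative sketch (the set of codes below is not exhaustive):

{code:java}
import com.amazonaws.AmazonServiceException;

/** Sketch: decide retryability from the error code, not the status. */
final class DdbThrottleDetection {
  static boolean isThrottle(AmazonServiceException e) {
    // DDB throttling arrives as HTTP 400, so the status alone would
    // wrongly classify it as an unrecoverable client error.
    return "ProvisionedThroughputExceededException".equals(e.getErrorCode())
        || "ThrottlingException".equals(e.getErrorCode());
  }
}
{code}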


People

    • Assignee: stevel@apache.org Steve Loughran
    • Reporter: fabbri Aaron Fabbri
    • Votes: 0
    • Watchers: 4
