[HADOOP-16456] Refactor the S3A codebase into a more maintainable and testable form - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.3.0
Fix Version/s: None
Component/s: fs/s3
Labels: None

Description

The S3A codebase has become too complex to maintain. In particular:

- The lack of layering in the S3AFileSystem class means that all subcomponents (delegation, DynamoDB, block output stream, etc.) are given a back reference to it and make arbitrary calls into it.
- We can't test in isolation. While integration tests are the most rigorous testing we can have, they are slow, it is hard to inject failures into them, and they do not work on isolated parts of the code.
- The code within S3AFileSystem calls its own top-level API methods internally, mixing the public interface with implementation details.
- We are passing context through to S3Guard calls for consistency, performance and recovery; we can't do that without a clean split between the public API and the internals.

Proposed:

- Carefully break up S3AFileSystem into a layered design,
- with a "StoreContext" to bind the components of the connector to it,
- and some form of operation context passed in with each request to represent the active operation and its state (including that of S3Guard BulkOperations).
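The layering proposed above can be illustrated with a minimal Java sketch. Although "StoreContext" and "RenameOperation" are names used in this proposal, every type below (StoreOperations, StoreContext, OperationContext, RenameOperation, InMemoryStore, LayeringSketch) is a hypothetical stand-in, not the real S3A class or its signatures. What it shows is the shape of the design: a component receives a narrow store context plus a per-request operation context, rather than a back reference to the whole filesystem, so it can be exercised against an in-memory store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the proposed layering; none of these types are the real S3A classes.

/** Narrow store API that components may call, instead of the whole filesystem. */
interface StoreOperations {
  List<String> listKeys(String prefix);
  void copyKey(String source, String dest);
  void deleteKey(String key);
}

/** Long-lived binding shared by all components of one connector instance. */
final class StoreContext {
  final String bucket;
  final StoreOperations store;
  StoreContext(String bucket, StoreOperations store) {
    this.bucket = bucket;
    this.store = store;
  }
}

/** Per-request state; here it only counts the store calls an operation makes. */
final class OperationContext {
  final String name;
  final AtomicLong storeCalls = new AtomicLong();
  OperationContext(String name) {
    this.name = name;
  }
}

/** A medium-life operation bound to the two contexts, not to the filesystem. */
final class RenameOperation {
  private final StoreContext context;
  private final OperationContext operation;
  RenameOperation(StoreContext context, OperationContext operation) {
    this.context = context;
    this.operation = operation;
  }

  /** Copies then deletes every key under srcPrefix; returns the number of keys moved. */
  int execute(String srcPrefix, String destPrefix) {
    // listKeys returns a copy, so mutating the store inside the loop is safe.
    List<String> keys = context.store.listKeys(srcPrefix);
    operation.storeCalls.incrementAndGet();
    for (String key : keys) {
      String dest = destPrefix + key.substring(srcPrefix.length());
      context.store.copyKey(key, dest);
      context.store.deleteKey(key);
      operation.storeCalls.addAndGet(2);
    }
    return keys.size();
  }
}

/** In-memory store, so the operation can be tested without a live bucket. */
final class InMemoryStore implements StoreOperations {
  final TreeMap<String, String> objects = new TreeMap<>();

  @Override
  public List<String> listKeys(String prefix) {
    List<String> result = new ArrayList<>();
    for (String key : objects.keySet()) {
      if (key.startsWith(prefix)) {
        result.add(key);
      }
    }
    return result;
  }

  @Override
  public void copyKey(String source, String dest) {
    objects.put(dest, objects.get(source));
  }

  @Override
  public void deleteKey(String key) {
    objects.remove(key);
  }
}

public class LayeringSketch {
  public static void main(String[] args) {
    InMemoryStore store = new InMemoryStore();
    store.objects.put("src/a", "1");
    store.objects.put("src/b/c", "2");
    store.objects.put("other/x", "3");
    OperationContext op = new OperationContext("rename");
    int moved = new RenameOperation(new StoreContext("example-bucket", store), op)
        .execute("src/", "dest/");
    System.out.println(moved + " keys moved in " + op.storeCalls.get()
        + " store calls: " + store.objects.keySet());
  }
}
```

Because RenameOperation depends only on the StoreOperations interface, a test could inject failures by substituting an implementation whose copyKey throws, without needing a live S3 endpoint; that is the isolation the current back-reference design makes difficult.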

See refactoring S3A

I've already started using some of this design in the HADOOP-15183 work, for the addition of the S3Guard bulk operations and to add a medium-life "RenameOperation". The proposal document reviews that experience and discusses improvements.

As noted, this needs to be done with care. We still need to maintain the existing codebase: the more radically we change the code, the greater the risk that the changes are wrong, and the harder backporting becomes. But we can't sustain the current design.

Issue Links

depends upon

HADOOP-16134 Add S3AWriteOpContext for write ops; pass in statistics and other settings

Open

Sub-Tasks

add initial S3A layering + async init

Open

Steve Loughran

100%

People

Assignee: Steve Loughran

Reporter: Steve Loughran

Votes: 0

Watchers: 4

Dates

Created: 24/Jul/19 14:53

Updated: 17/Jan/20 12:23

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 1.5h