[HADOOP-16221] S3Guard: fail write that doesn't update metadata store - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.2.0
Fix Version/s: 3.3.0
Component/s: fs/s3
Labels: None

Description

Currently, a failure to write to the S3Guard metadata store (e.g. DynamoDB) is merely logged; it does not fail the S3AFileSystem write operation itself. As a result, the writer has no indication that anything went wrong, and S3Guard does not always provide the consistency it advertises.

For example, this article states:

If a Hadoop S3A client creates or moves a file, and then a client lists its directory, that file is now guaranteed to be included in the listing.

Unfortunately, this guarantee does not hold when the metadata write fails, which can cause exactly the sort of problem S3Guard is supposed to avoid:

Missing data that is silently dropped. Multi-step Hadoop jobs that depend on output of previous jobs may silently omit some data. This omission happens when a job chooses which files to consume based on a directory listing, which may not include recently-written items.

Imagine the typical multi-job Hadoop processing pipeline. Job 1 runs and succeeds, but one or more S3Guard metadata writes failed under the covers. Job 2 picks up the output directory from Job 1 and runs its processing, potentially seeing an inconsistent listing and silently missing some of the Job 1 output files.
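To make the scenario concrete, here is a toy model of how Job 2 could end up with an incomplete input set. It is only a sketch: the class and method names are hypothetical, and it assumes a guarded listing is the union of the metadata store entries and whatever the raw (eventually consistent) listing currently shows.

```java
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the two-job scenario; all names here are hypothetical. */
public class StaleListingSketch {

    /** Assumed model: a guarded listing sees metadata entries plus whatever the raw listing shows. */
    static Set<String> guardedListing(Set<String> metadataEntries, Set<String> rawListing) {
        Set<String> merged = new TreeSet<>(rawListing);
        merged.addAll(metadataEntries);
        return merged;
    }

    public static void main(String[] args) {
        // Job 1 wrote three files; all three objects exist in the store.
        Set<String> written = new TreeSet<>(
            Set.of("out/part-0000", "out/part-0001", "out/part-0002"));

        // Assumption: the metadata write for part-0002 failed and was only logged.
        Set<String> metadataEntries = new TreeSet<>(
            Set.of("out/part-0000", "out/part-0001"));

        // Assumption: the raw listing is lagging and shows only the first file.
        Set<String> rawListing = new TreeSet<>(Set.of("out/part-0000"));

        // Job 2 chooses its inputs from the listing.
        Set<String> job2Inputs = guardedListing(metadataEntries, rawListing);

        // Anything written but not listed is silently skipped by Job 2.
        Set<String> missed = new TreeSet<>(written);
        missed.removeAll(job2Inputs);

        System.out.println("Job 2 inputs: " + job2Inputs);
        System.out.println("Silently missed: " + missed);
    }
}
```

In this model nothing signals the gap to either job: Job 1 reported success, and Job 2 simply processes the two files it can see.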

S3Guard should at least provide a configuration option to fail the write operation if the metadata write fails. Ideally, this should perhaps even be the default?
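A minimal sketch of what such an option might look like, contrasting the log-only behaviour with a fail-fast one. Everything here is an assumption for illustration: `MetadataStore`, `Client`, and the `failOnMetadataWriteError` flag are hypothetical names, not the S3A or S3Guard API, and the real change may be structured quite differently.

```java
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch; none of these names are the real S3A or S3Guard API. */
public class MetadataWriteSketch {

    /** Stand-in for a metadata store such as DynamoDB. */
    interface MetadataStore {
        void put(String path) throws IOException;
    }

    /** Stand-in for a file system client that writes an object, then records it. */
    static class Client {
        private final Set<String> objects = new TreeSet<>();
        private final MetadataStore store;
        private final boolean failOnMetadataWriteError; // hypothetical option

        Client(MetadataStore store, boolean failOnMetadataWriteError) {
            this.store = store;
            this.failOnMetadataWriteError = failOnMetadataWriteError;
        }

        void write(String path) throws IOException {
            objects.add(path); // the object data itself has been written
            try {
                store.put(path);
            } catch (IOException e) {
                if (failOnMetadataWriteError) {
                    // Fail fast: surface the problem to the writer.
                    throw new IOException("Metadata store update failed for " + path, e);
                }
                // Log-only behaviour: the caller never learns about the failure.
                System.err.println("WARN: metadata store update failed for "
                    + path + ": " + e.getMessage());
            }
        }

        boolean hasObject(String path) {
            return objects.contains(path);
        }
    }

    public static void main(String[] args) {
        // A store that always fails, standing in for e.g. a throttled table.
        MetadataStore broken = path -> {
            throw new IOException("simulated store failure");
        };

        try {
            new Client(broken, false).write("out/part-0000");
            System.out.println("lenient: write reported success");
        } catch (IOException e) {
            System.out.println("lenient: unexpected failure");
        }

        try {
            new Client(broken, true).write("out/part-0000");
            System.out.println("strict: write reported success");
        } catch (IOException e) {
            System.out.println("strict: write failed: " + e.getMessage());
        }
    }
}
```

Note that in the strict case the object has already been written when the exception is thrown, so a caller would still need to decide whether to retry or clean up; the sketch only shows that the failure becomes visible.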

Issue Links

causes: HADOOP-16375 ITestS3AMetadataPersistenceException failure (Resolved)

is related to: HADOOP-16330 Regression: TestStagingPartitionedJobCommit failing with empty etag list (Resolved)

links to: GitHub Pull Request #666

People

Assignee: Ben Roling

Reporter: Ben Roling

Votes: 0

Watchers: 8

Dates

Created: 29/Mar/19 13:54

Updated: 14/Jun/19 09:10

Resolved: 30/Apr/19 10:55