[HADOOP-15621] S3Guard: Implement time-based (TTL) expiry for Authoritative Directory Listing - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.0-beta1
Fix Version/s: 3.3.0
Component/s: fs/s3
Labels: None

Description

Similar to HADOOP-13649, I think we should add a TTL (time to live) feature to the DynamoDB metadata store (MS) for S3Guard.

Conceptually, this is an "online algorithm" counterpart to the CLI prune() function, which is the "offline algorithm".

Why:
1. Self-healing (soft state): since we do not implement transactions around modification of the two systems (S3 and the metadata store), certain failures can lead to inconsistency between the S3 state and the metadata store (MS) state. Having a time to live (TTL) on each entry in S3Guard means that any inconsistency is time-bounded. Thus "wait and restart your job" becomes a valid, if ugly, way to get around any issue where an FS client failure leaves things in a bad state.
2. We could make manual invocation of `hadoop s3guard prune ...` unnecessary, depending on the implementation.
3. It makes it possible to fix the problem that the DynamoDB MS prune() does not prune directories, because they lack a true modification time.

How:
I think we need a new column in the DynamoDB table: "entry last written time". It is updated each time the entry is written to DynamoDB.
After that we can either:
1. Have the client simply ignore / elide any entries that are older than the configured TTL.
2. Have the client delete entries older than the TTL.
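
Option 1 above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Entry` class, its `lastWrittenMillis` field (standing in for the proposed "entry last written time" column), and the `elideExpired` helper are hypothetical names, not the S3Guard API or the code in the attached patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of client-side TTL filtering; not the S3Guard API. */
class TtlFilterSketch {

  /** Hypothetical metadata entry carrying the proposed "entry last written time". */
  static final class Entry {
    final String path;
    final long lastWrittenMillis;

    Entry(String path, long lastWrittenMillis) {
      this.path = path;
      this.lastWrittenMillis = lastWrittenMillis;
    }
  }

  /** An entry is expired when its age is strictly greater than the TTL. */
  static boolean isExpired(Entry e, long ttlMillis, long nowMillis) {
    return nowMillis - e.lastWrittenMillis > ttlMillis;
  }

  /** Returns only the entries that are still within the TTL; nothing is deleted. */
  static List<Entry> elideExpired(List<Entry> entries, long ttlMillis, long nowMillis) {
    List<Entry> live = new ArrayList<>();
    for (Entry e : entries) {
      if (!isExpired(e, ttlMillis, nowMillis)) {
        live.add(e);
      }
    }
    return live;
  }

  public static void main(String[] args) {
    List<Entry> listing = Arrays.asList(
        new Entry("/bucket/old", 0L),
        new Entry("/bucket/fresh", 900L));
    // With now = 1000 ms and TTL = 500 ms, only the entry written at 900 ms survives.
    for (Entry e : elideExpired(listing, 500L, 1000L)) {
      System.out.println(e.path);
    }
  }
}
```

Passing the current time in as a parameter, rather than reading the system clock inside the helper, keeps the check testable and makes the dependence on the client's clock explicit.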

The issue with #2 is that it will increase latency if done inline in the context of an FS operation. We could mitigate this somewhat by using an async helper thread, or by doing it probabilistically (only "sometimes") to amortize the expense of deleting stale entries, which would also allow some batching.
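
The probabilistic, batched variant of option 2 might look something like the sketch below. It uses an in-memory map as a stand-in for the DynamoDB table, and every name in it (`AmortizedPruneSketch`, `pruneProbability`, `maxBatch`, `maybePrune`) is a hypothetical chosen for illustration, not something taken from the patches.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of amortized TTL pruning; the map stands in for the table. */
class AmortizedPruneSketch {
  // path -> "entry last written time" in milliseconds
  private final Map<String, Long> lastWritten = new ConcurrentHashMap<>();
  private final long ttlMillis;
  private final double pruneProbability; // chance that a given call does any pruning
  private final int maxBatch;            // upper bound on deletions per pruning pass
  private final Random random;

  AmortizedPruneSketch(long ttlMillis, double pruneProbability, int maxBatch, Random random) {
    this.ttlMillis = ttlMillis;
    this.pruneProbability = pruneProbability;
    this.maxBatch = maxBatch;
    this.random = random;
  }

  /** Records a write, refreshing the entry's last-written time. */
  void put(String path, long nowMillis) {
    lastWritten.put(path, nowMillis);
  }

  /**
   * Intended to be called from an FS operation. Most calls return immediately;
   * with probability pruneProbability, one call deletes up to maxBatch expired entries.
   * Returns the number of entries deleted.
   */
  int maybePrune(long nowMillis) {
    if (random.nextDouble() >= pruneProbability) {
      return 0; // skip this time, so the cost is spread over many operations
    }
    int deleted = 0;
    Iterator<Map.Entry<String, Long>> it = lastWritten.entrySet().iterator();
    while (it.hasNext() && deleted < maxBatch) {
      if (nowMillis - it.next().getValue() > ttlMillis) {
        it.remove();
        deleted++;
      }
    }
    return deleted;
  }

  int size() {
    return lastWritten.size();
  }
}
```

The batch cap bounds the extra latency any single operation can incur, while the probability controls how often that cost is paid at all. An async helper thread could call the same `maybePrune` method with a probability of 1.0 on a timer instead.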

Caveats:

Clock synchronization is, as usual, a concern. Many clusters already keep clocks close enough via NTP. We should at least document the requirement along with the configuration knob that enables the feature.

Attachments

- HADOOP-15621.001.patch, 02/Sep/18 09:02, 30 kB, Gabor Bota
- HADOOP-15621.002.patch, 27/Sep/18 23:50, 37 kB, Gabor Bota

Issue Links

causes

- HADOOP-15827 NPE in DynamoDBMetadataStore.lambda$listChildren for root + auth S3Guard (Resolved)

is a clone of

- HADOOP-13649 s3guard: implement time-based (TTL) expiry for LocalMetadataStore (Resolved)

is depended upon by

- HADOOP-15779 S3guard: add inconsistency detection metrics (Resolved)

relates to

- HADOOP-13936 S3Guard: DynamoDB can go out of sync with S3AFileSystem.delete() (Resolved)
- HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename (Resolved)

People

Assignee: Gabor Bota

Reporter: Aaron Fabbri

Dates

Created: 19/Jul/18 22:22

Updated: 26/Feb/20 05:29

Resolved: 03/Oct/18 04:24