Description
Currently S3AFilesystem.delete() only updates the S3Guard metadata store at the end of a paged delete operation. This makes it slow when there are many thousands of files to delete, and increases the window of vulnerability to failures.
Preferred
- after every bulk DELETE call is issued to S3, queue the (async) delete of all entries in that page.
- at the end of the delete, await the completion of these operations.
- inside S3AFS, also do the delete across threads, so that different HTTPS connections can be used.
This should maximise DDB throughput against tables which aren't IO limited.
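For illustration only, a minimal sketch of that flow, assuming hypothetical stand-ins (KeyBatch, deleteObjectsFromS3, removeFromMetadataStore) rather than real S3A/S3Guard APIs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Illustrative sketch of the proposed incremental-update pattern.
 * KeyBatch, deleteObjectsFromS3() and removeFromMetadataStore() are
 * hypothetical placeholders, not actual S3A APIs.
 */
public class IncrementalDeleteSketch {

  private final ExecutorService pool = Executors.newFixedThreadPool(4);

  public void deletePaged(List<KeyBatch> pages) {
    List<CompletableFuture<Void>> pending = new ArrayList<>();

    for (KeyBatch page : pages) {
      // 1. issue the bulk DELETE for this page against S3
      deleteObjectsFromS3(page);

      // 2. immediately queue the async metadata-store update for the
      //    same page, rather than waiting for the whole walk to finish
      pending.add(CompletableFuture.runAsync(
          () -> removeFromMetadataStore(page), pool));
    }

    // 3. at the end of the delete, await completion of all queued updates
    CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
  }

  // --- hypothetical placeholders ---
  static class KeyBatch { }
  private void deleteObjectsFromS3(KeyBatch page) { /* bulk S3 DELETE */ }
  private void removeFromMetadataStore(KeyBatch page) { /* DDB batch delete */ }
}
```

The key point is step 2: the metadata update for a page is queued as soon as its S3 DELETE returns, so DDB writes overlap with the remaining S3 calls instead of being serialized after them.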
When executed against small, IOPs-limited tables, the parallel DDB DELETE batches will trigger a lot of throttling events; we should make sure these do not escalate into failures.
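On that throttling concern, a hedged sketch of how each queued batch write could back off and retry rather than fail the whole delete (ThrottlingException here is a stand-in, not the actual SDK exception class):

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical throttling-aware wrapper for a single DDB batch delete. */
final class ThrottleRetry {

  static void runWithBackoff(Runnable batchDelete) throws InterruptedException {
    long delayMs = 100;
    for (int attempt = 0; ; attempt++) {
      try {
        batchDelete.run();
        return;
      } catch (ThrottlingException e) { // stand-in for a DDB throttle error
        if (attempt >= 8) {
          throw e;                      // surface only after repeated failures
        }
        // exponential backoff with jitter so parallel batches spread out
        Thread.sleep(delayMs + ThreadLocalRandom.current().nextLong(delayMs));
        delayMs = Math.min(delayMs * 2, 10_000);
      }
    }
  }

  static class ThrottlingException extends RuntimeException { }
}
```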
Attachments
Issue Links
- breaks
  - HADOOP-16554 mvn javadoc:javadoc fails in hadoop-aws (Resolved)
- contains
  - HADOOP-13330 Parallelize S3A directory deletes (Resolved)
  - HADOOP-16489 S3Guard operations log has tombstone/PUT swapped (Resolved)
- is related to
  - HADOOP-17244 S3A directory delete tombstones dir markers prematurely (Resolved)
- relates to
  - HADOOP-15183 S3Guard store becomes inconsistent after partial failure of rename (Resolved)
- links to