[HADOOP-13695] S3A to use a thread pool for async path operations - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Done
Affects Version/s: 2.8.0
Fix Version/s: 3.3.5
Component/s: fs/s3
Labels: None
Target Version/s: 3.4.0

Description

S3A path operations are often slow because of directory scanning, mock directory creation/deletion, etc. Many of these could be done asynchronously:

Because deletion is eventually consistent, deleting parent directories after an operation has returned doesn't alter the behaviour, except in the special case of operation failure.
Scanning for the paths/parents of a file in the create operation only needs to complete before the close() operation instantiates the object; there is no need to block create().
Parallelized COPY calls would permit asynchronous rename.
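
To illustrate the create()/close() point, here is a minimal sketch of a stream that starts the parent scan in the background and only waits for it in close(). The class name `DeferredCheckStream`, the `parentIsFileProbe` predicate and the `committed` flag are hypothetical stand-ins, not S3A classes; the real scan and PUT are assumed to live elsewhere.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.function.Predicate;

/** Hypothetical stand-in for an output stream; not part of the S3A codebase. */
class DeferredCheckStream implements Closeable {
  // Result of the background scan: true if a parent path is a file.
  private final CompletableFuture<Boolean> parentIsFile;
  private boolean committed;

  DeferredCheckStream(String path, Predicate<String> parentIsFileProbe,
      Executor executor) {
    // Kick off the scan and return immediately, so create() does not block.
    this.parentIsFile =
        CompletableFuture.supplyAsync(() -> parentIsFileProbe.test(path), executor);
  }

  @Override
  public void close() throws IOException {
    boolean conflict;
    try {
      // The scan must have finished before the object is instantiated.
      conflict = parentIsFile.join();
    } catch (CompletionException e) {
      throw new IOException("path check failed", e.getCause());
    }
    if (conflict) {
      throw new IOException("a parent of the destination is a file");
    }
    committed = true; // stand-in for the final PUT
  }

  boolean isCommitted() {
    return committed;
  }
}
```

One consequence of this design is that a path conflict surfaces in close() rather than in create(), so callers that rely on create() failing fast would see different behaviour.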

We could either use the thread pool used for block writes, or somehow isolate low-cost path operations (GET, DELETE) from the more expensive calls (COPY, PUT), so that a thread doing basic IO doesn't block for the duration of a long operation. Maybe also use Semaphore.tryAcquire() and only start async work if there actually is an idle thread, doing the work synchronously if not. The right choice may depend on the operation: path query/cleanup before/after a write could simply be scheduled as additional futures alongside the block write.
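
The Semaphore.tryAcquire() idea could look roughly like the sketch below. `OpportunisticExecutor` is a hypothetical helper, not an existing Hadoop class, and it assumes a dedicated fixed-size pool rather than the shared block-write pool.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical helper; not part of the S3A codebase. */
class OpportunisticExecutor implements AutoCloseable {
  private final ExecutorService pool;
  // One permit per pool thread; a free permit means an idle thread.
  private final Semaphore idleSlots;

  OpportunisticExecutor(int threads) {
    this.pool = Executors.newFixedThreadPool(threads);
    this.idleSlots = new Semaphore(threads);
  }

  /** Run asynchronously if a thread is idle, otherwise in the caller's thread. */
  <T> CompletableFuture<T> submit(Supplier<T> op) {
    if (idleSlots.tryAcquire()) {
      try {
        // Release the permit once the operation finishes, success or failure.
        return CompletableFuture.supplyAsync(op, pool)
            .whenComplete((result, error) -> idleSlots.release());
      } catch (RuntimeException e) {
        // For example, the pool rejected the task after shutdown.
        idleSlots.release();
        throw e;
      }
    }
    // No idle thread: do the work synchronously and return a completed future.
    CompletableFuture<T> done = new CompletableFuture<>();
    try {
      done.complete(op.get());
    } catch (RuntimeException e) {
      done.completeExceptionally(e);
    }
    return done;
  }

  @Override
  public void close() {
    pool.shutdown();
  }
}
```

Callers always get a future back, so the same code path works whether the operation ran in the pool or inline; a cheap GET or DELETE is never queued behind a long COPY or PUT.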

Issue Links

is depended upon by: HADOOP-13222 s3a.mkdirs() to delete empty fake parent directories (Resolved)

People

Assignee: Unassigned

Reporter: Steve Loughran

Votes: 0

Watchers: 7

Dates

Created: 07/Oct/16 15:10

Updated: 04/Oct/22 16:57

Resolved: 04/Oct/22 16:56