[HADOOP-17654] abfs incremental listing to support many active listings - ASF JIRA

Details

Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.3.1
Fix Version/s: None
Component/s: fs/azure
Labels: None

Description

Each incremental iterator submits an async fetcher operation into the JVM's common ForkJoin thread pool, whose parallelism defaults to the number of cores minus one unless set in "java.util.concurrent.ForkJoinPool.common.parallelism".
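To check what limit actually applies on a given host, something like the following can be run. This is a minimal sketch using only JDK calls; the class and method names are made up for illustration and are not part of the abfs code.

```java
import java.util.concurrent.ForkJoinPool;

class CommonPoolParallelism {

    /** Target parallelism of the JVM-wide common pool, as reported by the JDK. */
    static int commonPoolParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    /** Value of the override property, or null when it has not been set. */
    static String overrideProperty() {
        return System.getProperty("java.util.concurrent.ForkJoinPool.common.parallelism");
    }

    public static void main(String[] args) {
        System.out.println("available cores         = " + Runtime.getRuntime().availableProcessors());
        System.out.println("common pool parallelism = " + commonPoolParallelism());
        System.out.println("override property       = " + overrideProperty());
    }
}
```

The property is read when the common pool is initialised, so it has to be passed on the command line (for example `-Djava.util.concurrent.ForkJoinPool.common.parallelism=32`) rather than set later at runtime.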

Given that the LIST calls are going to be blocking, this may put a limit on the performance of listing if you have many threads executing list requests, e.g. Spark workers.

Reviewing the code, the maximum number of list operations which can collect results at any one time will be limited to roughly the number of cores; the others are going to block until the earlier lists have been processed.
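The effect can be reproduced outside abfs with a small simulation. The sketch below is an assumption-laden stand-in, not the abfs implementation: it uses a dedicated ForkJoinPool of fixed parallelism instead of the common pool (so the result does not depend on the host's core count), and `Thread.sleep` stands in for a blocking LIST call. It records the peak number of simulated fetches running at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.atomic.AtomicInteger;

class BlockingListDemo {

    /**
     * Submits `tasks` simulated blocking fetches to a pool with the given
     * parallelism and returns the highest number observed running concurrently.
     */
    static int peakConcurrency(int parallelism, int tasks, long blockMillis) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                futures.add(CompletableFuture.runAsync(() -> {
                    int now = active.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(blockMillis); // stand-in for a blocking LIST request
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        active.decrementAndGet();
                    }
                }, pool));
            }
            // Wait from a non-pool thread until every simulated fetch is done.
            futures.forEach(CompletableFuture::join);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent fetches = " + peakConcurrency(2, 8, 100));
    }
}
```

With a parallelism of 2 and 8 queued tasks, the peak should not exceed 2: a plain sleep does not tell the pool that a worker is blocked, so no compensating threads are started and the remaining tasks wait their turn.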

Which may also mean: if you have multiple incremental iterators in the same thread (e.g. treewalking), there's a risk that you could actually deadlock.

I'm not convinced this will happen, as once each listing has reached the end of its directory or there are 10 pages in the result queue, the submitted operation will complete.
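The reasoning can be illustrated with a simplified model of the fetcher. This is a hypothetical sketch, not the actual abfs iterator code: the class, method, and constant names are invented, and pages are plain strings. The point it shows is that the submitted task returns, rather than waiting for a consumer, once the source is exhausted or 10 pages are queued, so it always frees its pool thread.

```java
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.IntStream;

class BoundedFetchSketch {

    /** Assumed queue limit, taken from the "10 pages" figure in the description. */
    static final int MAX_QUEUED_PAGES = 10;

    /**
     * One run of the (hypothetical) fetcher task: pull pages until the listing
     * ends or the queue is full, then return without blocking.
     */
    static int fetchUntilFullOrDone(Iterator<String> pages, Queue<String> queue) {
        int fetched = 0;
        while (pages.hasNext() && queue.size() < MAX_QUEUED_PAGES) {
            queue.add(pages.next());
            fetched++;
        }
        return fetched;
    }

    /** Number of pages the first run fetches for a listing of the given length. */
    static int firstRunFetchCount(int totalPages) {
        Iterator<String> pages =
            IntStream.range(0, totalPages).mapToObj(i -> "page-" + i).iterator();
        return fetchUntilFullOrDone(pages, new ConcurrentLinkedQueue<>());
    }

    public static void main(String[] args) {
        System.out.println(firstRunFetchCount(25)); // prints 10: stops at the queue limit
        System.out.println(firstRunFetchCount(3));  // prints 3: stops at end of listing
    }
}
```

Whether the real iterator behaves this way under contention is exactly what the proposed test would need to confirm.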

But: we need a test for this. Is there any public abfs store with many, many objects we could use as a source for listings, similar to the AWS landsat repo we (ab)use for such purposes in the s3a ITests?

Issue Links

is depended upon by

HADOOP-17512 Remove the enable/disable flag for ABFSRemoteListIterator

Open

People

Assignee: Unassigned

Reporter: Steve Loughran

Votes: 0

Watchers: 1

Dates

Created: 22/Apr/21 13:23

Updated: 22/Apr/21 13:24