[HADOOP-17961] s3 and abfs incremental listing: use SAX parsers to stream results to list iterators - ASF JIRA

Details

Type: Sub-task
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 3.3.2
Fix Version/s: None
Component/s: fs/azure, fs/s3
Labels: None

Description

As code gradually adopts listStatusIncremental(), requesting a smaller initial batch could let result processing ramp up faster.

This is probably most significant on a versioned S3 bucket, where the need to skip tombstones can make listings significantly slower, but it could benefit ABFS too.

Attachments

Activity


Steve Loughran added a comment - 05/Sep/22 13:13

Actually, the really slick way to do this is to parse the XML responses through a SAX parser and so get the results incrementally, which would be streamed directly to the iterators.

This would ensure the fewest number of requests (cost and efficiency) while delivering the lowest possible latency between the request being issued and the first listing entries coming back.
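A minimal sketch of that idea, not the S3A or ABFS implementation: a SAX handler runs on a background thread and pushes each entry onto a queue as soon as its element closes, while the caller pulls from an ordinary Iterator. The class name `StreamingListing`, the method `keys()`, and the `<Key>` element name (as in an S3-style `ListBucketResult` document) are assumptions for illustration only.

```java
import java.io.InputStream;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: yields <Key> values while the response is still being parsed. */
final class StreamingListing {
  /** Sentinel marking the end of the listing; compared by identity. */
  private static final Object EOF = new Object();

  private StreamingListing() {}

  static Iterator<String> keys(InputStream xml) {
    // Unbounded queue for simplicity; a real client would want back-pressure.
    BlockingQueue<Object> queue = new LinkedBlockingQueue<>();

    Thread parser = new Thread(() -> {
      try {
        SAXParserFactory.newInstance().newSAXParser().parse(xml, new DefaultHandler() {
          private StringBuilder text; // non-null only while inside a <Key> element

          @Override
          public void startElement(String uri, String local, String qName, Attributes attrs) {
            if ("Key".equals(qName)) {
              text = new StringBuilder();
            }
          }

          @Override
          public void characters(char[] ch, int start, int length) {
            if (text != null) {
              text.append(ch, start, length);
            }
          }

          @Override
          public void endElement(String uri, String local, String qName) {
            if ("Key".equals(qName)) {
              queue.add(text.toString()); // entry is visible to the consumer immediately
              text = null;
            }
          }
        });
        queue.add(EOF);
      } catch (Exception e) {
        queue.add(e); // surface parse and I/O failures to the consumer
      }
    }, "listing-parser");
    parser.setDaemon(true);
    parser.start();

    return new Iterator<String>() {
      private Object pending; // next queue item, fetched lazily

      @Override
      public boolean hasNext() {
        if (pending == null) {
          try {
            pending = queue.take(); // blocks until the parser produces something
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
          }
        }
        if (pending instanceof Exception) {
          throw new IllegalStateException("listing parse failed", (Exception) pending);
        }
        return pending != EOF;
      }

      @Override
      public String next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        String key = (String) pending;
        pending = null;
        return key;
      }
    };
  }
}
```

SAX is push-based, so the thread and queue are only there to turn it into a pull-style iterator; a StAX pull parser would be another way to get the same effect without the extra thread. Either way the first entry only arrives early if the service itself streams the response.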


Steve Loughran added a comment - 05/Sep/22 13:15

Note: big assumption here, that the services stream the output rather than build up a document and then serve it. Timeout behaviours on S3 with versioned buckets and the v1 API hint that this holds there.


People

Assignee: Unassigned

Reporter: Steve Loughran

Votes: 0

Watchers: 2

Dates

Created: 11/Oct/21 15:27

Updated: 03/Dec/24 13:39
