  1. Hadoop Common
  2. HADOOP-19353 Über-jira: S3A Hadoop 3.4.2 features
  3. HADOOP-17961

s3 and abfs incremental listing: use SAX parsers to stream results to list iterators

Details

    • Type: Sub-task
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.3.2
    • Fix Version/s: None
    • Component/s: fs/azure, fs/s3
    • Labels: None

    Description

      With code gradually adopting listStatusIncremental(), asking for a smaller initial batch could permit a faster ramp-up of result processing.

      This is probably most significant on a versioned S3 bucket, where the need to skip tombstones can result in significantly slower listings, but it could benefit ABFS too.
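
      One way to picture the "smaller initial batch" idea is a page-size schedule that starts small and doubles on each request up to a cap. The sketch below is illustrative only: `PageSizeRamp` and its parameters are hypothetical names, not existing Hadoop or S3A classes, and the sizes used are assumptions rather than recommended values.

      ```java
      /** Hypothetical helper (not part of Hadoop): ramps the listing page size. */
      final class PageSizeRamp {
        private final int max;
        private int next;

        PageSizeRamp(int initial, int max) {
          if (initial < 1 || max < initial) {
            throw new IllegalArgumentException("need 1 <= initial <= max");
          }
          this.next = initial;
          this.max = max;
        }

        /** Returns the size to request for the next page, then doubles it, capped at max. */
        int nextPageSize() {
          int current = next;
          // Widen to long before doubling so a large size cannot overflow.
          next = (int) Math.min((long) next * 2, max);
          return current;
        }
      }
      ```

      With `new PageSizeRamp(100, 1000)`, successive calls return 100, 200, 400, 800, 1000, 1000. The first entries arrive sooner, at the cost of a few extra requests early in the listing.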

        Activity

          stevel@apache.org Steve Loughran added a comment -

          Actually, the really slick way to do this is to parse the XML responses through a SAX parser and so get the results incrementally, streaming them directly to the iterators.

          This would keep the number of requests to a minimum (good for cost and efficiency) while delivering the lowest possible latency between issuing the request and the first listing entries coming back.
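
          A minimal sketch of the idea, assuming an S3-style `ListBucketResult` XML body: a SAX handler runs on a background thread and pushes each `<Key>` into a bounded queue, and an `Iterator` drains that queue as entries arrive. `StreamingListing` is a hypothetical class written for illustration, not S3A or ABFS code; a real implementation would also need to handle sizes, timestamps, version and delete-marker entries, continuation tokens, and thread interruption.

          ```java
          import java.io.InputStream;
          import java.util.Iterator;
          import java.util.NoSuchElementException;
          import java.util.concurrent.BlockingQueue;
          import java.util.concurrent.LinkedBlockingQueue;
          import javax.xml.parsers.SAXParserFactory;
          import org.xml.sax.Attributes;
          import org.xml.sax.helpers.DefaultHandler;

          /** Hypothetical sketch: streams object keys out of a listing response as it is parsed. */
          class StreamingListing implements Iterator<String> {
            // End-of-stream sentinel, compared by identity so it can never equal a parsed key.
            private static final String EOF = new String("<eof>");
            // Bounded queue: the parser blocks if the consumer falls behind.
            private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(64);
            private volatile Exception failure;
            private String next;

            StreamingListing(InputStream xml) {
              Thread parser = new Thread(() -> {
                try {
                  SAXParserFactory.newInstance().newSAXParser().parse(xml, new DefaultHandler() {
                    private final StringBuilder text = new StringBuilder();

                    @Override
                    public void startElement(String uri, String local, String qName, Attributes a) {
                      text.setLength(0);  // start collecting this element's text
                    }

                    @Override
                    public void characters(char[] ch, int start, int length) {
                      text.append(ch, start, length);
                    }

                    @Override
                    public void endElement(String uri, String local, String qName) {
                      if ("Key".equals(qName)) {
                        put(text.toString());  // hand the entry over as soon as it is complete
                      }
                    }
                  });
                } catch (Exception e) {
                  failure = e;  // surfaced to the consumer in hasNext()
                } finally {
                  put(EOF);
                }
              });
              parser.setDaemon(true);
              parser.start();
            }

            private void put(String s) {
              try {
                queue.put(s);
              } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // sketch only: the entry is dropped
              }
            }

            @Override
            public boolean hasNext() {
              if (next == null) {
                try {
                  next = queue.take();  // blocks until the parser produces an entry or EOF
                } catch (InterruptedException e) {
                  Thread.currentThread().interrupt();
                  throw new IllegalStateException(e);
                }
              }
              if (next == EOF) {
                if (failure != null) {
                  throw new IllegalStateException("listing parse failed", failure);
                }
                return false;
              }
              return true;
            }

            @Override
            public String next() {
              if (!hasNext()) {
                throw new NoSuchElementException();
              }
              String result = next;
              next = null;
              return result;
            }
          }
          ```

          Passing the HTTP response stream to `new StreamingListing(stream)` would let a caller start consuming keys before the full document has arrived. SAX is push-based, which is why the sketch needs a thread and a queue; a pull parser such as StAX would be an alternative that avoids the extra thread.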

          stevel@apache.org Steve Loughran added a comment -

          Note: there is a big assumption here, namely that the services stream the output rather than build up a document and then serve it. Timeout behaviours on S3 with versioned buckets and the v1 API hint that this holds there.

          People

            Assignee: Unassigned
            Reporter: Steve Loughran (stevel@apache.org)