  1. Hadoop Common
  2. HADOOP-11694 Über-jira: S3a phase II: robustness, scale and performance
  3. HADOOP-13203

S3A: Support fadvise "random" mode for high performance readPositioned() reads

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: fs/s3
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      S3A has added support for configurable input policies. Similar to fadvise, this configuration provides applications with a way to specify their expected access pattern (sequential or random) while reading a file. S3A then performs optimizations tailored to that access pattern. See site documentation of the fs.s3a.experimental.input.fadvise configuration property for more details. Please be advised that this feature is experimental and subject to backward-incompatible changes in future releases.

    Description

      Currently, the file's "contentLength" is passed as the "requestedStreamLen" when invoking S3AInputStream::reopen(). As part of lazySeek(), the stream sometimes has to be closed and reopened. Often, however, the stream is closed with abort(), which leaves the underlying HTTP connection unusable. In some jobs this incurs a high connection-establishment cost. It would be better to request the correct stream length so that connection aborts are avoided.

      I will post the patch once the AWS tests pass on my machine.
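
      The idea in the description can be illustrated with a small, self-contained sketch. It is not the code from the attached patches: the class name, the Policy enum, the requestLimit() helper and the readahead parameter are all hypothetical, chosen only to show how the end of the requested range might depend on the access pattern. Under a sequential policy the request runs to the end of the object; under a random policy it covers only the bytes asked for (or an assumed readahead window, whichever is larger), so less unread data should remain when the stream is closed early.

```java
// Hypothetical sketch: choosing the upper bound of the requested byte range.
// None of these names are taken from the S3A source or the attached patches.
final class RangeLimitSketch {

    /** Expected access pattern, as named in the release note. */
    enum Policy { SEQUENTIAL, RANDOM }

    /**
     * Returns the exclusive upper bound of the byte range to request.
     *
     * @param policy        expected access pattern
     * @param targetPos     offset at which the next read starts
     * @param length        bytes the caller wants, or a negative value if unknown
     * @param contentLength total size of the object
     * @param readahead     assumed minimum number of bytes to request
     */
    static long requestLimit(Policy policy, long targetPos, long length,
                             long contentLength, long readahead) {
        long limit;
        if (policy == Policy.RANDOM && length >= 0) {
            // Random access: request only this read, or the readahead window.
            limit = targetPos + Math.max(readahead, length);
        } else {
            // Sequential access, or unknown length: plan to read to the end.
            limit = contentLength;
        }
        // A range can never extend past the end of the object.
        return Math.min(contentLength, limit);
    }

    public static void main(String[] args) {
        long contentLength = 1_000_000L;
        long readahead = 65_536L;
        System.out.println(
            requestLimit(Policy.SEQUENTIAL, 4_096L, 1_024L, contentLength, readahead));
        System.out.println(
            requestLimit(Policy.RANDOM, 4_096L, 1_024L, contentLength, readahead));
        System.out.println(
            requestLimit(Policy.RANDOM, 990_000L, 1_024L, contentLength, readahead));
    }
}
```

      With the sample values in main(), the sequential case requests up to byte 1000000, the first random case up to byte 69632, and the last random case is clamped to the object length of 1000000.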

      Attachments

        1. HADOOP-13203-branch-2-001.patch
          3 kB
          Rajesh Balamohan
        2. HADOOP-13203-branch-2-002.patch
          6 kB
          Rajesh Balamohan
        3. HADOOP-13203-branch-2-003.patch
          6 kB
          Rajesh Balamohan
        4. HADOOP-13203-branch-2-004.patch
          6 kB
          Rajesh Balamohan
        5. HADOOP-13203-branch-2-005.patch
          28 kB
          Steve Loughran
        6. HADOOP-13203-branch-2-006.patch
          43 kB
          Steve Loughran
        7. HADOOP-13203-branch-2-007.patch
          42 kB
          Steve Loughran
        8. HADOOP-13203-branch-2-008.patch
          53 kB
          Steve Loughran
        9. HADOOP-13203-branch-2-009.patch
          57 kB
          Steve Loughran
        10. HADOOP-13203-branch-2-010.patch
          57 kB
          Steve Loughran
        11. stream_stats.tar.gz
          716 kB
          Rajesh Balamohan

            People

              Assignee: Rajesh Balamohan
              Reporter: Rajesh Balamohan
              Votes: 0
              Watchers: 9