  Hadoop Common / HADOOP-11694

Über-jira: S3a phase II: robustness, scale and performance

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.0
    • Fix Version/s: 2.8.0
    • Component/s: fs/s3
    • Labels:
      None

      Description

      HADOOP-11571 covered the core s3a bugs surfacing in Hadoop 2.6, along with other enhancements to improve S3 (performance, proxy, custom endpoints).

      This JIRA covers post-2.7 issues and enhancements.

          Issue Links

          1.
          Enable YARN to use S3A Sub-task Resolved Pieter Reuse  
          2.
          Add NativeS3Fs that delegates calls from FileContext apis to native s3 fs implementation Sub-task Resolved Sumit Kumar  
          3.
          S3a to use thread pool that blocks clients Sub-task Resolved Thomas Demoor  
          4.
          verify that s3a handles / in secret key Sub-task Resolved Steve Loughran  
          5.
          s3a can throw spurious IOEs on close() Sub-task Resolved Steve Loughran  
          6.
          ListStatus on empty dir in S3A lists itself instead of returning an empty list Sub-task Resolved Pieter Reuse  
          7.
          Make use of DeleteObjects optional Sub-task Resolved Thomas Demoor  
          8.
          Listing an empty s3a root directory throws FileNotFound. Sub-task Resolved Lei (Eddy) Xu  
          9.
          Support lazy seek in S3AInputStream Sub-task Resolved Rajesh Balamohan  
          10.
          Update documentation to cover fs.s3.buffer.dir enhancements Sub-task Resolved Unassigned  
          11.
          S3A JUnit tests failing if using HTTP proxy Sub-task Resolved Zoran Rajic (Original Estimate: 24h, Remaining Estimate: 24h)
          12.
          s3a should use UGI.getCurrentUser.getShortname() for username Sub-task Resolved Steve Loughran  
          13.
          Recover when S3A fails on IOException in read() Sub-task Resolved Pieter Reuse  
          14.
          s3a toString to be meaningful in logs Sub-task Resolved Steve Loughran  
          15.
          s3a to handle delete("/", true) robustly Sub-task Resolved Steve Loughran  
          16.
          move s3a to slf4j logging Sub-task Resolved Unassigned  
          17.
          s3a to pass PositionedReadable contract tests, improve readFully perf. Sub-task Resolved Steve Loughran  
          18.
          add option for lazy open() on s3a Sub-task Resolved Unassigned  
          19.
          add low level counter metrics for S3A; use in read performance tests Sub-task Resolved Steve Loughran  
          20.
          S3a Forward seek in stream length to be configurable Sub-task Resolved Steve Loughran  
          21.
          S3A FS fails during init against a read-only FS if multipart purge is enabled Sub-task Resolved Steve Loughran  
          22.
          switch hadoop-aws back to using the (heavy) amazon-sdk JAR Sub-task Resolved Steve Loughran  
          23.
          if fs.s3a.block.size option == 0, use partition size option for blocksize Sub-task Resolved Steve Loughran  
          24.
          S3A Introspect to invoke incompatible AWS TransferManagerConfiguration methods Sub-task Resolved Unassigned  
          25.
          Customize User-Agent header sent in HTTP requests by S3A. Sub-task Resolved Chris Nauroth  
          26.
          Add tests to verify that S3A supports SSE-S3 encryption Sub-task Resolved Steve Loughran (Original Estimate: 1h, Remaining Estimate: 1h)
          27.
          s3a failures can surface as RTEs, not IOEs Sub-task Resolved Steve Loughran  
          28.
          S3AFileSystem printAmazonServiceException/printAmazonClientException appear copy & paste of AWS examples Sub-task Closed Steve Loughran  
          29.
          S3AFileSystem#toString might throw NullPointerException due to null cannedACL. Sub-task Resolved Chris Nauroth  
          30.
          Enable parallel test execution for hadoop-aws. Sub-task Resolved Chris Nauroth  
          31.
          Add StorageStatistics to S3A; instrument some more operations Sub-task Resolved Steve Loughran  
          32.
          S3A file-create should throw error rather than overwrite directories Sub-task Resolved Steve Loughran  
          33.
          S3A: Support fadvise "random" mode for high performance readPositioned() reads Sub-task Resolved Rajesh Balamohan  
          34.
          S3A listFiles(recursive=true) to do a bulk listObjects instead of walking the pseudo-tree of directories Sub-task Resolved Steve Loughran (Original Estimate: 24h, Remaining Estimate: 24h)
          35.
          Consider reducing number of getFileStatus calls in S3AFileSystem.mkdirs Sub-task Resolved Rajesh Balamohan  
          36.
          Isolate test path used by a few S3A tests for more reliable parallel execution. Sub-task Resolved Steve Loughran  
          37.
          s3a initialization against public bucket fails if caller lacks any credentials Sub-task Resolved Chris Nauroth  
          38.
          document s3a better Sub-task Resolved Steve Loughran  
          39.
          Tune S3A provider plugin mechanism Sub-task Resolved Steve Loughran  
          40.
          Provide an option to set the socket buffers in S3AFileSystem Sub-task Resolved Rajesh Balamohan  
          41.
          Read Proxy Password from Credential Providers in S3 FileSystem Sub-task Resolved Larry McCay  
          42.
          mvn fs/s3 test runs to set DNS TTL to 20s Sub-task Resolved Unassigned (Original Estimate: 0.5h, Remaining Estimate: 0.5h)
          43.
          set multipart delete timeout to 5 * 60s in S3ATestUtils.createTestFileSystem Sub-task Resolved Steve Loughran  
          44.
          hadoop fs command path doesn't include translation of amazon client exceptions Sub-task Resolved Steve Loughran  
          45.
          add a S3A scale test to do gunzip and linecount Sub-task Resolved Steve Loughran  
          46.
          S3A TemporaryAWSCredentialsProvider to support Hadoop Credential providers for secrets Sub-task Resolved Unassigned  
          47.
          S3A to list InstanceProfileCredentialsProvider after EnvironmentVariableCredentialsProvider Sub-task Resolved Steve Loughran  
          48.
          s3a tests don't authenticate with S3 frankfurt (or other V4 auth only endpoints) Sub-task Resolved Steve Loughran  
          49.
          doc for “fs.s3a.acl.default” indicates incorrect values Sub-task Resolved Shen Yinjie  
          50.
          improve section on troubleshooting s3a auth problems Sub-task Resolved Steve Loughran  
          51.
          explicitly declare the Joda time version S3A depends on Sub-task Resolved Steve Loughran  
          52.
          NPE in S3AFastOutputStream.write Sub-task Resolved Steve Loughran  
          53.
          S3AFileSystem to override getStorageStatistics() and so serve up its statistics Sub-task Resolved Steve Loughran  
          54.
          s3a close() to be non-synchronized, so avoid risk of deadlock on shutdown Sub-task Resolved Steve Loughran  
          55.
          s3:// should have been fully cut off from trunk Sub-task Resolved Mingliang Liu  
          56.
          document object store use with fs shell and distcp Sub-task Resolved Steve Loughran  
          57.
          S3A can provide a more detailed error message when accessing a bucket through an incorrect S3 endpoint. Sub-task Resolved Chris Nauroth  
          58.
          fs.s3a.readahead.range to use getLongBytes Sub-task Resolved Abhishek Modi  
          59.
          hadoop-aws should declare explicit dependency on Jackson 2 jars to prevent classpath conflicts. Sub-task Resolved Chris Nauroth  
          60.
          S3A reporting of file group as empty is harmful to compatibility for the shell. Sub-task Resolved Unassigned  
          61.
          Document S3A known limitations in file ownership and permission model. Sub-task Resolved Chris Nauroth  
          62.
          S3A: Reduce high number of connections to EC2 Instance Metadata Service caused by InstanceProfileCredentialsProvider. Sub-task Resolved Chris Nauroth  
          63.
          switch to Configuration.getLongBytes for byte options Sub-task Resolved Abhishek Modi (Original Estimate: 0.5h, Remaining Estimate: 0.5h)
          64.
          S3AFastOutputStream to take ProgressListener in file create() Sub-task Resolved Steve Loughran  
          65.
          S3ABlockOutputStream to pass Yetus & Jenkins Sub-task Resolved Steve Loughran  
          66.
          S3ABlockOutputStream to support huge (many GB) file writes Sub-task Resolved Steve Loughran  
          67.
          Purge some superfluous/obsolete S3 FS tests that are slowing test runs down Sub-task Resolved Steve Loughran  
          68.
          ITestS3AContractRootDir still playing up, bug in eventually() retry logic? Sub-task Resolved Steve Loughran  
          69.
          regression: ITestS3AMiniYarnCluster failing Sub-task Resolved Steve Loughran  
          70.
          Upgrade to AWS SDK 1.11.45 Sub-task Resolved Steve Loughran  
          71.
          Purge superfluous/obsolete S3A Tests Sub-task Resolved Steve Loughran  
          72.
          ITestS3AFileContextStatistics.testStatistics() failing Sub-task Resolved Pieter Reuse  
          73.
          s3a rename: fail if dest file exists Sub-task Resolved Steve Loughran  
          74.
          Fix a couple of the s3a statistic names to be consistent with the rest Sub-task Resolved Steve Loughran  
          75.
          ITestS3AInputStreamPerformance.testTimeToOpenAndReadWholeFileBlocks performance awful Sub-task Resolved Steve Loughran  
          76.
          S3AUtils.translateException to map (wrapped) InterruptedExceptions to InterruptedIOEs Sub-task Resolved Steve Loughran  
          77.
          S3A to track multipart upload count, size, duration Sub-task Resolved Steve Loughran  
          78.
          Improve S3AFastOutputStream memory management Sub-task Resolved Steve Loughran  
          79.
          S3A output streams to share a single LocalDirAllocator for round-robin drive use Sub-task Resolved Steve Loughran  
          80.
          S3A to support per-bucket configuration Sub-task Resolved Steve Loughran  
          81.
          fix some typos in the s3a docs Sub-task Resolved Steve Loughran  
          82.
          S3A: Consider avoiding array copy in S3ABlockOutputStream (ByteArrayBlock) Sub-task Resolved Rajesh Balamohan  

              People

              • Assignee: Steve Loughran (stevel@apache.org)
              • Reporter: Steve Loughran (stevel@apache.org)
              • Votes: 0
              • Watchers: 41

                  Time Tracking

                  • Original Estimate: 50h
                  • Remaining Estimate: 50h
                  • Time Spent: Not Specified