Hadoop Common / HADOOP-16829

Über-jira: S3A Hadoop 3.3.1 features

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.1
    • Component/s: fs/s3
    • Labels:
      None

      Description

      Über-jira: S3A features/fixes for Hadoop 3.3.1

      As usual, this list will fill up with everything that hasn't gone in yet: don't interpret presence on this list as a commitment to implement.

      For anyone wanting to add patches:

      MUST

      1. submit patches for review via GitHub PRs
      2. declare the AWS S3 endpoint (or other S3 implementation) tested against; no declaration, no review

      SHOULD

      1. have a setup for testing SSE-KMS and DDB/S3Guard
      2. include an assumed role we can use for AssumedRole Delegation Tokens

      If you are going near those bits of code, these are upgraded from SHOULD to MUST.

          Issue Links

          1.
          make sure staging committers collect DTs for the staging FS Sub-task Resolved Unassigned  
          2.
          s3a test can hang in teardown with network problems Sub-task Resolved Unassigned  
          3.
          distcp -update to S3A; abfs, etc always overwrites due to block size mismatch Sub-task Resolved Steve Loughran  
          4.
          S3AFileSystem.getContentSummary() to use listFiles(recursive) Sub-task Resolved Unassigned  
          5.
          add extra S3A MPU test to see what happens if a file is created during the MPU Sub-task Resolved Steve Loughran  
          6.
          S3A doesn't actually verify paths have the correct authority Sub-task Resolved Steve Loughran  
          7.
          Use error code detail in AWS server responses for finer grained exceptions Sub-task Resolved Unassigned  
          8.
          initial part uploads seem to block unnecessarily in S3ABlockOutputStream Sub-task Resolved Steven Rand  
          9.
          S3AFilesystem.initiateRename() can skip check on dest.parent status if src has same parent Sub-task Resolved Steve Loughran  
          10.
          Collect AwsSdkMetrics in S3A FileSystem IOStatistics Sub-task Resolved Steve Loughran

          Time Spent - 6h 10m
          11.
          Add some tests about S3 timestamp tracking Sub-task Resolved Unassigned  
          12.
          Optimize uses of FS operations in the ASF analysis frameworks and libraries Sub-task Resolved Steve Loughran  
          13.
          log DNS addresses on s3a init Sub-task Resolved Mukund Thakur  
          14.
          export s3a BlockingThreadPoolExecutorService pool info (size, load) as gauges Sub-task Resolved Unassigned  
          15.
          S3A authenticators to log origin of .secret.key options Sub-task Resolved Unassigned  
          16.
          Retrieve modtime of PUT file from store, via response or HEAD Sub-task Resolved Unassigned  
          17.
          s3a create() doesn't check for an ancestor path being a file Sub-task Resolved Sean Mackrory  
          18.
          Improve S3A rename resilience Sub-task Resolved Steve Loughran

          Time Spent - 3h
          19.
          add tests/docs for HAR files on s3a Sub-task Resolved Unassigned  
          20.
          add experimental optimization of s3a directory marker handling Sub-task Resolved Unassigned  
          21.
          Report on S3A cached 404 recovery better Sub-task Resolved Unassigned  
          22.
          Stabilize openFile() and adopt internally Sub-task In Progress Steve Loughran

          Time Spent - 10h 40m
          23.
          s3a to instrument duration of HTTP calls Sub-task Resolved Steve Loughran  
          24.
          catch and downgrade all exceptions trying to load openssl native libs through wildfly Sub-task Resolved Steve Loughran  
          25.
          Optimize getFileStatus in S3A Sub-task Resolved Steven K. Wong  
          26.
          s3a mkdir path/ can add 404 to S3 load balancers Sub-task Resolved Unassigned  
          27.
          AmazonClient 30x exceptions to include redirect URL in message Sub-task Resolved Unassigned  
          28.
          S3A Input Stream bytes read counter isn't getting through to StorageStatistics/instrumentation properly Sub-task Resolved Unassigned  
          29.
          job commit failure in S3A MR magic committer test Sub-task Resolved Steve Loughran  
          30.
          intermittent failure of ITestS3GuardListConsistency.testInconsistentS3ClientDeletes in parallel runs Sub-task Resolved Unassigned  
          31.
          S3A Client to add explicit support for versioned stores Sub-task Resolved Steve Loughran  
          32.
          S3A statistic collection underrecords bytes written in helper threads Sub-task Resolved Steve Loughran  
          33.
          s3a: auto-detect region for bucket and use right endpoint Sub-task Resolved Aaron Fabbri  
          34.
          s3guard LimitExceededException -too many tables Sub-task Resolved Unassigned  
          35.
          s3a directory housekeeping operations to be done in async thread Sub-task Resolved Unassigned  
          36.
          ITestS3ARemoteFileChanged tests fail if you set the bucket to versionid tracking Sub-task Resolved Unassigned  
          37.
          Add some Java-8 friendly way to work with RemoteIterable, especially listings Sub-task Resolved Unassigned  
          38.
          S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes Sub-task Resolved Kazuyuki Tanimura  
          39.
          s3a mkdirs() to not check dest for a dir marker Sub-task Resolved Unassigned  
          40.
          S3A FS deleteOnExit to skip the exists check Sub-task Resolved Unassigned  
          41.
          ITestAssumeRole.testAssumeRoleBadInnerAuth failure Sub-task Resolved Steve Loughran

          Time Spent - 1h 50m
          42.
          S3A to optionally retain directory markers Sub-task Resolved Steve Loughran

          Time Spent - 50m
          43.
          Add some Abortable.abort() interface for streams etc which can be terminated Sub-task Resolved Jungtaek Lim

          Time Spent - 6h
          44.
          IAM role created by S3A DT doesn't include DynamoDB scan Sub-task Resolved Unassigned  
          45.
          S3A FullCredentialsTokenBinding fails if local credentials are unset Sub-task Resolved Steve Loughran  
          46.
          s3guard can't init table if caller doesn't have tag permissions Sub-task Resolved Unassigned  
          47.
          ITestS3GuardOutOfBandOperations.testListingDelete failing on versioned bucket Sub-task Resolved Steve Loughran  
          48.
          Add option for a prefix to put in front of every s3guard table Sub-task Resolved Unassigned  
          49.
          Add more s3guard metrics Sub-task Resolved Unassigned  
          50.
          Improve DynamoDB schema update story Sub-task Resolved Sean Mackrory  
          51.
          S3Guard: Optimize performance of handling OOB operations in non-authoritative mode Sub-task Resolved Unassigned  
          52.
          getFileChecksum() needs to adopt S3Guard Sub-task Resolved lqjacklee  
          53.
          reduce/tune read failure fault injection on inconsistent client Sub-task Resolved Unassigned  
          54.
          intermittent failure of ITestCommitOperations: too many s3guard writes Sub-task Resolved Unassigned  
          55.
          S3a getFileStatus to update DDB if an S3 query returns etag/versionID Sub-task Resolved Unassigned  
          56.
          Possible for modified configuration to leak into metadatastore in S3GuardTool Sub-task Resolved Unassigned  
          57.
          S3Guard instrumentation to include cost of DynamoDB ops as metric Sub-task Resolved Unassigned  
          58.
          Intermittent failure of ITestS3GuardConcurrentOps#testConcurrentTableCreations Sub-task Resolved Unassigned  
          59.
          Intermittent failure of ITestS3GuardToolDynamoDB#testDynamoDBInitDestroyCycle Sub-task Resolved Unassigned  
          60.
          S3Guard init command uses global settings, not those of target bucket Sub-task Resolved Steve Loughran  
          61.
          Improve throttling on S3Guard DDB batch retries Sub-task Resolved Unassigned  
          62.
          S3guard: add inconsistency detection metrics Sub-task Resolved Unassigned  
          63.
          S3Guard prune to only remove auth dir marker if files (not tombstones) are removed Sub-task Resolved Unassigned  
          64.
          ZookeeperMetadataStore: Use Zookeeper as S3Guard backend store Sub-task Resolved Unassigned  
          65.
          tag S3GuardTool entry points as limitedPrivate("management-tools")/evolving Sub-task Resolved Steve Loughran

          Time Spent - 20m
          66.
          Fix ITestS3GuardToolLocal#testInitNegativeRead test failure Sub-task Resolved Steve Loughran  
          67.
          Ensure controls in-place to prevent clients with significant clock skews pruning aggressively Sub-task Resolved Unassigned  
          68.
          Scheme assertion in S3Guard DynamoDBMetadataStore::checkPath is unnecessarily restrictive Sub-task Resolved Unassigned  
          69.
          improvements to S3GuardTool destroy command Sub-task Resolved Unassigned  
          70.
          Clock skew can cause S3Guard to think object metadata is out of date Sub-task Resolved Unassigned  
          71.
          S3guard metadata stores to support millions of entries Sub-task Resolved Unassigned  
          72.
          ITestS3GuardToolDynamoDB.testDynamoDBInitDestroyCycle fails if test bucket isn't on demand Sub-task Resolved Steve Loughran  
          73.
          S3Guard to self update on directory listings of S3 Sub-task Resolved Unassigned  
          74.
          S3guard mistakes root URI without / as non-absolute path Sub-task Resolved Unassigned  
          75.
          HADOOP-16953. tune s3guard disabled warnings Sub-task Resolved Steve Loughran  
          76.
          mkdir on s3a should not be sensitive to trailing '/' Sub-task Resolved Steve Loughran  
          77.
          s3a to not need wildfly on the classpath Sub-task Resolved Steve Loughran  
          78.
          ITestS3AConfiguration proxy tests fail when bucket probes == 0 Sub-task Resolved Gabor Bota  
          79.
          S3A to support additional token issuers Sub-task Resolved Steve Loughran  
          80.
          S3A staging committer committing duplicate files Sub-task Resolved Steve Loughran  
          81.
          ITestS3GuardOutOfBandOperations testListingDelete[auth=false] fails on unversioned bucket Sub-task Resolved Unassigned  
          82.
          S3AFileSystem silently deletes "fake" directories when writing a file. Sub-task Resolved Unassigned  
          83.
          ITestS3AEncryptionWithDefaultS3Settings fails if default bucket encryption != KMS Sub-task Resolved Mukund Thakur

          Time Spent - 40m
          84.
          Handle transient stream read failures in FileSystem contract tests Sub-task Resolved Unassigned

          Time Spent - 1h 10m
          85.
          add way for s3a to recognise buckets with "." in name and switch to path access Sub-task Resolved Unassigned  
          86.
          Decrease size of s3a dependencies Sub-task Resolved Unassigned  
          87.
          Backport HADOOP-13230 list/getFileStatus changes for preserved directory markers Sub-task Resolved Steve Loughran

          Time Spent - 2h
          88.
          Renaming a file under a sibling empty directory doesn't delete dest dir's marker Sub-task Resolved Steve Loughran  
          89.
          improve s3guard markers command line tool Sub-task Resolved Steve Loughran

          Time Spent - 2h 10m
          90.
          Backport HADOOP-13230 listing changes for preserved directory markers to 3.1.x Sub-task Resolved Steve Loughran  
          91.
          HADOOP-17244. S3A directory delete tombstones dir markers prematurely. Sub-task Resolved Steve Loughran

          Time Spent - 5h 40m
          92.
          s3a rename() now requires s3:deleteObjectVersion permission Sub-task Resolved Steve Loughran

          Time Spent - 2h 20m
          93.
          S3A to always probe S3 in S3A getFileStatus on non-auth paths Sub-task Resolved Steve Loughran

          Time Spent - 3h 40m
          94.
          ITestCustomSigner fails with gcs s3 compatible endpoint. Sub-task Resolved Mukund Thakur

          Time Spent - 1h
          95.
          S3AInputStream to be resilient to failures in abort(); translate AWS Exceptions Sub-task Resolved Yongjun Zhang  
          96.
          FileSystem.get to support slow-to-instantiate FS clients Sub-task Resolved Steve Loughran

          Time Spent - 4h 50m
          97.
          S3A committer to support concurrent jobs with same app attempt ID & dest dir Sub-task Resolved Steve Loughran

          Time Spent - 8h 10m
          98.
          S3A marker tool mixes up -min and -max Sub-task Resolved Steve Loughran

          Time Spent - 2h
          99.
          ITestS3AContractRename failing against stricter tests Sub-task Resolved Attila Doroszlai

          Time Spent - 50m
          100.
          AbstractS3ATokenIdentifier to set issue date == now Sub-task Resolved Jungtaek Lim

          Time Spent - 2h
          101.
          Upgrade aws-java-sdk to 1.11.901 Sub-task Resolved Steve Loughran

          Time Spent - 4h
          102.
          ITestS3ADeleteCost.testDirMarkersFileCreation failure Sub-task Resolved Steve Loughran

          Time Spent - 2h 40m
          103.
          AbstractS3ATokenIdentifier to issue date in UTC Sub-task Resolved Jungtaek Lim

          Time Spent - 1h 30m
          104.
          enable s3a magic committer by default Sub-task Resolved Steve Loughran  
          105.
          Magic committer files don't have the count of bytes written collected by spark Sub-task Resolved Steve Loughran

          Time Spent - 12h 20m
          106.
          Intermittent S3AInputStream failures: Premature end of Content-Length delimited message body etc Sub-task Resolved Yongjun Zhang

          Time Spent - 5h 20m
          107.
          S3A docs to state s3 is consistent, deprecate S3Guard Sub-task Resolved Steve Loughran

          Time Spent - 1h 40m
          108.
          magic committer to be enabled for all S3 buckets Sub-task Resolved Steve Loughran

          Time Spent - 3h 30m
          109.
          typo in MagicCommitTracker Sub-task Resolved Pierrick HYMBERT

          Time Spent - 40m
          110.
          Add option to downgrade S3A rejection of Syncable to warning Sub-task Resolved Steve Loughran

          Time Spent - 2h
          111.
          whitespace not allowed in paths when saving files to s3a via committer Sub-task Resolved Krzysztof Adamski

          Time Spent - 1h 40m
          112.
          Possible NPE in S3A MultiObjectDeleteSupport error handling Sub-task Resolved Steve Loughran

          Time Spent - 20m

              People

              • Assignee: Steve Loughran (stevel@apache.org)
              • Reporter: Steve Loughran (stevel@apache.org)
              • Votes: 0
              • Watchers: 4

                  Time Tracking

                  • Estimated: Not Specified
                  • Remaining: 0h
                  • Logged: 101h