  1. Hadoop Common
  2. HADOOP-16829

Über-jira: S3A Hadoop 3.4 features

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels:
      None
    • Target Version/s:

      Description

      Über-jira: S3A features/fixes for Hadoop 3.4

      As usual, this will clutter up with everything which hasn't gone in: don't interpret presence on this list as a commitment to implement.

      For anyone wanting to add patches:

      MUST

      1. Submit patches for review via GitHub PRs.
      2. Declare the AWS S3 endpoint (or other S3 implementation) you tested against: no declaration, no review.

      SHOULD

      1. Have a setup for testing SSE-KMS and DDB/S3Guard.
      2. Include an assumed role which can be used for AssumedRole Delegation Tokens.

      If you are going near those bits of code, these are upgraded from SHOULD to MUST.

          Issue Links

          1.
          S3AInputStream logging to make it easier to debug file leakage Sub-task Open Unassigned  
          2.
          Filesystem discovery to stop loading implementation classes Sub-task Open Unassigned  
          3.
          s3a rename failed during copy, "Unable to copy part" + 200 error code Sub-task Open Unassigned  
          4.
          make sure staging committers collect DTs for the staging FS Sub-task Open Unassigned  
          5.
          Remove transient dependency on hadoop-hdfs-client Sub-task Open Unassigned  
          6.
          s3a directory housekeeping operations to be done in async thread Sub-task Open Unassigned  
          7.
          S3A input stream to support ByteBufferReadable Sub-task Open Unassigned  
          8.
          Encrypt S3A buffered data on disk Sub-task Open Unassigned  
          9.
          S3a: Failed to reset the request input stream/make S3A uploadPart() retriable Sub-task Open Unassigned  
          10.
          support git-secrets commit hook to keep AWS secrets out of git Sub-task Patch Available Steve Loughran  
          11.
          Tune hadoop-aws parallel test surefire/failsafe settings Sub-task Open Unassigned  
          12.
          S3A Secret access to fall back to XML if credential provider raises IOE. Sub-task Open Unassigned  
          13.
          Use error code detail in AWS server responses for finer grained exceptions Sub-task Open Unassigned  
          14.
          Test MR split optimisation with recursive listing Sub-task Open Unassigned  
          15.
          initial part uploads seem to block unnecessarily in S3ABlockOutputStream Sub-task Open Steven Rand  
          16.
          Speed up S3A test runs Sub-task Open Unassigned  
          17.
          S3A copy/rename of large files to be parallelized as a multipart operation Sub-task Open Unassigned  
          18.
          Test hadoop fs shell against s3a; fix problems Sub-task Open Unassigned  
          19.
          s3guard bucket-info command to add a verify-property <key>=<value> <bucket> Sub-task Open Unassigned  
          20.
          S3A to implement rename(final Path src, final Path dst, final Rename... options) Sub-task Open Unassigned  
          21.
          Collect AwsSdkMetrics in S3A FileSystem IOStatistics Sub-task In Progress Steve Loughran  
          22.
          Add some tests about S3 timestamp tracking Sub-task Open Unassigned  
          23.
          S3a operations keep retrying if the password is wrong Sub-task Open Thomas Poepping  
          24.
          Add some S3A-specific create file options Sub-task Open Unassigned  
          25.
          s3a to set fake directory marker contentType to application/x-directory Sub-task Open Steve Loughran  
          26.
          Report problems w/ local S3A buffer directory meaningfully Sub-task Open Unassigned  
          27.
          S3A Filesystem does not check return from AmazonS3Client deleteObjects Sub-task Open Unassigned  
          28.
          Optimize uses of FS operations in the ASF analysis frameworks and libraries Sub-task Open Steve Loughran  
          29.
          Possible NPE in S3A MultiObjectDeleteSupport error handling Sub-task Open Steve Loughran  
          30.
          shell rm command to not rename to ~/.Trash in object stores Sub-task Open Unassigned  
          31.
          Add S3A support for Async Scatter/Gather IO Sub-task Open Gabor Bota  
          32.
          Impersonate hosts in s3a for better data locality handling Sub-task Open Thomas Demoor  
          33.
          export s3a BlockingThreadPoolExecutorService pool info (size, load) as gauges Sub-task Open Unassigned  
          34.
          increase the default number of threads and http connections in S3A Sub-task Open Unassigned  
          35.
          Support AWS S3 reduced redundancy storage class Sub-task Open Unassigned  
          36.
          s3a doesn't consider blobs with trailing / and content-length >0 as directories Sub-task Open Unassigned  
          37.
          make s3a read fault injection configurable including "off" Sub-task Open Unassigned  
          38.
          strip s3.amazonaws.com off hostnames before making s3a calls Sub-task Open Unassigned  
          39.
          add a special 0 byte input stream for empty blobs Sub-task Open Unassigned  
          40.
          improve setting of max connections in AWS client Sub-task Open Unassigned  
          41.
          multipart/huge file upload tests to look at checksums returned Sub-task Open Unassigned  
          42.
          Some S3A tests leak filesystem instances Sub-task Open Unassigned  
          43.
          s3guard to provide better diags on ddb init failures Sub-task Open Unassigned  
          44.
          Improve isolation of FS instances in S3A committer tests Sub-task Open Unassigned  
          45.
          s3a rm on the CLI generates deprecation warning on io.bytes.per.checksum Sub-task Open Unassigned  
          46.
          S3A getContentSummary() to move to listFiles(recursive) to count children; instrument use Sub-task Open Unassigned  
          47.
          review S3A translateException translation matches IBM CORS spec Sub-task Open Unassigned  
          48.
          Add fs.s3a.rename.raises.exceptions to raise exceptions on rename failures Sub-task Open Unassigned  
          49.
          add tests/docs for HAR files on s3a Sub-task Open Unassigned  
          50.
          S3A to use a thread pool for async path operations Sub-task Open Unassigned  
          51.
          build up md5 checksum as blocks are built in S3ABlockOutputStream; validate upload Sub-task Open Unassigned  
          52.
          FileSystem/s3a processDeleteOnExit to skip the exists() check Sub-task Open Unassigned  
          53.
          Add custom InstanceProfileCredentialsProvider with more resilience to throttling Sub-task Open Unassigned  
          54.
          ITestS3AMiniYarnCluster fails on sequential runs with Kerberos error Sub-task Open Unassigned  
          55.
          S3a DelegationToken bindings to support a "correlation ID" for the UA header Sub-task Open Unassigned  
          56.
          Stabilize openFile() and adopt internally Sub-task In Progress Steve Loughran

          Progress: 100%. Original Estimate: Not Specified. Time Spent: 9h 10m
          57.
          S3AInputStream.skip() to use lazy seek Sub-task Open Unassigned  
          58.
          Document `dynamodb:TagResource` an explicit client-side permission for S3Guard Sub-task Open Gabor Bota  
          59.
          ITestS3ARemoteFileChanged doesn't overwrite test data creation Sub-task Open Unassigned  
          60.
          AWS Data read stack trace in S3a putObjectDirect Sub-task Open Unassigned  
          61.
          AmazonClient 30x exceptions to include redirect URL in message Sub-task Open Unassigned  
          62.
          S3A FS to add "s3a:no-existence-checks" to the builder file creation option set Sub-task Open Unassigned  
          63.
          s3a new getdefaultblocksize be called in getFileStatus which has not been implemented in s3afilesystem yet Sub-task Open Unassigned  
          64.
          Use lighter-weight alternatives to innerGetFileStatus where possible Sub-task Open Unassigned  
          65.
          intermittent failure of ITestS3GuardListConsistency.testInconsistentS3ClientDeletes in parallel runs Sub-task Open Unassigned  
          66.
          ITestDynamoDBMetadataStore.testTableVersioning failure -DDB deleteItem consistency? Sub-task Open Unassigned  
          67.
          Add S3AWriteOpContext for write ops; pass in statistics and other settings Sub-task Open Unassigned  
          68.
          S3A: Set thread names with more specific information about the call. Sub-task Open Unassigned  
          69.
          Add AWS S3 Transfer acceleration support Sub-task Open Unassigned  
          70.
          Add a way for an FS instance to say "really, no trash interval at all" Sub-task Open Unassigned  
          71.
          s3guard uploads command to list date and initiator of outstanding uploads Sub-task Open Unassigned  
          72.
          test and document use of fs.s3a.signing-algorithm Sub-task Open Unassigned  
          73.
          S3AInputStream.seek should throw EOFException if seeking past the end of file Sub-task Open Unassigned  
          74.
          S3a auth exception to link to a wiki page on the problem Sub-task Open Unassigned  
          75.
          s3a to improve diags on s3a bad request message Sub-task Open Unassigned  
          76.
          AbstractContractDistCpTest to test attr preservation with -p, verify blobstores downgrade Sub-task Open Steve Loughran  
          77.
          ITestS3AContractRootDir failure on non-S3Guarded bucket Sub-task Open Unassigned  
          78.
          s3guard bucket-info command to include default bucket encryption info Sub-task Open Unassigned  
          79.
          S3 SSEC tests to downgrade when running against a mandatory encryption object store Sub-task Open Unassigned  
          80.
          S3AInputStream read(bytes[]) to not retry on read failure: pass action up Sub-task Open Unassigned  
          81.
          typo in TestNeworkBinding Sub-task Open Steve Loughran  
          82.
          ITestS3A select tests fail if user kinited in Sub-task Open Unassigned  
          83.
          S3A add histogram metrics types for latency, etc. Sub-task Open Sean Mackrory  
          84.
          cherry pick s3 enhancements from PrestoS3FileSystem Sub-task Open Unassigned  
          85.
          S3aDelegationTokens to add accessor for tests to get at the token binding Sub-task Open Unassigned  
          86.
          s3guard LimitExceededException -too many tables Sub-task Open Unassigned  
          87.
          Support multipart download in S3AFileSystem Sub-task Open Unassigned  
          88.
          NPE in S3AInputStream.read() in ITestS3AInconsistency.testOpenFailOnRead Sub-task Open Unassigned  
          89.
          S3A DT marshalling to include nested error text in wrapped message Sub-task Open Unassigned  
          90.
          S3 Select Exceptions are not being converted to IOEs Sub-task Open Unassigned  
          91.
          remove misleading fs.s3a.delegation.tokens.enabled prompt Sub-task Open Unassigned  
          92.
          S3A to support configuring various AWS S3 client extended options Sub-task Open Unassigned  
          93.
          Review S3A documentation to make sure it is consistent with the current codebase Sub-task Open Unassigned  
          94.
          S3A DT support to warn when loading expired token Sub-task Open Steve Loughran  
          95.
          ITestS3AAWSCredentialsProvider tests fail if a bucket has DTs enabled Sub-task Open Unassigned  
          96.
          Clarify committers.md around v2 failure handling Sub-task Open Unassigned  
          97.
          S3AFileStatus to add a serialVersionUID; review & test serialization Sub-task Open Unassigned  
          98.
          test YARN log collection works to s3a Sub-task Open Unassigned  
          99.
          Encrypt S3A data client-side with AWS SDK (S3-CSE) Sub-task Patch Available Igor Mazur  
          100.
          S3A can support short user-friendly aliases for configuration of credential providers. Sub-task Open Unassigned  
          101.
          Handle S3A "glacier" data Sub-task Open Unassigned  
          102.
          Add common getFileBlockLocations() emulation for object stores, including S3A Sub-task Patch Available Steve Loughran  
          103.
          S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes Sub-task Patch Available Kazuyuki Tanimura  
          104.
          S3A to support Requester Pays Buckets Sub-task Patch Available Mandus Momberg

          Progress: 0%. Original Estimate: 2h. Remaining Estimate: 2h
          105.
          clean up ITestS3AFileSystemContract Sub-task Patch Available Unassigned  
          106.
          ITestS3AContractGetFileStatusV1List may have consistency issues Sub-task Open Unassigned  
          107.
          S3ARetryPolicy to handle AWS 500 responses/error code TooBusyException with the throttle backoff policy Sub-task Open Unassigned  
          108.
          Add some Abortable.abort() interface for streams etc which can be terminated Sub-task Open Unassigned  
          109.
          S3A mkdirs to indicate which parent path element refers to a file Sub-task Open Unassigned  
          110.
          NPE in s3a byte buffer block upload Sub-task Open Unassigned  
          111.
          declare that fs.s3a.ext. is a prefix for arbitrary extensions Sub-task Open Unassigned  
          112.
          AWS AssumedRoleCredentialProvider needs ExternalId add Sub-task Open Unassigned  
          113.
          IAM role created by S3A DT doesn't include DynamoDB scan Sub-task Open Unassigned  
          114.
          S3AFilesystem trash handling should respect the current UGI Sub-task Open Unassigned  
          115.
          S3AInputStream.remainingInFile should use nextReadPos Sub-task Reopened lqjacklee  
          116.
          s3guard can't init table if caller doesn't have tag permissions Sub-task Open Unassigned  
          117.
          ITestS3AContractSeek teardown closes test FS before superclass can do its cleanup Sub-task Open Unassigned  
          118.
          Possible inconsistent state of AbstractDelegationTokenSecretManager Sub-task Patch Available Hankó Gergely  
          119.
          ITestCustomSigner uses absolute paths off the bucket root rather than fork-relative Sub-task Open Unassigned  
          120.
          log accepted/rejected fs.s3a.authoritative.path paths @ debug Sub-task Open Unassigned  
          121.
          Add option for a prefix to put in front of every s3guard table Sub-task Open Unassigned  
          122.
          Add more s3guard metrics Sub-task Open Unassigned  
          123.
          Improve DynamoDB schema update story Sub-task Open Sean Mackrory  
          124.
          S3Guard: Optimize performance of handling OOB operations in non-authoritative mode Sub-task Open Unassigned  
          125.
          increase performance of s3guard import command Sub-task Open Unassigned  
          126.
          intermittent failure of ITestCommitOperations: too many s3guard writes Sub-task Open Unassigned  
          127.
          Possible for modified configuration to leak into metadatastore in S3GuardTool Sub-task Open Unassigned  
          128.
          S3AFileSystem copyFile to propagate etag/version from getObjectMetadata to copy request Sub-task Open Unassigned  
          129.
          Intermittent failure of ITestS3GuardConcurrentOps#testConcurrentTableCreations Sub-task Open Unassigned  
          130.
          S3Guard init command uses global settings, not those of target bucket Sub-task Reopened Steve Loughran  
          131.
          S3guard: add inconsistency detection metrics Sub-task Open Unassigned  
          132.
          S3Guard prune to only remove auth dir marker if files (not tombstones) are removed Sub-task Open Unassigned  
          133.
          Fix ITestS3GuardToolLocal#testInitNegativeRead test failure Sub-task Open Steve Loughran  
          134.
          Ensure controls in-place to prevent clients with significant clock skews pruning aggressively Sub-task Open Unassigned  
          135.
          Scheme assertion in S3Guard DynamoDBMetadataStore::checkPath is unnecessarily restrictive Sub-task Open Unassigned  
          136.
          improvements to S3GuardTool destroy command Sub-task Open Unassigned  
          137.
          ITestS3GuardToolDynamoDB.testDynamoDBInitDestroyCycle fails if test bucket isn't on demand Sub-task Open Steve Loughran  
          138.
          S3Guard to self update on directory listings of S3 Sub-task Open Unassigned  
          139.
          S3guard mistakes root URI without / as non-absolute path Sub-task Open Unassigned  
          140.
          S3A deleteObjects hanging/retrying forever Sub-task Open Unassigned  
          141.
          S3A delegation token binding to support secondary binding list Sub-task In Progress Steve Loughran  
          142.
          whitespace not allowed in paths when saving files to s3a via committer Sub-task Open Unassigned  
          143.
          Re-enable optimized copyFromLocal implementation in S3AFileSystem Sub-task Open Unassigned  
          144.
          ITestS3GuardOutOfBandOperations testListingDelete[auth=false] fails on unversioned bucket Sub-task Open Unassigned  
          145.
          S3A client retries on SSL Auth exceptions triggered by "." bucket names Sub-task Open Unassigned  
          146.
          Support S3 Access Points Sub-task Open Unassigned  
          147.
          ITestS3AConfiguration.testProxyConnection failing when s3a bucket probe disabled Sub-task Open Unassigned  
          148.
          Failure of ITestAssumeRole.testRestrictedCommitActions Sub-task Open Steve Loughran  
          149.
          S3A (async) ObjectListingIterator to block in hasNext() for results Sub-task Open Steve Loughran

          Progress: 100%. Original Estimate: Not Specified. Time Spent: 1.5h
          150.
          Distcp to set S3 Storage Class Sub-task Open Unassigned

          Progress: 0%. Original Estimate: 168h. Remaining Estimate: 168h
          151.
          transient ITestS3AFileContextStatistics failure -read buffer not filled Sub-task Open Unassigned  
          152.
          hadoop-cloud-storage transient dependencies need review Sub-task Open Unassigned  
          153.
          S3A AWS Credential provider loading gets confused with isolated classloaders Sub-task Open Unassigned  
          154.
          ITestS3AContractSeek.teardown closes FS before superclass does its cleanup Sub-task Open Unassigned  
          155.
          fs.s3a.buffer.dir to be under Yarn container path on yarn applications Sub-task Open Unassigned  
          156.
          S3A ITestPartialRenamesDeletes.testRenameDirFailsInDelete failure: missing directory marker Sub-task Reopened Steve Loughran  
          157.
          enable s3a magic committer by default Sub-task Open Steve Loughran  
          158.
          GCS to support per-bucket configuration Sub-task Open Unassigned  
          159.
          Use S3 content-range header to update length of an object during reads Sub-task Open Unassigned  
          160.
          S3A to treat "SdkClientException: Data read has a different length than the expected" as EOFException Sub-task Open Unassigned  
          161.
          ITestAssumeRole.testAssumeRoleBadInnerAuth failure Sub-task Open Unassigned  
          162.
          magic committer to be enabled for all S3 buckets Sub-task Open Unassigned

          Progress: 100%. Original Estimate: Not Specified. Time Spent: 2h 20m
          163.
          Add a MkdirOperation for chained S3 operations during mkdir Sub-task Open Unassigned  

              People

              • Assignee: Steve Loughran (stevel@apache.org)
              • Reporter: Steve Loughran (stevel@apache.org)
              • Votes: 0
              • Watchers: 3

                  Time Tracking

                  • Estimated: 170h
                  • Remaining: 170h
                  • Logged: 76h 50m