  1. Hadoop Common
  2. HADOOP-16829

Über-jira: S3A Hadoop 3.4 features

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels:
      None
    • Target Version/s:

      Description

      Über-jira: S3A features/fixes for Hadoop 3.4

      As usual, this will clutter up with everything which hasn't gone in: don't interpret presence on this list as a commitment to implement.

      And for anyone wanting to add patches:

      MUST

      1. reviews via github PRs
      2. declare the AWS S3 endpoint (or other S3 implementation) you tested against: no declaration, no review

      SHOULD

      1. have a setup for testing SSE-KMS, DDB/S3Guard
      2. including an assumed role we can use for AssumedRole Delegation Tokens

      If you are going near those bits of code, they uprate from SHOULD to MUST.

          Issue Links

          1.
          S3AInputStream logging to make it easier to debug file leakage Sub-task Open Unassigned  
          2.
          Filesystem discovery to stop loading implementation classes Sub-task Open Unassigned  
          3.
          s3a rename failed during copy, "Unable to copy part" + 200 error code Sub-task Open Unassigned  
          4.
          make sure staging committers collect DTs for the staging FS Sub-task Open Unassigned  
          5.
          Remove transient dependency on hadoop-hdfs-client Sub-task Open Unassigned  
          6.
          s3a directory housekeeping operations to be done in async thread Sub-task Open Unassigned  
          7.
          S3A input stream to support ByteBufferReadable Sub-task Open Unassigned  
          8.
          Encrypt S3A buffered data on disk Sub-task Open Unassigned  
          9.
          S3a: Failed to reset the request input stream/make S3A uploadPart() retriable Sub-task Open Unassigned  
          10.
          S3AFileSystem.getContentSummary() to use listFiles(recursive) Sub-task Open Unassigned  
          11.
          support git-secrets commit hook to keep AWS secrets out of git Sub-task Patch Available Steve Loughran  
          12.
          Tune hadoop-aws parallel test surefire/failsafe settings Sub-task Open Unassigned  
          13.
          S3A Secret access to fall back to XML if credential provider raises IOE. Sub-task Open Unassigned  
          14.
          Use error code detail in AWS server responses for finer grained exceptions Sub-task Open Unassigned  
          15.
          Test MR split optimisation with recursive listing Sub-task Open Unassigned  
          16.
          initial part uploads seem to block unnecessarily in S3ABlockOutputStream Sub-task Open Steven Rand  
          17.
          Speed up S3A test runs Sub-task Open Unassigned  
          18.
          S3AFilesystem.initiateRename() can skip check on dest.parent status if src has same parent Sub-task In Progress Steve Loughran  
          19.
          S3A copy/rename of large files to be parallelized as a multipart operation Sub-task Open Unassigned  
          20.
          Test hadoop fs shell against s3a; fix problems Sub-task Open Unassigned  
          21.
          s3guard bucket-info command to add a verify-property <key>=<value> <bucket> Sub-task Open Unassigned  
          22.
          S3A to implement rename(final Path src, final Path dst, final Rename... options) Sub-task Open Unassigned  
          23.
          hook up AwsSdkMetrics to hadoop metrics Sub-task In Progress Steve Loughran  
          24.
          Add some tests about S3 timestamp tracking Sub-task Open Unassigned  
          25.
          S3a operations keep retrying if the password is wrong Sub-task Open Thomas Poepping  
          26.
          Add some S3A-specific create file options Sub-task Open Unassigned  
          27.
          s3a to set fake directory marker contentType to application/x-directory Sub-task Open Steve Loughran  
          28.
          Report problems w/ local S3A buffer directory meaningfully Sub-task Open Unassigned  
          29.
          S3A Filesystem does not check return from AmazonS3Client deleteObjects Sub-task Open Unassigned  
          30.
          Optimize uses of FS operations in the ASF analysis frameworks and libraries Sub-task Open Steve Loughran  
          31.
          Possible NPE in S3A MultiObjectDeleteSupport error handling Sub-task Open Steve Loughran  
          32.
          shell rm command to not rename to ~/.Trash in object stores Sub-task Open Unassigned  
          33.
          Add S3A support for Async Scatter/Gather IO Sub-task Open Gabor Bota  
          34.
          Impersonate hosts in s3a for better data locality handling Sub-task Open Thomas Demoor  
          35.
          export s3a BlockingThreadPoolExecutorService pool info (size, load) as metrics Sub-task Open Unassigned  
          36.
          increase the default number of threads and http connections in S3A Sub-task Open Unassigned  
          37.
          Support AWS S3 reduced redundancy storage class Sub-task Open Unassigned  
          38.
          s3a doesn't consider blobs with trailing / and content-length >0 as directories Sub-task Open Unassigned  
          39.
          make s3a read fault injection configurable including "off" Sub-task Open Unassigned  
          40.
          s3a create() doesn't check for an ancestor path being a file Sub-task Open Sean Mackrory  
          41.
          strip s3.amazonaws.com off hostnames before making s3a calls Sub-task Open Unassigned  
          42.
          add a special 0 byte input stream for empty blobs Sub-task Open Unassigned  
          43.
          improve setting of max connections in AWS client Sub-task Open Unassigned  
          44.
          multipart/huge file upload tests to look at checksums returned Sub-task Open Unassigned  
          45.
          Some S3A tests leak filesystem instances Sub-task Open Unassigned  
          46.
          s3guard to provide better diags on ddb init failures Sub-task Open Unassigned  
          47.
          Improve isolation of FS instances in S3A committer tests Sub-task Open Unassigned  
          48.
          s3a rm on the CLI generates deprecation warning on io.bytes.per.checksum Sub-task Open Unassigned  
          49.
          S3A getContentSummary() to move to listFiles(recursive) to count children; instrument use Sub-task Open Unassigned  
          50.
          review S3A translateException translation matches IBM CORS spec Sub-task Open Unassigned  
          51.
          Add fs.s3a.rename.raises.exceptions to raise exceptions on rename failures Sub-task Open Unassigned  
          52.
          add tests/docs for HAR files on s3a Sub-task Open Unassigned  
          53.
          S3A to use a thread pool for async path operations Sub-task Open Unassigned  
          54.
          build up md5 checksum as blocks are built in S3ABlockOutputStream; validate upload Sub-task Open Unassigned  
          55.
          FileSystem/s3a processDeleteOnExit to skip the exists() check Sub-task Open Unassigned  
          56.
          Add custom InstanceProfileCredentialsProvider with more resilience to throttling Sub-task Open Unassigned  
          57.
          ITestS3AMiniYarnCluster fails on sequential runs with Kerberos error Sub-task Open Unassigned  
          58.
          S3a DelegationToken bindings to support a "correlation ID" for the UA header Sub-task Open Unassigned  
          59.
          Report on S3A cached 404 recovery better Sub-task Open Unassigned  
          60.
          Enhance S3A openFile() Sub-task In Progress Steve Loughran  
          61.
          S3AInputStream.skip() to use lazy seek Sub-task Open Unassigned  
          62.
          s3a to instrument duration of HTTP calls Sub-task Open Unassigned  
          63.
          Document `dynamodb:TagResource` as an explicit client-side permission for S3Guard Sub-task Open Gabor Bota  
          64.
          ITestS3ARemoteFileChanged doesn't overwrite test data creation Sub-task Open Unassigned  
          65.
          Optimize getFileStatus in S3A Sub-task Open Steven K. Wong  
          66.
          s3a mkdir path/ can add 404 to S3 load balancers Sub-task Open Unassigned  
          67.
          AWS Data read stack trace in S3a putObjectDirect Sub-task Open Unassigned  
          68.
          AmazonClient 30x exceptions to include redirect URL in message Sub-task Open Unassigned  
          69.
          S3A FS to add "s3a:no-existence-checks" to the builder file creation option set Sub-task Open Unassigned  
          70.
          s3a new getdefaultblocksize be called in getFileStatus which has not been implemented in s3afilesystem yet Sub-task Open Unassigned  
          71.
          Use lighter-weight alternatives to innerGetFileStatus where possible Sub-task Open Unassigned  
          72.
          S3A Input Stream bytes read counter isn't getting through to StorageStatistics/instrumentation properly Sub-task Open Unassigned  
          73.
          intermittent failure of ITestS3GuardListConsistency.testInconsistentS3ClientDeletes in parallel runs Sub-task Open Unassigned  
          74.
          ITestDynamoDBMetadataStore.testTableVersioning failure -DDB deleteItem consistency? Sub-task Open Unassigned  
          75.
          Add S3AWriteOpContext for write ops; pass in statistics and other settings Sub-task Open Unassigned  
          76.
          S3A: Set thread names with more specific information about the call. Sub-task Open Unassigned  
          77.
          Add AWS S3 Transfer acceleration support Sub-task Open Unassigned  
          78.
          Add a way for an FS instance to say "really, no trash interval at all" Sub-task Open Unassigned  
          79.
          s3guard uploads command to list date and initiator of outstanding uploads Sub-task Open Unassigned  
          80.
          test and document use of fs.s3a.signing-algorithm Sub-task Open Unassigned  
          81.
          S3A statistic collection underrecords bytes written in helper threads Sub-task In Progress Steve Loughran  
          82.
          S3AInputStream.seek should throw EOFException if seeking past the end of file Sub-task Open Unassigned  
          83.
          S3a auth exception to link to a wiki page on the problem Sub-task Open Unassigned  
          84.
          s3a to improve diags on s3a bad request message Sub-task Open Unassigned  
          85.
          AbstractContractDistCpTest to test attr preservation with -p, verify blobstores downgrade Sub-task Open Steve Loughran  
          86.
          s3a: auto-detect region for bucket and use right endpoint Sub-task Open Aaron Fabbri  
          87.
          ITestS3AContractRootDir failure on non-S3Guarded bucket Sub-task Open Unassigned  
          88.
          s3guard bucket-info command to include default bucket encryption info Sub-task Open Unassigned  
          89.
          S3 SSEC tests to downgrade when running against a mandatory encryption object store Sub-task Open Unassigned  
          90.
          S3AInputStream read(bytes[]) to not retry on read failure: pass action up Sub-task Open Unassigned  
          91.
          typo in TestNeworkBinding Sub-task Open Steve Loughran  
          92.
          ITestS3A select tests fail if user kinited in Sub-task Open Unassigned  
          93.
          S3A add histogram metrics types for latency, etc. Sub-task Open Sean Mackrory  
          94.
          cherry pick s3 enhancements from PrestoS3FileSystem Sub-task Open Unassigned  
          95.
          S3aDelegationTokens to add accessor for tests to get at the token binding Sub-task Open Unassigned  
          96.
          s3guard LimitExceededException -too many tables Sub-task Open Unassigned  
          97.
          Support multipart download in S3AFileSystem Sub-task Open Unassigned  
          98.
          NPE in S3AInputStream.read() in ITestS3AInconsistency.testOpenFailOnRead Sub-task Open Unassigned  
          99.
          S3A DT marshalling to include nested error text in wrapped message Sub-task Open Unassigned  
          100.
          S3 Select Exceptions are not being converted to IOEs Sub-task Open Unassigned  
          101.
          remove misleading fs.s3a.delegation.tokens.enabled prompt Sub-task Open Unassigned  
          102.
          S3A to support configuring various AWS S3 client extended options Sub-task Open Unassigned  
          103.
          Review S3A documentation to make sure it is consistent with the current codebase Sub-task Open Unassigned  
          104.
          S3A DT support to warn when loading expired token Sub-task Open Steve Loughran  
          105.
          ITestS3AAWSCredentialsProvider tests fail if a bucket has DTs enabled Sub-task Open Unassigned  
          106.
          Clarify committers.md around v2 failure handling Sub-task Open Unassigned  
          107.
          S3AFileStatus to add a serialVersionUID; review & test serialization Sub-task Open Unassigned  
          108.
          test YARN log collection works to s3a Sub-task Open Unassigned  
          109.
          Encrypt S3A data client-side with AWS SDK (S3-CSE) Sub-task Patch Available Igor Mazur  
          110.
          S3A can support short user-friendly aliases for configuration of credential providers. Sub-task Open Unassigned  
          111.
          Add some Java-8 friendly way to work with RemoteIterable, especially listings Sub-task Open Unassigned  
          112.
          Handle S3A "glacier" data Sub-task Open Unassigned  
          113.
          Add common getFileBlockLocations() emulation for object stores, including S3A Sub-task Patch Available Steve Loughran  
          114.
          S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes Sub-task Patch Available Kazuyuki Tanimura  
          115.
          s3a mkdirs() to not check dest for a dir marker Sub-task Open Unassigned  
          116.
          S3A to support Requester Pays Buckets Sub-task Patch Available Mandus Momberg  
          117.
          clean up ITestS3AFileSystemContract Sub-task Patch Available Unassigned  
          118.
          ITestS3AContractGetFileStatusV1List may have consistency issues Sub-task Open Unassigned  
          119.
          S3ARetryPolicy to handle AWS 500 responses/error code TooBusyException with the throttle backoff policy Sub-task Open Unassigned  
          120.
          S3A to optionally retain directory markers; look under a marker for files when needEmptyDir=true Sub-task In Progress Steve Loughran  
          121.
          Add some Abortable.abort() interface for streams etc which can be terminated Sub-task Open Unassigned  
          122.
          S3A mkdirs to indicate which parent path element refers to a file Sub-task Open Unassigned  
          123.
          NPE in s3a byte buffer block upload Sub-task Open Unassigned  
          124.
          declare that fs.s3a.ext. is a prefix for arbitrary extensions Sub-task Open Unassigned  
          125.
          AWS AssumedRoleCredentialProvider needs ExternalId add Sub-task Open Unassigned  
          126.
          IAM role created by S3A DT doesn't include DynamoDB scan Sub-task Open Unassigned  
          127.
          S3AFilesystem trash handling should respect the current UGI Sub-task Open Unassigned  
          128.
          S3AInputStream.remainingInFile should use nextReadPos Sub-task Reopened lqjacklee  
          129.
          s3guard can't init table if caller doesn't have tag permissions Sub-task Open Unassigned  
          130.
          ITestS3AContractSeek teardown closes test FS before superclass can do its cleanup Sub-task Open Unassigned  
          131.
          Possible inconsistent state of AbstractDelegationTokenSecretManager Sub-task Patch Available Hankó Gergely  
          132.
          ITestCustomSigner uses absolute paths off the bucket root rather than fork-relative Sub-task Open Unassigned  
          133.
          log accepted/rejected fs.s3a.authoritative.path paths @ debug Sub-task Open Unassigned  
          134.
          Add option for a prefix to put in front of every s3guard table Sub-task Open Unassigned  
          135.
          Add more s3guard metrics Sub-task Open Unassigned  
          136.
          Improve DynamoDB schema update story Sub-task Open Sean Mackrory  
          137.
          S3Guard: Optimize performance of handling OOB operations in non-authoritative mode Sub-task Open Unassigned  
          138.
          getFileChecksum() needs to adopt S3Guard Sub-task In Progress lqjacklee  
          139.
          reduce/tune read failure fault injection on inconsistent client Sub-task Open Unassigned  
          140.
          increase performance of s3guard import command Sub-task Open Unassigned  
          141.
          intermittent failure of ITestCommitOperations: too many s3guard writes Sub-task Open Unassigned  
          142.
          S3a getFileStatus to update DDB if an S3 query returns etag/versionID Sub-task Open Unassigned  
          143.
          Possible for modified configuration to leak into metadatastore in S3GuardTool Sub-task Open Unassigned  
          144.
          S3Guard instrumentation to include cost of DynamoDB ops as metric Sub-task Open Unassigned  
          145.
          S3AFileSystem copyFile to propagate etag/version from getObjectMetadata to copy request Sub-task Open Unassigned  
          146.
          Intermittent failure of ITestS3GuardConcurrentOps#testConcurrentTableCreations Sub-task Open Unassigned  
          147.
          Intermittent failure of ITestS3GuardToolDynamoDB#testDynamoDBInitDestroyCycle Sub-task Open Unassigned  
          148.
          S3Guard init command uses global settings, not those of target bucket Sub-task Reopened Steve Loughran  
          149.
          Improve throttling on S3Guard DDB batch retries Sub-task Open Unassigned  
          150.
          S3guard: add inconsistency detection metrics Sub-task Open Unassigned  
          151.
          S3Guard prune to only remove auth dir marker if files (not tombstones) are removed Sub-task Open Unassigned  
          152.
          ZookeeperMetadataStore: Use Zookeeper as S3Guard backend store Sub-task Open Unassigned  
          153.
          tag S3GuardTool entry points as limitedPrivate("management-tools")/evolving Sub-task Open Steve Loughran  
          154.
          Fix ITestS3GuardToolLocal#testInitNegativeRead test failure Sub-task Open Steve Loughran  
          155.
          Ensure controls in-place to prevent clients with significant clock skews pruning aggressively Sub-task Open Unassigned  
          156.
          Scheme assertion in S3Guard DynamoDBMetadataStore::checkPath is unnecessarily restrictive Sub-task Open Unassigned  
          157.
          improvements to S3GuardTool destroy command Sub-task Open Unassigned  
          158.
          Clock skew can cause S3Guard to think object metadata is out of date Sub-task Open Unassigned  
          159.
          S3guard metadata stores to support millions of entries Sub-task Open Unassigned  
          160.
          ITestS3GuardToolDynamoDB.testDynamoDBInitDestroyCycle fails if test bucket isn't on demand Sub-task Open Steve Loughran  
          161.
          S3Guard to self update on directory listings of S3 Sub-task Open Unassigned  
          162.
          S3guard mistakes root URI without / as non-absolute path Sub-task Open Unassigned  
          163.
          mkdir on s3a should not be sensitive to trailing '/' Sub-task Open Unassigned  
          164.
          ITestS3AConfiguration proxy tests fail when bucket probes == 0 Sub-task Open Mukund Thakur  
          165.
          Tune listStatus() api of s3a. Sub-task Open Mukund Thakur  
          166.
          S3A deleteObjects hanging/retrying forever Sub-task Open Unassigned  
          167.
          S3A staging committer committing duplicate files Sub-task Open Steve Loughran  
          168.
          S3A delegation token binding to support secondary binding list Sub-task In Progress Steve Loughran  
          169.
          Optimise s3a Listing to be fully asynchronous. Sub-task Open Mukund Thakur  
          170.
          whitespace not allowed in paths when saving files to s3a via committer Sub-task Open Unassigned  
          171.
          Update listing to use OperationCallback for calling s3a apis. Sub-task Open Mukund Thakur  
          172.
          S3AFileSystem.listLocatedStatus(file) does a LIST even with S3Guard Sub-task Open Unassigned  

              People

              • Assignee: Steve Loughran (stevel@apache.org)
              • Reporter: Steve Loughran (stevel@apache.org)
              • Votes: 0
              • Watchers: 2

                  Time Tracking

                  • Estimated: 2h
                  • Remaining: 2h
                  • Logged: Not Specified