Hadoop Common / HADOOP-16829

Über-jira: S3A Hadoop 3.3.1 features

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels:
      None
    • Target Version/s:

      Description

      Über-jira: S3A features/fixes for Hadoop 3.4

      As usual, this will clutter up with everything which hasn't gone in: don't interpret presence on this list as a commitment to implement.

      And for anyone wanting to add patches

      MUST

      1. Submit patches for review as GitHub PRs.
      2. Declare which AWS S3 endpoint (or other S3 implementation) you tested against. No declaration, no review.

      SHOULD

      1. Have a setup for testing SSE-KMS and DDB/S3Guard.
      2. Include an assumed role we can use for AssumedRole Delegation Tokens.

      If your patch goes near those bits of code, these are upgraded from SHOULD to MUST.

          Issue Links

          1.
          make sure staging committers collect DTs for the staging FS Sub-task Resolved Unassigned  
          2.
          s3a test can hang in teardown with network problems Sub-task Resolved Unassigned  
          3.
          distcp -update to S3A; abfs, etc always overwrites due to block size mismatch Sub-task Resolved Steve Loughran  
          4.
          S3AFileSystem.getContentSummary() to use listFiles(recursive) Sub-task Resolved Unassigned  
          5.
          add extra S3A MPU test to see what happens if a file is created during the MPU Sub-task Resolved Steve Loughran  
          6.
          support git-secrets commit hook to keep AWS secrets out of git Sub-task Patch Available Steve Loughran  
          7.
          S3A Secret access to fall back to XML if credential provider raises IOE. Sub-task Open Unassigned  
          8.
          S3A doesn't actually verify paths have the correct authority Sub-task Resolved Steve Loughran  
          9.
          Use error code detail in AWS server responses for finer grained exceptions Sub-task Resolved Unassigned  
          10.
          Test MR split optimisation with recursive listing Sub-task Open Unassigned  
          11.
          initial part uploads seem to block unnecessarily in S3ABlockOutputStream Sub-task Resolved Steven Rand  
          12.
          Speed up S3A test runs Sub-task Open Unassigned  
          13.
          S3AFilesystem.initiateRename() can skip check on dest.parent status if src has same parent Sub-task Resolved Steve Loughran  
          14.
          S3A copy/rename of large files to be parallelized as a multipart operation Sub-task Open Unassigned  
          15.
          Test hadoop fs shell against s3a; fix problems Sub-task Open Unassigned  
          16.
          s3guard bucket-info command to add a verify-property <key>=<value> <bucket> Sub-task Open Unassigned  
          17.
          S3A to implement rename(final Path src, final Path dst, final Rename... options) Sub-task Open Unassigned  
          18.
          Collect AwsSdkMetrics in S3A FileSystem IOStatistics Sub-task Resolved Steve Loughran

          Time Spent - 6h 10m
          19.
          Add some tests about S3 timestamp tracking Sub-task Resolved Unassigned  
          20.
          S3a operations keep retrying if the password is wrong Sub-task Open Thomas Poepping  
          21.
          Add some S3A-specific create file options Sub-task Open Unassigned  
          22.
          s3a to set fake directory marker contentType to application/x-directory Sub-task Open Steve Loughran  
          23.
          S3A Filesystem does not check return from AmazonS3Client deleteObjects Sub-task Open Unassigned  
          24.
          Optimize uses of FS operations in the ASF analysis frameworks and libraries Sub-task Resolved Steve Loughran  
          25.
          shell rm command to not rename to ~/.Trash in object stores Sub-task Open Unassigned  
          26.
          Add S3A support for Async Scatter/Gather IO Sub-task Open Gabor Bota  
          27.
          Impersonate hosts in s3a for better data locality handling Sub-task Open Thomas Demoor  
          28.
          log DNS addresses on s3a init Sub-task Resolved Mukund Thakur  
          29.
          export s3a BlockingThreadPoolExecutorService pool info (size, load) as gauges Sub-task Resolved Unassigned  
          30.
          increase the default number of threads and http connections in S3A Sub-task Open Unassigned  
          31.
          S3A authenticators to log origin of .secret.key options Sub-task Resolved Unassigned  
          32.
          Support AWS S3 reduced redundancy storage class Sub-task Open Unassigned  
          33.
          s3a doesn't consider blobs with trailing / and content-length >0 as directories Sub-task Open Unassigned  
          34.
          make s3a read fault injection configurable including "off" Sub-task Open Unassigned  
          35.
          Retrieve modtime of PUT file from store, via response or HEAD Sub-task Resolved Unassigned  
          36.
          s3a create() doesn't check for an ancestor path being a file Sub-task Resolved Sean Mackrory  
          37.
          strip s3.amazonaws.com off hostnames before making s3a calls Sub-task Open Unassigned  
          38.
          add a special 0 byte input stream for empty blobs Sub-task Open Unassigned  
          39.
          improve setting of max connections in AWS client Sub-task Open Unassigned  
          40.
          multipart/huge file upload tests to look at checksums returned Sub-task Open Unassigned  
          41.
          Some S3A tests leak filesystem instances Sub-task Open Unassigned  
          42.
          s3guard to provide better diags on ddb init failures Sub-task Open Unassigned  
          43.
          Improve isolation of FS instances in S3A committer tests Sub-task Open Unassigned  
          44.
          s3a rm on the CLI generates deprecation warning on io.bytes.per.checksum Sub-task Open Unassigned  
          45.
          S3A getContentSummary() to move to listFiles(recursive) to count children; instrument use Sub-task Open Unassigned  
          46.
          review S3A translateException translation matches IBM CORS spec Sub-task Open Unassigned  
          47.
          Improve S3A rename resilience Sub-task Resolved Steve Loughran

          Time Spent - 3h
          48.
          add tests/docs for HAR files on s3a Sub-task Resolved Unassigned  
          49.
          S3A to use a thread pool for async path operations Sub-task Open Unassigned  
          50.
          build up md5 checksum as blocks are built in S3ABlockOutputStream; validate upload Sub-task Open Unassigned  
          51.
          FileSystem/s3a processDeleteOnExit to skip the exists() check Sub-task Open Unassigned  
          52.
          Add custom InstanceProfileCredentialsProvider with more resilience to throttling Sub-task Open Unassigned  
          53.
          ITestS3AMiniYarnCluster fails on sequential runs with Kerberos error Sub-task Open Unassigned  
          54.
          S3a DelegationToken bindings to support a "correlation ID" for the UA header Sub-task Open Unassigned  
          55.
          add experimental optimization of s3a directory marker handling Sub-task Resolved Unassigned  
          56.
          Report on S3A cached 404 recovery better Sub-task Resolved Unassigned  
          57.
          Stabilize openFile() and adopt internally Sub-task In Progress Steve Loughran

          Time Spent - 10h
          58.
          S3AInputStream.skip() to use lazy seek Sub-task Open Unassigned  
          59.
          s3a to instrument duration of HTTP calls Sub-task Resolved Steve Loughran  
          60.
          Document `dynamodb:TagResource` an explicit client-side permission for S3Guard Sub-task Open Gabor Bota  
          61.
          catch and downgrade all exceptions trying to load openssl native libs through wildfly Sub-task Resolved Steve Loughran  
          62.
          ITestS3ARemoteFileChanged doesn't overwrite test data creation Sub-task Open Unassigned  
          63.
          Optimize getFileStatus in S3A Sub-task Resolved Steven K. Wong  
          64.
          s3a mkdir path/ can add 404 to S3 load balancers Sub-task Resolved Unassigned  
          65.
          AWS Data read stack trace in S3a putObjectDirect Sub-task Open Unassigned  
          66.
          AmazonClient 30x exceptions to include redirect URL in message Sub-task Resolved Unassigned  
          67.
          S3A FS to add "s3a:no-existence-checks" to the builder file creation option set Sub-task Open Unassigned  
          68.
          s3a: new getDefaultBlockSize is called in getFileStatus but has not been implemented in S3AFileSystem yet Sub-task Open Unassigned  
          69.
          Use lighter-weight alternatives to innerGetFileStatus where possible Sub-task Open Unassigned  
          70.
          S3A Input Stream bytes read counter isn't getting through to StorageStatistics/instrumentation properly Sub-task Resolved Unassigned  
          71.
          job commit failure in S3A MR magic committer test Sub-task Resolved Steve Loughran  
          72.
          intermittent failure of ITestS3GuardListConsistency.testInconsistentS3ClientDeletes in parallel runs Sub-task Resolved Unassigned  
          73.
          Add S3AWriteOpContext for write ops; pass in statistics and other settings Sub-task Open Unassigned  
          74.
          S3A: Set thread names with more specific information about the call. Sub-task Open Unassigned  
          75.
          S3A Client to add explicit support for versioned stores Sub-task Resolved Steve Loughran  
          76.
          Add a way for an FS instance to say "really, no trash interval at all" Sub-task Open Unassigned  
          77.
          s3guard uploads command to list date and initiator of outstanding uploads Sub-task Open Unassigned  
          78.
          test and document use of fs.s3a.signing-algorithm Sub-task Open Unassigned  
          79.
          S3A statistic collection underrecords bytes written in helper threads Sub-task Resolved Steve Loughran  
          80.
          S3AInputStream.seek should throw EOFException if seeking past the end of file Sub-task Open Unassigned  
          81.
          S3a auth exception to link to a wiki page on the problem Sub-task Open Unassigned  
          82.
          s3a to improve diags on s3a bad request message Sub-task Open Unassigned  
          83.
          AbstractContractDistCpTest to test attr preservation with -p, verify blobstores downgrade Sub-task Open Steve Loughran  
          84.
          s3a: auto-detect region for bucket and use right endpoint Sub-task Resolved Aaron Fabbri  
          85.
          ITestS3AContractRootDir failure on non-S3Guarded bucket Sub-task Open Unassigned  
          86.
          s3guard bucket-info command to include default bucket encryption info Sub-task Open Unassigned  
          87.
          S3 SSEC tests to downgrade when running against a mandatory encryption object store Sub-task Open Unassigned  
          88.
          S3AInputStream read(bytes[]) to not retry on read failure: pass action up Sub-task Open Unassigned  
          89.
          ITestS3A select tests fail if user kinited in Sub-task Open Unassigned  
          90.
          S3A add histogram metrics types for latency, etc. Sub-task Open Sean Mackrory  
          91.
          cherry pick s3 enhancements from PrestoS3FileSystem Sub-task Open Unassigned  
          92.
          S3aDelegationTokens to add accessor for tests to get at the token binding Sub-task Open Unassigned  
          93.
          s3guard LimitExceededException -too many tables Sub-task Resolved Unassigned  
          94.
          Support multipart download in S3AFileSystem Sub-task Open Unassigned  
          95.
          NPE in S3AInputStream.read() in ITestS3AInconsistency.testOpenFailOnRead Sub-task Open Unassigned  
          96.
          S3 Select Exceptions are not being converted to IOEs Sub-task Open Unassigned  
          97.
          s3a directory housekeeping operations to be done in async thread Sub-task Resolved Unassigned  
          98.
          remove misleading fs.s3a.delegation.tokens.enabled prompt Sub-task Open Unassigned  
          99.
          ITestS3ARemoteFileChanged tests fail if you set the bucket to versionid tracking Sub-task Resolved Unassigned  
          100.
          Review S3A documentation to make sure it is consistent with the current codebase Sub-task Open Unassigned  
          101.
          Clarify committers.md around v2 failure handling Sub-task Open Unassigned  
          102.
          test YARN log collection works to s3a Sub-task Open Unassigned  
          103.
          Encrypt S3A data client-side with AWS SDK (S3-CSE) Sub-task Patch Available Igor Mazur

          Time Spent - 1h 50m
          104.
          Add some Java-8 friendly way to work with RemoteIterable, especially listings Sub-task Resolved Unassigned  
          105.
          Handle S3A "glacier" data Sub-task Open Unassigned  
          106.
          Add common getFileBlockLocations() emulation for object stores, including S3A Sub-task Patch Available Steve Loughran  
          107.
          S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes Sub-task Resolved Kazuyuki Tanimura  
          108.
          s3a mkdirs() to not check dest for a dir marker Sub-task Resolved Unassigned  
          109.
          clean up ITestS3AFileSystemContract Sub-task Patch Available Unassigned  
          110.
          S3A FS deleteOnExit to skip the exists check Sub-task Resolved Unassigned  
          111.
          S3A to optionally retain directory markers Sub-task Resolved Steve Loughran

          Time Spent - 50m
          112.
          Add some Abortable.abort() interface for streams etc which can be terminated Sub-task Resolved Jungtaek Lim

          Time Spent - 6h
          113.
          declare that fs.s3a.ext. is a prefix for arbitrary extensions Sub-task Open Unassigned  
          114.
          IAM role created by S3A DT doesn't include DynamoDB scan Sub-task Resolved Unassigned  
          115.
          S3A FullCredentialsTokenBinding fails if local credentials are unset Sub-task Resolved Steve Loughran  
          116.
          S3AFilesystem trash handling should respect the current UGI Sub-task Open Unassigned  
          117.
          S3AInputStream.remainingInFile should use nextReadPos Sub-task Reopened lqjacklee  
          118.
          s3guard can't init table if caller doesn't have tag permissions Sub-task Resolved Unassigned  
          119.
          Possible inconsistent state of AbstractDelegationTokenSecretManager Sub-task Patch Available Hankó Gergely

          Time Spent - 1h 10m
          120.
          ITestS3GuardOutOfBandOperations.testListingDelete failing on versioned bucket Sub-task Resolved Steve Loughran  
          121.
          Add option for a prefix to put in front of every s3guard table Sub-task Resolved Unassigned  
          122.
          Add more s3guard metrics Sub-task Resolved Unassigned  
          123.
          Improve DynamoDB schema update story Sub-task Resolved Sean Mackrory  
          124.
          S3Guard: Optimize performance of handling OOB operations in non-authoritative mode Sub-task Resolved Unassigned  
          125.
          getFileChecksum() needs to adopt S3Guard Sub-task Resolved lqjacklee  
          126.
          reduce/tune read failure fault injection on inconsistent client Sub-task Resolved Unassigned  
          127.
          increase performance of s3guard import command Sub-task Open Unassigned  
          128.
          intermittent failure of ITestCommitOperations: too many s3guard writes Sub-task Resolved Unassigned  
          129.
          S3a getFileStatus to update DDB if an S3 query returns etag/versionID Sub-task Resolved Unassigned  
          130.
          Possible for modified configuration to leak into metadatastore in S3GuardTool Sub-task Resolved Unassigned  
          131.
          S3Guard instrumentation to include cost of DynamoDB ops as metric Sub-task Resolved Unassigned  
          132.
          Intermittent failure of ITestS3GuardConcurrentOps#testConcurrentTableCreations Sub-task Resolved Unassigned  
          133.
          Intermittent failure of ITestS3GuardToolDynamoDB#testDynamoDBInitDestroyCycle Sub-task Resolved Unassigned  
          134.
          S3Guard init command uses global settings, not those of target bucket Sub-task Resolved Steve Loughran  
          135.
          Improve throttling on S3Guard DDB batch retries Sub-task Resolved Unassigned  
          136.
          S3guard: add inconsistency detection metrics Sub-task Resolved Unassigned  
          137.
          S3Guard prune to only remove auth dir marker if files (not tombstones) are removed Sub-task Resolved Unassigned  
          138.
          ZookeeperMetadataStore: Use Zookeeper as S3Guard backend store Sub-task Resolved Unassigned  
          139.
          tag S3GuardTool entry points as limitedPrivate("management-tools")/evolving Sub-task Resolved Steve Loughran

          Time Spent - 20m
          140.
          Fix ITestS3GuardToolLocal#testInitNegativeRead test failure Sub-task Resolved Steve Loughran  
          141.
          Ensure controls in-place to prevent clients with significant clock skews pruning aggressively Sub-task Resolved Unassigned  
          142.
          Scheme assertion in S3Guard DynamoDBMetadataStore::checkPath is unnecessarily restrictive Sub-task Resolved Unassigned  
          143.
          improvements to S3GuardTool destroy command Sub-task Resolved Unassigned  
          144.
          Clock skew can cause S3Guard to think object metadata is out of date Sub-task Resolved Unassigned  
          145.
          S3guard metadata stores to support millions of entries Sub-task Resolved Unassigned  
          146.
          ITestS3GuardToolDynamoDB.testDynamoDBInitDestroyCycle fails if test bucket isn't on demand Sub-task Resolved Steve Loughran  
          147.
          S3Guard to self update on directory listings of S3 Sub-task Resolved Unassigned  
          148.
          S3guard mistakes root URI without / as non-absolute path Sub-task Resolved Unassigned  
          149.
          HADOOP-16953. tune s3guard disabled warnings Sub-task Resolved Steve Loughran  
          150.
          mkdir on s3a should not be sensitive to trailing '/' Sub-task Resolved Steve Loughran  
          151.
          s3a to not need wildfly on the classpath Sub-task Resolved Steve Loughran  
          152.
          ITestS3AConfiguration proxy tests fail when bucket probes == 0 Sub-task Resolved Gabor Bota  
          153.
          S3A to support additional token issuers Sub-task Resolved Steve Loughran  
          154.
          S3A staging committer committing duplicate files Sub-task Resolved Steve Loughran  
          155.
          ITestS3GuardOutOfBandOperations testListingDelete[auth=false] fails on unversioned bucket Sub-task Resolved Unassigned  
          156.
          S3AFileSystem silently deletes "fake" directories when writing a file. Sub-task Resolved Unassigned  
          157.
          ITestS3AEncryptionWithDefaultS3Settings fails if default bucket encryption != KMS Sub-task Resolved Mukund Thakur

          Time Spent - 40m
          158.
          Handle transient stream read failures in FileSystem contract tests Sub-task Resolved Unassigned

          Time Spent - 1h 10m
          159.
          add way for s3a to recognise buckets with "." in name and switch to path access Sub-task Resolved Unassigned  
          160.
          Decrease size of s3a dependencies Sub-task Resolved Unassigned  
          161.
          Backport HADOOP-13230 list/getFileStatus changes for preserved directory markers Sub-task Resolved Steve Loughran

          Time Spent - 1h 50m
          162.
          Renaming a file under a sibling empty directory doesn't delete dest dir's marker Sub-task Resolved Steve Loughran  
          163.
          improve s3guard markers command line tool Sub-task Resolved Steve Loughran

          Time Spent - 2h 10m
          164.
          Backport HADOOP-13230 listing changes for preserved directory markers to 3.1.x Sub-task Resolved Steve Loughran  
          165.
          HADOOP-17244. S3A directory delete tombstones dir markers prematurely. Sub-task Resolved Steve Loughran

          Time Spent - 5h 40m
          166.
          s3a rename() now requires s3:deleteObjectVersion permission Sub-task Resolved Steve Loughran

          Time Spent - 2h 20m
          167.
          S3A to always probe S3 in S3A getFileStatus on non-auth paths Sub-task Resolved Steve Loughran

          Time Spent - 3h 40m
          168.
          ITestCustomSigner fails with gcs s3 compatible endpoint. Sub-task Resolved Mukund Thakur

          Time Spent - 1h
          169.
          S3AInputStream to be resilient to failures in abort(); translate AWS Exceptions Sub-task Resolved Yongjun Zhang  
          170.
          FileSystem.get to support slow-to-instantiate FS clients Sub-task Resolved Steve Loughran

          Time Spent - 4h 50m
          171.
          S3A committer to support concurrent jobs with same app attempt ID & dest dir Sub-task Resolved Steve Loughran

          Time Spent - 8h 10m
          172.
          S3A marker tool mixes up -min and -max Sub-task Resolved Steve Loughran

          Time Spent - 2h
          173.
          hadoop-cloud-storage transient dependencies need review Sub-task Open Unassigned  
          174.
          ITestS3AContractRename failing against stricter tests Sub-task Resolved Attila Doroszlai

          Time Spent - 50m
          175.
          AbstractS3ATokenIdentifier to set issue date == now Sub-task Resolved Jungtaek Lim

          Time Spent - 2h
          176.
          Upgrade aws-java-sdk to 1.11.901 Sub-task Resolved Steve Loughran

          Time Spent - 4h
          177.
          ITestS3ADeleteCost.testDirMarkersFileCreation failure Sub-task Resolved Steve Loughran

          Time Spent - 2h 40m
          178.
          AbstractS3ATokenIdentifier to issue date in UTC Sub-task Resolved Jungtaek Lim

          Time Spent - 1.5h
          179.
          enable s3a magic committer by default Sub-task Resolved Steve Loughran  
          180.
          Magic committer files don't have the count of bytes written collected by spark Sub-task Resolved Steve Loughran

          Time Spent - 12h 20m
          181.
          Intermittent S3AInputStream failures: Premature end of Content-Length delimited message body etc Sub-task Resolved Yongjun Zhang

          Time Spent - 5h 20m
          182.
          S3A docs to state s3 is consistent, deprecate S3Guard Sub-task Resolved Steve Loughran

          Time Spent - 1h 40m
          183.
          magic committer to be enabled for all S3 buckets Sub-task Resolved Steve Loughran

          Time Spent - 3.5h
          184.
          typo in MagicCommitTracker Sub-task Resolved Pierrick HYMBERT

          Time Spent - 40m

              People

              • Assignee: Steve Loughran (stevel@apache.org)
              • Reporter: Steve Loughran (stevel@apache.org)
              • Votes: 0
              • Watchers: 3

                Dates

                • Created:
                  Updated:

                  Time Tracking

                  • Estimated: Not Specified
                  • Remaining: 0h
                  • Logged: 97h 20m