  1. Hadoop Common
  2. HADOOP-15620

Über-jira: S3A phase VI: Hadoop 3.3 features

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels:
      None
    • Target Version/s:

        Issue Links

        1.
        S3A to support configuring various AWS S3 client extended options Sub-task Open Unassigned  
        2.
        s3guard bucket-info command to add a verify-property <key>=<value> <bucket> Sub-task Open Unassigned  
        3.
        s3a: auto-detect region for bucket and use right endpoint Sub-task Open Aaron Fabbri  
        4.
        s3guard uploads command to list date and initiator of outstanding uploads Sub-task Open Unassigned  
        5.
        s3a create(overwrite=true) to only look for dir/ and list entries, not file Sub-task Resolved Steve Loughran  
        6.
        export s3a BlockingThreadPoolExecutorService pool info (size, load) as metrics Sub-task Open Unassigned  
        7.
        S3A init hangs if you try to connect while the system is offline Sub-task Resolved Unassigned  
        8.
        AmazonClient 30x exceptions to include redirect URL in message Sub-task Open Unassigned  
        9.
        S3AInputStream to implement CanUnbuffer Sub-task Resolved Sahil Takiar  
        10.
        S3A input stream to support ByteBufferReadable Sub-task Open Unassigned  
        11.
        hook up AwsSdkMetrics to hadoop metrics Sub-task Open Sean Mackrory  
        12.
        add tests/docs for HAR files on s3a Sub-task Open Unassigned  
        13.
        Impersonate hosts in s3a for better data locality handling Sub-task Open Thomas Demoor  
        14.
        cherry pick s3 enhancements from PrestoS3FileSystem Sub-task Open Unassigned  
        15.
        S3A to use a thread pool for async path operations Sub-task Open Unassigned  
        16.
        s3a to instrument duration of HTTP calls Sub-task Open Unassigned  
        17.
        S3A: Consider using TransferManager.download for copyToLocalFile Sub-task Resolved Unassigned  
        18.
        Parallelize S3A directory deletes Sub-task Open Unassigned  
        19.
        test YARN log collection works to s3a Sub-task Open Unassigned  
        20.
        Add support for S3 Select to S3A Sub-task Resolved Steve Loughran  
        21.
        S3A to implement rename(final Path src, final Path dst, final Rename... options) Sub-task Open Unassigned  
        22.
        FileSystem/s3a processDeleteOnExit to skip the exists() check Sub-task Open Unassigned  
        23.
        S3A to support Delegation Tokens Sub-task Resolved Steve Loughran  
        24.
        Encrypt S3A data client-side with AWS SDK Sub-task Patch Available Igor Mazur  
        25.
        Stabilise/formalise the JSON _SUCCESS format used in the S3A committers Sub-task Resolved Unassigned  
        26.
        s3guard bucket-info command to include default bucket encryption info Sub-task Open Unassigned  
        27.
        Handle S3A "glacier" data Sub-task Open Unassigned  
        28.
        Optimize uses of FS operations in the ASF analysis frameworks and libraries Sub-task Open Steve Loughran  
        29.
        Use error code detail in AWS server responses for finer grained exceptions Sub-task Open Unassigned  
        30.
        Add S3A implementation of FSMainOperationsBaseTest Sub-task Resolved Steve Loughran  
        31.
        Support AWS S3 reduced redundancy storage class Sub-task Open Unassigned  
        32.
        S3a to support get/set permissions through S3 object tags Sub-task Resolved Unassigned  
        33.
        S3A add histogram metrics types for latency, etc. Sub-task Open Sean Mackrory  
        34.
        build up md5 checksum as blocks are built in S3ABlockOutputStream; validate upload Sub-task Open Unassigned  
        35.
        log DNS addresses on s3a init Sub-task Open Unassigned  
        36.
        S3a rename() to copy files in a directory in parallel Sub-task Resolved Unassigned  
        37.
        Optimize getFileStatus in S3A Sub-task Open Steven K. Wong  
        38.
        Add HTrace to the s3a connector Sub-task Resolved Madhawa Kasun Gunasekara  
        39.
        S3A getContentSummary() to move to listFiles(recursive) to count children; instrument use Sub-task Open Unassigned  
        40.
        S3a auth exception to link to a wiki page on the problem Sub-task Open Unassigned  
        41.
        add a special 0 byte input stream for empty blobs Sub-task Open Unassigned  
        42.
        S3A: Set thread names with more specific information about the call. Sub-task Open Unassigned  
        43.
        S3A can support short user-friendly aliases for configuration of credential providers. Sub-task Open Unassigned  
        44.
        shell rm command to not rename to ~/.Trash in object stores Sub-task Open Unassigned  
        45.
        s3a directory housekeeping operations to be done in async thread Sub-task Open Unassigned  
        46.
        Filesystem discovery to stop loading implementation classes Sub-task Open Steve Loughran  
        47.
        S3A should allow renaming to a pre-existing destination directory to move the source path under that directory, similar to HDFS. Sub-task Resolved Unassigned  
        48.
        S3A authenticators to log origin of .secret.key options Sub-task Open Unassigned  
        49.
        fs -expunge to take a filesystem Sub-task Resolved Shweta  
        50.
        s3a create() doesn't check for an ancestor path being a file Sub-task Open Sean Mackrory  
        51.
        s3a test can hang in teardown with network problems Sub-task Open Unassigned  
        52.
        Test hadoop fs shell against s3a; fix problems Sub-task Open Unassigned  
        53.
        S3a: Failed to reset the request input stream/make S3A uploadPart() retriable Sub-task Open Unassigned  
        54.
        S3AFileSystem silently deletes "fake" directories when writing a file. Sub-task Resolved Unassigned  
        55.
        review S3A translateException translation matches IBM CORS spec Sub-task Open Unassigned  
        56.
        S3AInputStream.skip() to use lazy seek Sub-task Open Unassigned  
        57.
        S3ARetryPolicy to handle AWS 500 responses/error code TooBusyException with the throttle backoff policy Sub-task Open Unassigned  
        58.
        S3A Input Stream bytes read counter isn't getting through to StorageStatistics/instrumentation properly Sub-task Open Unassigned  
        59.
        Test MR split optimisation with recursive listing Sub-task Open Unassigned  
        60.
        S3 SSEC tests to downgrade when running against a mandatory encryption object store Sub-task Open Unassigned  
        61.
        S3A Retry policy to retry on NoResponseException Sub-task Resolved Steve Loughran  
        62.
        s3a doesn't consider blobs with trailing / and content-length >0 as directories Sub-task Open Unassigned  
        63.
        S3a operations keep retrying if the password is wrong Sub-task Open Thomas Poepping  
        64.
        s3a new getdefaultblocksize be called in getFileStatus which has not been implemented in s3afilesystem yet Sub-task Open Unassigned  
        65.
        S3A client raising ConnectionPoolTimeoutException in test Sub-task Resolved Unassigned  
        66.
        s3guard to provide better diags on ddb init failures Sub-task Open Unassigned  
        67.
        Bulk commits of S3A MPUs place needless excessive load on S3 & S3Guard Sub-task Resolved Steve Loughran  
        68.
        improve setting of max connections in AWS client Sub-task Open Unassigned  
        69.
        Add common getFileBlockLocations() emulation for object stores, including S3A Sub-task Patch Available Steve Loughran  
        70.
        Add custom InstanceProfileCredentialsProvider with more resilience to throttling Sub-task Open Unassigned  
        71.
        S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes Sub-task Patch Available Kazuyuki Tanimura  
        72.
        S3A log message on rm s3a://bucket/ not intuitive Sub-task Resolved Gabor Bota  
        73.
        S3A FS to add "s3a:no-existence-checks" to the builder file creation option set Sub-task Open Unassigned  
        74.
        clean up ITestS3AFileSystemContract Sub-task Patch Available Unassigned  
        75.
        make s3a read fault injection configurable including "off" Sub-task Open Unassigned  
        76.
        s3a rm on the CLI generates deprecation warning on io.bytes.per.checksum Sub-task Open Unassigned  
        77.
        S3aUtils.getEncryptionAlgorithm() always logs@Debug "Using SSE-C" Sub-task Resolved Unassigned  
        78.
        Add a way for an FS instance to say "really, no trash interval at all" Sub-task Open Unassigned  
        79.
        support git-secrets commit hook to keep AWS secrets out of git Sub-task Patch Available Steve Loughran  
        80.
        initial part uploads seem to block unnecessarily in S3ABlockOutputStream Sub-task Open Steven Rand  
        81.
        strip s3.amazonaws.com off hostnames before making s3a calls Sub-task Open Unassigned  
        82.
        test and document use of fs.s3a.signing-algorithm Sub-task Open Unassigned  
        83.
        S3AFileStatus to add a serialVersionUID; review & test serialization Sub-task Open Unassigned  
        84.
        S3A to support Requester Pays Buckets Sub-task Patch Available Mandus Momberg (Original Estimate: 2h, Remaining Estimate: 2h)
        85.
        S3A warning of obsolete encryption key which is never used Sub-task Resolved Unassigned  
        86.
        AbstractContractDistCpTest to test attr preservation with -p, verify blobstores downgrade Sub-task Open Steve Loughran  
        87.
        add s3guard CLI command to generate session keys for an assumed role Sub-task Resolved Steve Loughran  
        88.
        S3AFileSystem.verifyBucketExists to move to s3.doesBucketExistV2 Sub-task Resolved lqjacklee  
        89.
        NPE in S3AInputStream.read() in ITestS3AInconsistency.testOpenFailOnRead Sub-task Open Unassigned  
        90.
        FileSystemMultipartUploader should verify that UploadHandle has non-0 length Sub-task Resolved Ewan Higgs  
        91.
        Memory leak in S3AOutputStream Sub-task Resolved Steve Loughran  
        92.
        intermittent failure of ITestS3GuardListConsistency.testInconsistentS3ClientDeletes in parallel runs Sub-task Open Unassigned  
        93.
        [s3a] stop treating fs.s3a.max.threads as the long-term minimum Sub-task Resolved Sean Mackrory  
        94.
        S3 listing inconsistency can raise NPE in globber Sub-task Resolved Steve Loughran  
        95.
        remove obsolete S3A test ITestS3ACredentialsInURL Sub-task Resolved Steve Loughran  
        96.
        AWS Data read stack trace in S3a putObjectDirect Sub-task Open Unassigned  
        97.
        S3A input stream to use etags/version number to detect changed source files Sub-task Resolved Ben Roling  
        98.
        S3A Filesystem does not check return from AmazonS3Client deleteObjects Sub-task Open Unassigned  
        99.
        Move ITestS3AMiniYarnCluster to S3A committers Sub-task Resolved Steve Loughran  
        100.
        ITestS3AContractRootDir failure on non-S3Guarded bucket Sub-task Open Unassigned  
        101.
        @Retries annotation of putObject() call & uses wrong Sub-task Resolved Steve Loughran  
        102.
        Review + update cloud store sensitive keys in hadoop.security.sensitive-config-keys Sub-task Resolved Steve Loughran  
        103.
        Remove transient dependency on hadoop-hdfs-client Sub-task Open Unassigned  
        104.
        ITestS3AContractMultipartUploader#testMultipartUploadEmptyPart test error Sub-task Resolved Ewan Higgs  
        105.
        S3AInputStream.remainingInFile should use nextReadPos Sub-task Reopened lqjacklee  
        106.
        S3AInputStream.seek should throw EOFException if seeking past the end of file Sub-task Open Unassigned  
        107.
        Some S3A committer tests don't match ITest* pattern; don't run in maven Sub-task Resolved Steve Loughran  
        108.
        Report problems w/ local S3A buffer directory meaningfully Sub-task Open Unassigned  
        109.
        get patch for S3a nextReadPos(), through Yetus Sub-task Resolved lqjacklee  
        110.
        Hadoop aws does not use shaded jars Sub-task Resolved Unassigned  
        111.
        Oozie unable to create sharelib in s3a filesystem Sub-task Resolved Steve Loughran  
        112.
        S3AInputStream logging to make it easier to debug file leakage Sub-task Open Unassigned  
        113.
        declare that fs.s3a.ext. is a prefix for arbitrary extensions Sub-task Open Unassigned  
        114.
        S3A committers: make sure there's regular progress() calls Sub-task Patch Available lqjacklee  
        115.
        Add S3A support for Async Scatter/Gather IO Sub-task Open Unassigned  
        116.
        S3AFileSystem.verifyBucketExists to move to s3.doesBucketExistV2 Sub-task Patch Available lqjacklee  
        117.
        Add bouncycastle jars to hadoop-aws as test dependencies Sub-task Resolved Steve Loughran  
        118.
        Add some S3A-specific create file options Sub-task Open Unassigned  
        119.
        [DOC] Effective use of FS instances during S3A integration tests Sub-task Resolved Gabor Bota  
        120.
        hamcrest-library declaration in hadoop-aws to be scoped test Sub-task Resolved Steve Loughran  
        121.
        s3a SSL connections should use OpenSSL Sub-task Resolved Sahil Takiar  
        122.
        S3A tests to include Terasort Sub-task Resolved Steve Loughran  
        123.
        S3a DelegationToken bindings to support a "correlation ID" for the UA header Sub-task Open Unassigned  
        124.
        Token.toString faulting if any token listed can't load. Sub-task Resolved Steve Loughran  
        125.
        S3A Client to add explicit support for versioned stores Sub-task Patch Available Steve Loughran  
        126.
        Move DurationInfo from hadoop-aws to hadoop-common org.apache.hadoop.util Sub-task Resolved Abhishek Modi  
        127.
        Parquet reading S3AFileSystem causes EOF Sub-task Resolved Steve Loughran  
        128.
        Update AWS SDK to 1.11.563 Sub-task Resolved Steve Loughran (Original Estimate: 24h, Remaining Estimate: 24h)
        129.
        Some S3A tests leak filesystem instances Sub-task Open Unassigned  
        130.
        Support multipart download in S3AFileSystem Sub-task Open Unassigned  
        131.
        ITestS3A select tests fail if user kinited in Sub-task Open Unassigned  
        132.
        S3AInputStream read(bytes[]) to not retry on read failure: pass action up Sub-task Open Unassigned  
        133.
        Extend documentation in testing.md about endpoint constants Sub-task Resolved Adam Antal  
        134.
        S3aDelegationTokens to add accessor for tests to get at the token binding Sub-task Open Unassigned  
        135.
        Add some tests about S3 timestamp tracking Sub-task Open Unassigned  
        136.
        regression: ITestS3AMiniYarnCluster failing on branch-3.2 Sub-task Resolved Unassigned  
        137.
        S3Guard to add DynamoDBLocal Support Sub-task Resolved lqjacklee  
        138.
        s3a rename failed during copy, "Unable to copy part" + 200 error code Sub-task Open Unassigned  
        139.
        S3A copy/rename of large files to be parallelized as a multipart operation Sub-task Open Unassigned  
        140.
        S3A copyFile operation to include source versionID or etag in the copy request Sub-task Resolved Steve Loughran  
        141.
        add extra S3A MPU test to see what happens if a file is created during the MPU Sub-task Reopened Steve Loughran  
        142.
        S3A MarshalledCredentials.toString() doesn't print full date/time of expiry Sub-task Resolved Steve Loughran  
        143.
        S3AUtils.translateException to map CredentialInitializationException to AccessDeniedException Sub-task Resolved Steve Loughran  
        144.
        S3A openFile() operation to support explicit versionID, etag parameters Sub-task Open Unassigned  
        145.
        S3AFileSystem#innerMkdirs builds needless lists Sub-task Resolved Lokesh Jain  
        146.
        ITestS3AContractGetFileStatusV1List may have consistency issues Sub-task Open Unassigned  
        147.
        warning about user:pass in URI to explicitly call out Hadoop 3.2 as removal Sub-task Resolved Steve Loughran  
        148.
        Improved S3A MR tests Sub-task Resolved Steve Loughran  
        149.
        remove misleading fs.s3a.delegation.tokens.enabled prompt Sub-task Open Unassigned  
        150.
        S3AFileStatus to declare that isEncrypted() is always true Sub-task Resolved Steve Loughran  
        151.
        Clarify committers.md around v2 failure handling Sub-task Open Unassigned  
        152.
        Add S3AWriteOpContext for write ops; pass in statistics and other settings Sub-task Open Unassigned  
        153.
        S3A statistic collection underrecords bytes written in helper threads Sub-task Open Unassigned  
        154.
        ITestS3AContractSeek teardown closes test FS before superclass can do its cleanup Sub-task Open Unassigned  
        155.
        multipart/huge file upload tests to look at checksums returned Sub-task Open Unassigned  
        156.
        S3A delegation tests fail if you set fs.s3a.secret.key Sub-task Resolved Unassigned  
        157.
        ITestS3AMiscOperations.testEmptyFileChecksums and ITestS3AMiscOperations.testNonEmptyFileChecksumsUnencrypted fail with default encryption enabled on bucket Sub-task Open Unassigned  
        158.
        S3A Delegation Token code to spell "Marshalled" as Marshaled Sub-task Resolved Steve Loughran  
        159.
        s3a test docs to mention non-auth; or s3a tests to default to non-auth Sub-task Open Unassigned  
        160.
        ClassCastException in S3GuardTool.checkMetadataStoreUri Sub-task Resolved Steve Loughran  
        161.
        Regression: TestStagingPartitionedJobCommit failing with empty etag list Sub-task Resolved Steve Loughran  
        162.
        Remove S3A's dependency on http core Sub-task Resolved Steve Loughran  
        163.
        Test Hang in S3A S3guard test MetadataStoreTestBase.testListChildren Sub-task Resolved Unassigned  
        164.
        Stabilize S3A OpenSSL support Sub-task Open Unassigned  
        165.
        MapReduce job tasks fails on S3A ssl3_get_server_certificate:certificate verify Sub-task Resolved Steve Loughran  
        166.
        TeraSort Job failing on S3 DirectoryStagingCommitter: destination path exists Sub-task Resolved Steve Loughran  
        167.
        S3A NullPointerException: null uri host. This can be caused by unencoded / in the password string Sub-task Resolved Unassigned  
        168.
        Option to disable GCM for SSL connections when running on Java 8 Sub-task Resolved Sahil Takiar  
        169.
        S3AInputStream#unbuffer should merge input stream stats into fs-wide stats Sub-task Resolved Sahil Takiar  
        170.
        S3A openFile() options to allow etag/version to be set Sub-task Open Unassigned  
        171.
        Improve isolation of FS instances in S3A committer tests Sub-task Open Unassigned  
        172.
        ITestS3ARemoteFileChanged doesn't overwrite test data creation Sub-task Open Unassigned  
        173.
        ITestS3AMiniYarnCluster fails on sequential runs with Kerberos error Sub-task Open Unassigned  
        174.
        Speed up S3A test runs Sub-task Open Unassigned  
        175.
        S3A returns 400 "bad request" on a single path within an S3 bucket Sub-task Resolved Unassigned  
        176.
        AbstractITCommitMRJob.testMRJob test failures Sub-task Resolved Unassigned  
        177.
        Downgrade INFO message on rm s3a root dir to DEBUG Sub-task Resolved Unassigned  
        178.
        Review S3A documentation to make sure it is consistent with the current codebase Sub-task Open Unassigned  
        179.
        ITestS3ACommitterFactory failing, S3 client is not inconsistent Sub-task Open Steve Loughran  
        180.
        LocatedFileStatusFetcher scans failing intermittently against S3 store Sub-task Resolved Steve Loughran  
        181.
        S3AFileSystem.listLocatedStatus to LIST before HEAD Sub-task Open Unassigned  
        182.
        S3AFileSystem.getContentSummary() to use listFiles(recursive) Sub-task Open Unassigned  
        183.
        Typo in s3a committers.md doc Sub-task Resolved Unassigned  
        184.
        Make last AWS credential provider in default auth chain EC2ContainerCredentialsProviderWrapper Sub-task Resolved Steve Loughran  
        185.
        Restore (documented) fs.s3a.SharedInstanceProfileCredentialsProvider Sub-task Resolved Steve Loughran  
        186.
        S3A delegation token tests fail if fs.s3a.encryption.key set Sub-task Resolved Steve Loughran  
        187.
        S3Guard bucket-info fails if the bucket location is denied to the caller Sub-task Resolved Steve Loughran  
        188.
        S3 Select Exceptions are not being converted to IOEs Sub-task Open Unassigned  
        189.
        S3A doesn't actually verify paths have the correct authority Sub-task Open Unassigned  
        190.
        S3A retry policy to be exponential Sub-task Resolved Steve Loughran  
        191.
        S3ADelegationTokens to only log at debug on startup Sub-task Resolved Steve Loughran  
        192.
        Encrypt S3A buffered data on disk Sub-task Open Unassigned  
        193.
        S3AFilesystem trash handling should respect the current UGI Sub-task Open Unassigned  
        194.
        make sure staging committers collect DTs for the staging FS Sub-task Open Unassigned  
        195.
        S3A Secret access to fall back to XML if credential provider raises IOE. Sub-task Open Unassigned  
        196.
        s3a to improve diags on s3a bad request message Sub-task Open Unassigned  
        197.
        S3A DT support to warn when loading expired token Sub-task Open Steve Loughran  
        198.
        IAM role created by S3A DT doesn't include DynamoDB scan Sub-task Open Unassigned  
        199.
        ITestS3AAWSCredentialsProvider tests fail if a bucket has DTs enabled Sub-task Open Unassigned  
        200.
        ITestS3ARemoteFileChanged tests fail if you set the bucket to versionid tracking Sub-task Open Unassigned  
        201.
        S3A FullCredentialsTokenBinding fails if local credentials are unset Sub-task In Progress Steve Loughran  
        202.
        Use lighter-weight alternatives to innerGetFileStatus where possible Sub-task Open Unassigned  
        203.
        S3A committers leak threads/raises OOM on job/task commit at scale Sub-task Resolved Steve Loughran  
        204.
        S3AFilesystem.initiateRename() can skip check on dest.parent status if src has same parent Sub-task Open Unassigned  
        205.
        s3a attempts to look up password/encryption fail if JCEKS file unreadable Sub-task Resolved Unassigned  
        206.
        s3a to set fake directory marker contentType to application/x-directory Sub-task Open Unassigned  
        207.
        increase the default number of threads and http connections in S3A Sub-task Open Unassigned  
        208.
        S3A ITestRestrictedReadAccess fails Sub-task Resolved Steve Loughran  
        209.
        Speculating & Partitioned S3A magic committers can leave pending files under __magic Sub-task Patch Available Steve Loughran  
        210.
        Tune hadoop-aws parallel test surefire/failsafe settings Sub-task Open Unassigned  
        211.
        S3A ITest*MRjob failures Sub-task Resolved Siddharth Seth  
        212.
        S3A ITest failures without S3Guard Sub-task Resolved Steve Loughran  
        213.
        S3A innerGetFileStatus "directories only" scan still does a HEAD Sub-task Resolved Steve Loughran  
        214.
        Retrieve modtime of PUT file from store, via response or HEAD Sub-task Open Unassigned  
        215.
        S3A Delegation Token extension point to use StoreContext Sub-task Patch Available Steve Loughran  
        216.
        ITestS3AClosedFS failing -junit test thread Sub-task Resolved Steve Loughran  
        217.
        S3 getBucketLocation() can return "US" for us-east Sub-task Resolved Steve Loughran  
        218.
        S3Guard DDB overreacts to no tag access Sub-task Resolved Gabor Bota  
        219.
        typo in TestNeworkBinding Sub-task Open Steve Loughran  
        220.
        s3guard LimitExceededException -too many tables Sub-task Open Unassigned  
        221.
        HadoopExecutors cleanup to only log at debug Sub-task Open David Mollitor  
        222.
        Consider having the ability to turn off TTL in S3Guard + Authoritative mode Sub-task Open Gabor Bota  
        223.
        With S3Guard + authmode, consider skipping "verifyBuckets" check in S3A fs init() Sub-task Open Unassigned  

            People

            • Assignee:
              stevel@apache.org Steve Loughran
              Reporter:
              stevel@apache.org Steve Loughran
            • Votes:
              0
              Watchers:
              11

              Dates

              • Created:
                Updated:

                Time Tracking

                Estimated: 26h (Original Estimate)
                Remaining: 26h
                Logged: Not Specified