Apache Ozone / HDDS-3816

Erasure Coding


Details

    Description

      We propose implementing Erasure Coding (EC) in Apache Ozone to provide more space-efficient storage. With EC in place, Ozone can offer the same or better fault tolerance while saving 50% or more storage space.
      The HDFS project already has native (ISA-L) and Java codecs implemented, so we can leverage the same or a similar codec design.

      However, the critical part, the EC data layout design, is still in progress; we will post the design doc soon.

      Also see HDDS-5351, which covers a number of prerequisites for the EC feature that were committed directly to the master branch.
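
      The "50% or more" figure can be checked with simple arithmetic. The sketch below is only an illustration, not code from Ozone: the class and method names are hypothetical, and it assumes the baseline is 3 full replicas and that an EC scheme with d data and p parity blocks tolerates the loss of any p blocks. The 6+3 scheme is an assumed example; 10+4 is the policy named in one of the sub-tasks.

```java
public class EcOverhead {

    /** Raw bytes stored per logical byte for an EC scheme with the given data and parity block counts. */
    public static double ecOverhead(int data, int parity) {
        return (double) (data + parity) / data;
    }

    /** Fraction of raw storage saved relative to keeping `replicas` full copies of the data. */
    public static double savingsVsReplication(int data, int parity, int replicas) {
        return 1.0 - ecOverhead(data, parity) / replicas;
    }

    public static void main(String[] args) {
        // Assumed baseline: 3 full replicas, which survive the loss of 2 copies.
        int replicas = 3;
        // Example schemes as {data, parity}; 6+3 is an assumption, 10+4 appears in a sub-task title.
        int[][] schemes = {{6, 3}, {10, 4}};
        for (int[] s : schemes) {
            System.out.printf(
                "EC %d+%d: overhead %.2fx, tolerates %d lost blocks, saves %.1f%% vs %d replicas%n",
                s[0], s[1], ecOverhead(s[0], s[1]), s[1],
                100.0 * savingsVsReplication(s[0], s[1], replicas), replicas);
        }
    }
}
```

      Under these assumptions, a 6+3 scheme stores 1.5 raw bytes per logical byte instead of 3, which is a 50% saving while tolerating 3 lost blocks rather than 2 lost copies; a 10+4 scheme saves slightly more than that.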

      Attachments

        1. Apache Ozone Erasure Coding-V2.pdf
          294 kB
          Uma Maheswara Rao G
        2. EC-read-write-path.pdf
          170 kB
          Marton Elek
        3. Erasure Coding in Apache Hadoop Ozone.pdf
          375 kB
          Marton Elek
        4. Ozone EC Container groups and instances.pdf
          205 kB
          Marton Elek
        5. Ozone EC v3.pdf
          1.16 MB
          Marton Elek

        Issue Links

        1. EC: Introduce the ReplicationConfig and modify the proto files | Sub-task | Closed | Marton Elek
        2. EC: Persist replicationIndex on datanode side | Sub-task | Resolved | Marton Elek
        3. Enhance SCMServerProtocol with using ReplicationConfig | Sub-task | Closed | Marton Elek
        4. EC: allocateBlock API of OM->SCM should take the option EC policy/parameters, which can be used by SCM to select the pipeline provider | Sub-task | Closed | Uma Maheswara Rao G
        5. EC: Add replicaIndex to the RPC protocols | Sub-task | Resolved | Marton Elek
        6. EC: Introduce EC replication type | Sub-task | Resolved | Marton Elek
        7. EC: Implement basic EC pipeline provider | Sub-task | Resolved | Stephen O'Donnell
        8. EC: Create ECReplicationConfig on client side based on input string | Sub-task | Resolved | Marton Elek
        9. EC: Implement the ECKeyOutputStream which should handle the EC mode writes | Sub-task | Resolved | Uma Maheswara Rao G
        10. EC: Add ECReplicationConfig into KeyInfo proto | Sub-task | Resolved | Uma Maheswara Rao G
        11. EC: Extend PipelineManager.createPipeline API to support excluded nodes | Sub-task | Resolved | Stephen O'Donnell
        12. EC: Allow EC blocks to be requested from OM | Sub-task | Resolved | Stephen O'Donnell
        13. EC: Add missing break in switch statement when requesting EC blocks | Sub-task | Resolved | Stephen O'Donnell
        14. EC: Add configuration to set an EC container placement Policy | Sub-task | Resolved | Janus Chow
        15. EC: allocateContainer should handle ec replication config | Sub-task | Resolved | Uma Maheswara Rao G
        16. ContainerStateMap should handle ecReplication config map | Sub-task | Resolved | Stephen O'Donnell
        17. EC: ECReplicationConfig should be immutable | Sub-task | Resolved | István Fajth
        18. EC: ContainerPlacementPolicyFactory#getPolicyInternal() should not be public | Sub-task | Resolved | István Fajth
        19. OMKeyRequest#createFileInfo should handle ECReplicationConfig | Sub-task | Resolved | Uma Maheswara Rao G
        20. EC: Implement ECBlockInputStream to read a single EC Block Group. | Sub-task | Resolved | Stephen O'Donnell
        21. EC: openKey/createFile should return whether the file created in EC mode | Sub-task | Resolved | Uma Maheswara Rao G
        22. EC: Make ECReplicationConfig stored as bucket level attributes. (Example Encryption info) | Sub-task | Resolved | Uma Maheswara Rao G
        23. EC: Create a new as many racks as possible placement policy for EC | Sub-task | Resolved | Janus Chow
        24. EC: Add padding and generate parity if the last stripe is not full | Sub-task | Resolved | Uma Maheswara Rao G
        25. EC: Pipeline builder should copy replica Indexes from original pipeline | Sub-task | Resolved | Stephen O'Donnell
        26. EC: commit key should consolidate and create one keyLocationInfo per blockGrp | Sub-task | Resolved | Uma Maheswara Rao G
        27. EC: Provide replication config option from CLI when creating bucket. | Sub-task | Resolved | Uma Maheswara Rao G
        28. EC: Resolve findbugs warnings after branch merge | Sub-task | Resolved | Stephen O'Donnell
        29. EC: ECBlockOutputstream commitKey should create one keyLocationInfo per logical block | Sub-task | Resolved | Stephen O'Donnell
        30. EC: Adapt KeyInputStream to read EC Blocks | Sub-task | Resolved | Stephen O'Donnell
        31. EC: Add Codec and chunkSize to ECReplicationConfig | Sub-task | Resolved | Stephen O'Donnell
        32. EC: ECKeyoutputStream should use codec and chunksize from ECReplicationConfig | Sub-task | Resolved | Uma Maheswara Rao G
        33. EC: We should improve the behavior of BasicRootedOzoneClientAdapterImpl#getDefaultReplication with defaults configs moved to server. | Sub-task | Resolved | Uma Maheswara Rao G
        34. EC: In BasicRootedOzoneClientAdapterImpl, Inherit bucket default replication config only in the case of EC. | Sub-task | Resolved | Uma Maheswara Rao G
        35. EC: Remove hard coded chunksize and get from ReplicationConfig | Sub-task | Resolved | Stephen O'Donnell
        36. EC: Writing a large buffer to an EC file duplicates first chunk in block 1 and 2 | Sub-task | Resolved | Uma Maheswara Rao G
        37. EC: Fix TestRootedOzoneFileSystem.testBucketDefaultsShouldBeInheritedToFileForEC failure in branch | Sub-task | Resolved | Uma Maheswara Rao G
        38. EC: ECKeyOutputStream#close fails if we write the partial chunk | Sub-task | Resolved | Uma Maheswara Rao G
        39. EC: ECKeyOutputStream persists blocks in random order | Sub-task | Resolved | Stephen O'Donnell
        40. EC: Implement seek on ECKeyInputStream | Sub-task | Resolved | Stephen O'Donnell
        41. EC: Implement seek on ECBlockInputStream | Sub-task | Resolved | Stephen O'Donnell
        42. EC: Adopt EC related utility from Hadoop source repository | Sub-task | Resolved | Uma Maheswara Rao G
        43. EC: Refactor ECBlockOutputStreamEntry to accommodate all block group related ECBlockOuputStreams. | Sub-task | Resolved | István Fajth
        44. EC: Write should handle node failures. | Sub-task | Resolved | Uma Maheswara Rao G
        45. EC: Integrate the Codec changes into EC Streams. | Sub-task | Resolved | Uma Maheswara Rao G
        46. EC: Fix the compile issue in TestOzoneECClient (Due to concurrent commits) | Sub-task | Resolved | Uma Maheswara Rao G
        47. EC: Implement an Input Stream to reconstruct EC blocks on demand | Sub-task | Resolved | Stephen O'Donnell
        48. EC: Implement seek on ECBlockReconstructedStripeInputStream | Sub-task | Resolved | Stephen O'Donnell
        49. EC: Review the current flush API and clean up | Sub-task | Resolved | Uma Maheswara Rao G
        50. EC: ECBlockReconstructedStripeInputStream should handle block read failures and continue reading | Sub-task | Resolved | Stephen O'Donnell
        51. EC: Fix the replication config handling in OMDirectoryCreateRequest#dirKeyInfoBuilderNoACL | Sub-task | Resolved | Uma Maheswara Rao G
        52. EC: Fix TestOmMetrics in merge branch | Sub-task | Resolved | Uma Maheswara Rao G
        53. EC: Create ECBlockReconstructedInputStream to wrap ECBlockReconstructedStripeInputStream | Sub-task | Resolved | Stephen O'Donnell
        54. EC: Fix TestOzoneShellHA failures post master merge with EC branch | Sub-task | Resolved | Uma Maheswara Rao G
        55. EC: Optimize ECBlockReconstructedStripeInputStream where there are no missing data indexes. | Sub-task | Resolved | Stephen O'Donnell
        56. EC: Change CLI bucket default replication option name to "-type" | Sub-task | Resolved | Uma Maheswara Rao G
        57. EC: Track the failed servers to add into the excludeList when invoking allocateBlock | Sub-task | Resolved | Uma Maheswara Rao G
        58. Centralize string based replication config validation via ReplicationConfigValidator | Sub-task | Resolved | István Fajth
        59. EC: Create ECInputStream wrapper to choose between reconstruction and normal reads. | Sub-task | Resolved | Stephen O'Donnell
        60. EC: Provide set replicationConfig option to bucket | Sub-task | Resolved | Uma Maheswara Rao G
        61. Fix ReplicationConfig related test failures that happened due to merging HDDS-5997 to the EC branch | Sub-task | Resolved | István Fajth
        62. EC: handleStripeFailure should retry | Sub-task | Resolved | Uma Maheswara Rao G
        63. EC: ECBlockReconstructedStripeInputStream should read from blocks in parallel | Sub-task | Resolved | Stephen O'Donnell
        64. EC: Provide CLI option to reset the bucket replication config | Sub-task | Resolved | Uma Maheswara Rao G
        65. EC: HandleStripeFailure should not release the cachebuffers. | Sub-task | Resolved | Uma Maheswara Rao G
        66. EC: Create reusable buffer pool shared by all EC input and output streams | Sub-task | Resolved | Stephen O'Donnell
        67. EC: Client side exclude nodes list should expire after certain time period or based on the list size. | Sub-task | Resolved | Uma Maheswara Rao G
        68. EC: Provide toString in ECReplicationConfig | Sub-task | Resolved | Uma Maheswara Rao G
        69. EC: Pipeline creator should ignore creating pipelines for ZERO factor | Sub-task | Resolved | Uma Maheswara Rao G
        70. EC: Review the TODOs in GRPC Xceiver client and fix them. | Sub-task | Resolved | István Fajth
        71. EC: Document the Ozone EC | Sub-task | Resolved | Uma Maheswara Rao G
        72. EC: Introduce a gRPC client implementation for EC with really async WriteChunk and PutBlock | Sub-task | Resolved | István Fajth
        73. EC: Replication config from bucket should be refreshed in o3fs. | Sub-task | Resolved | Uma Maheswara Rao G
        74. EC: put command should create EC key if bucket is EC | Sub-task | Resolved | Uma Maheswara Rao G
        75. EC: Bucket does not display correct EC replication details | Sub-task | Resolved | Stephen O'Donnell
        76. EC: Fix flakiness of tests around node failures | Sub-task | Resolved | Uma Maheswara Rao G
        77. EC: putBlock should pass close flag true on end of block group/close file. | Sub-task | Resolved | Uma Maheswara Rao G
        78. EC: Container Info command with json switch fails for EC containers | Sub-task | Resolved | Stephen O'Donnell
        79. EC: Smoketest for ozone admin datanode expects exactly 3 nodes | Sub-task | Resolved | Attila Doroszlai
        80. EC: Datanode Chunk Validator fails on encountering EC pipeline | Sub-task | Resolved | Attila Doroszlai
        81. EC: Create EC acceptance test environment and some basic tests | Sub-task | Resolved | Stephen O'Donnell
        82. EC: Replication Manager should skip EC Containers | Sub-task | Resolved | Stephen O'Donnell
        83. EC: Parity blocks are incorrectly padded with zeros to the chunk size (#3043) | Sub-task | Resolved | Uma Maheswara Rao G
        84. EC: Add replica index to the output in the container info command | Sub-task | Resolved | Stephen O'Donnell
        85. EC: Recon UI failing because of EC replication factor | Sub-task | Resolved | Uma Maheswara Rao G
        86. EC: Read with stopped but not dead nodes gives IllegalStateException rather than InsufficientNodesException | Sub-task | Resolved | Stephen O'Donnell
        87. EC: Pipelines for closed containers should contain correct replica indexes | Sub-task | Resolved | Stephen O'Donnell
        88. EC: Fix unaligned stripe write failure due to length overflow. | Sub-task | Resolved | Mark Gui
        89. EC: Freon ockg support EC write | Sub-task | Resolved | mingchao zhao
        90. EC: Apply fix for HDFS-16422 to the Ozone EC libraries | Sub-task | Resolved | Stephen O'Donnell
        91. EC: Fix new checkstyle rule warnings in EC branch | Sub-task | Resolved | Uma Maheswara Rao G
        92. EC: Make cluster-wide EC configuration take effect. | Sub-task | Resolved | Kaijie Chen
        93. EC: Handle Replication Factor to consider EC Config in Recon UI | Sub-task | Resolved | Uma Maheswara Rao G
        94. EC: Fix large write with multiple stripes upon stripe failure. | Sub-task | Resolved | Mark Gui
        95. EC: Fix todo items in TestECKeyOutputStream | Sub-task | Resolved | Stephen O'Donnell
        96. EC: Adapt java side native coder classes from hadoop. | Sub-task | Resolved | Uma Maheswara Rao G
        97. EC: Update help strings for replication config | Sub-task | Resolved | Kaijie Chen
        98. EC: Fix the race condition in TestECBlockReconstructedStripeInputStream | Sub-task | Resolved | Uma Maheswara Rao G
        99. EC: Freon randomKeys EC key support | Sub-task | Resolved | Kaijie Chen
        100. EC: Fix read big file failure with EC policy 10+4. | Sub-task | Resolved | Mark Gui
        101. EC: PartialStripe failure handling logic is writing padding bytes also to DNs | Sub-task | Resolved | Uma Maheswara Rao G
        102. EC: Do not throw NotImplementedException in flush() | Sub-task | Resolved | Kaijie Chen
        103. EC: Refactor ECKeyOutputStream#write() | Sub-task | Resolved | Kaijie Chen
        104. EC: Fix allocateBlockIfFull condition in ECKeyOutputStream#write() | Sub-task | Resolved | Kaijie Chen
        105. EC: Calculate EC replication correctly when updating bucket usage | Sub-task | Resolved | Stephen O'Donnell
        106. EC: Key Info command should not display legacy replication fields as they duplicate ReplicationConfig | Sub-task | Resolved | Stephen O'Donnell
        107. EC: Exclude pipeline upon container close instead of exclude DNs. | Sub-task | Resolved | Mark Gui
        108. EC: Automatically switch to next OutputStream in ECBlockOutputStreamEntry | Sub-task | Resolved | Kaijie Chen
        109. EC: Discard pre-allocated blocks to eliminate worthless retries. | Sub-task | Resolved | Mark Gui
        110. EC: Review use of ReplicationConfig.getLegacyFactor() in the codebase | Sub-task | Resolved | Uma Maheswara Rao G
        111. EC: add padding problem | Sub-task | Resolved | cchenaxchen
        112. EC: Ensure EC container usage is updated correctly when handling reports | Sub-task | Resolved | Stephen O'Donnell
        113. EC: EC keys can't be created via S3 interfaces | Sub-task | Resolved | Uma Maheswara Rao G
        114. EC: ListPipelines command should consider EC Containers | Sub-task | Resolved | Stephen O'Donnell
        115. EC: Fix broken future chain and cleanup unnecessary validation function. | Sub-task | Resolved | Mark Gui
        116. EC: Priority of replication config in Ozone FS | Sub-task | Resolved | Kaijie Chen
        117. EC: Fix too many idle threads during reconstruct read. | Sub-task | Resolved | Mark Gui
        118. EC: Avoid allocating buffers in EC Reconstruction Streams until first read | Sub-task | Resolved | Stephen O'Donnell
        119. EC: OzoneManagerRequestHandler needs to handle ECReplicationConfig | Sub-task | Resolved | Stephen O'Donnell
        120. EC: Container list command should allow filtering of EC containers | Sub-task | Resolved | Stephen O'Donnell
        121. EC: Fix allocateBlock failure due to inaccurate excludedNodes check. | Sub-task | Resolved | Mark Gui
        122. EC: Fix number of preAllocatedBlocks for EC keys | Sub-task | Resolved | Kaijie Chen
        123. EC: [Refactor-1] Check isFullCell inside handleDataWrite | Sub-task | Resolved | Kaijie Chen
        124. EC: Adjust requested size by EC DataNum in WritableECContainerProvider | Sub-task | Resolved | Stephen O'Donnell
        125. EC: OmMultipartKeyInfo needs to handle ECReplicationConfig | Sub-task | Resolved | Attila Doroszlai
        126. EC: Reconstructed Input Streams should free resources after reading to end of block | Sub-task | Resolved | Stephen O'Donnell
        127. EC: Improve exception message in ByteBufferEncodingState | Sub-task | Resolved | cchenaxchen
        128. EC: Handle ECReplicationConfig in initiateMultipartUpload API | Sub-task | Resolved | Uma Maheswara Rao G
        129. EC: the offset is less than writeoffset | Sub-task | Resolved | cchenaxchen
        130. EC: Fix ISA-L load hadoop native lib UnsatisfiedLinkError | Sub-task | Resolved | cchenaxchen
        131. EC: OzoneMultipartUpload needs to handle ECReplicationConfig | Sub-task | Resolved | Attila Doroszlai
        132. EC: OzoneMultipartUploadPartListParts needs to handle ECReplicationConfig | Sub-task | Resolved | Attila Doroszlai
        133. EC: write when the datanode not enough | Sub-task | Resolved | cchenaxchen
        134. EC: Overwriting an EC key with a Ratis key fails | Sub-task | Resolved | Uma Maheswara Rao G
        135. EC: Execute S3 acceptance tests with EC | Sub-task | Resolved | Attila Doroszlai
        136. EC: Unify replication-related CLI params | Sub-task | Resolved | Attila Doroszlai
        137. EC: [Forward compatibility issue] New client to older server could fail due to the unavailability for client default replication config | Sub-task | Resolved | István Fajth
        138. EC: scm CheckAndRecoverECContainer command | Sub-task | Resolved | Unassigned
        139. EC: Onboard EC into upgrade framework | Sub-task | Resolved | István Fajth


          People

            umamaheswararao Uma Maheswara Rao G
            Votes: 1
            Watchers: 39
