Apache Ozone / HDDS-7593

Supporting HSync and lease recovery


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • 1.4.0
    • None
    • None
    • None

    Description

      This is the umbrella Jira for the design and implementation of the hsync/hflush API and lease recovery support in Ozone. The feature enables new use cases such as HBase and Solr, whose write-ahead logs and transaction logs are flushed to Ozone constantly.

      A design doc will be added shortly.
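
      To illustrate the write-ahead-log use case described above, here is a minimal, self-contained sketch of the pattern an application might follow: probe whether the output stream supports hsync, then sync after every log record. The `SyncableStream` interface and the `InMemoryStream` class are hypothetical stand-ins written for this example, loosely modeled on the capability probe and sync calls that Hadoop output streams expose. They are not Ozone classes, and the toy stream only imitates "durable" versus "buffered" bytes in memory.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class WalSketch {
    /** Hypothetical stand-in for a stream with a capability probe and hsync. */
    interface SyncableStream {
        boolean hasCapability(String capability);
        void write(byte[] data);
        void hsync();
    }

    /** Toy stream: bytes stay in a buffer until hsync() moves them to "durable" storage. */
    static class InMemoryStream implements SyncableStream {
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        private final ByteArrayOutputStream durable = new ByteArrayOutputStream();

        public boolean hasCapability(String capability) {
            return "hsync".equals(capability) || "hflush".equals(capability);
        }

        public void write(byte[] data) {
            buffer.write(data, 0, data.length);
        }

        public void hsync() {
            durable.write(buffer.toByteArray(), 0, buffer.size());
            buffer.reset();
        }

        int durableLength() { return durable.size(); }
        int bufferedLength() { return buffer.size(); }
    }

    /** Appends one log record and syncs it; fails fast if hsync is unavailable. */
    static void appendRecord(SyncableStream out, String record) {
        if (!out.hasCapability("hsync")) {
            throw new UnsupportedOperationException("stream does not support hsync");
        }
        out.write((record + "\n").getBytes(StandardCharsets.UTF_8));
        out.hsync();
    }

    public static void main(String[] args) {
        InMemoryStream out = new InMemoryStream();
        appendRecord(out, "put row1");                           // 9 bytes, synced
        out.write("unsynced".getBytes(StandardCharsets.UTF_8));  // 8 bytes, not synced
        System.out.println("durable=" + out.durableLength()
            + " buffered=" + out.bufferedLength());
    }
}
```

      In the sketch, only the record passed through `appendRecord` counts as durable. The trailing write stays buffered until the next sync, and that gap is what a log writer would need to close before acknowledging a transaction.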

      Attachments

        Issue Links

        Sub-Tasks (summary | status | assignee)

        1. Ozone client change to support HSync | Resolved | Tsz-wo Sze
        2. Add support for application to probe output stream capability | Resolved | Wei-Chiu Chuang
        3. Ozone Manager change to support HSync | Resolved | Wei-Chiu Chuang
        4. hsync: Change KeyOutputStream to update length in OM | Resolved | Unassigned
        5. Potential discrepancy of key creation time may cause premature open key clean up | Resolved | Tsz-wo Sze
        6. OM lease recovery for hsync'ed files | Resolved | Tsz-wo Sze
        7. [hsync] Recon throws ClassCastException | Resolved | Arafat Khan
        8. [hsync] Outputstream in encrypted buckets do not return the correct stream capabilities | Resolved | Wei-Chiu Chuang
        9. input stream does not refresh expired block token | Resolved | Wei-Chiu Chuang
        10. Change Get Key Info to return HSync info | Resolved | Wei-Chiu Chuang
        11. Ozone file systems to support Hadoop's PathCapabilities interface | Resolved | Wei-Chiu Chuang
        12. Add hsync metrics in OM | Resolved | Tsz-wo Sze
        13. Implement client initiated lease recovery | Resolved | Wei-Chiu Chuang
        14. Add a flag to disable hsync by default | Resolved | Wei-Chiu Chuang
        15. [hsync] KeyOutputStream is not thread safe | Resolved | Wei-Chiu Chuang
        16. LeaseRecovery failing with NullPointer exception | Resolved | Wei-Chiu Chuang
        17. [hsync] Add a CLI to recover lease | Resolved | Wei-Chiu Chuang
        18. OM crash with NPE in OMKeyCommitRequest due to missing user info | Resolved | Sumit Agrawal
        19. Quota needs to be updated correctly for Hsync | Resolved | Sumit Agrawal
        20. TestHSync is no longer flaky | Resolved | Tsz-wo Sze
        21. [hsync] reject renaming open file | Resolved | Wei-Chiu Chuang
        22. O3fs/ofs to support setTimes() API | Resolved | Wei-Chiu Chuang
        23. [hsync] OMKeyRequest: Detect allocated but uncommitted blocks | Resolved | Wei-Chiu Chuang
        24. Disallow overwriting a hsync'ed key | Resolved | Wei-Chiu Chuang
        25. Support setSafeMode(), isFileClosed() FileSystem API | Resolved | Wei-Chiu Chuang
        26. [hsync] HBase RegionServer input stream not shut down properly | Resolved | Unassigned
        27. OM to reject hsync if ozone.fs.hsync.enabled is false | Resolved | Wei-Chiu Chuang
        28. [hsync] A freon tool to focus on hsync/hflush performance | Resolved | Wei-Chiu Chuang
        29. ozone freon --server is broken by HDDS-6176 | Resolved | Wei-Chiu Chuang
        30. ChunkInputStream should use new token after pipeline refresh | Resolved | Attila Doroszlai
        31. [hsync] File recovery support in OM | Resolved | Sammi Chen
        32. [hsync] File recovery support in Client | Resolved | Sammi Chen
        33. hsync: Interface to retrieve block info and finalize block in DN through ratis | Resolved | Ashish Kumar
        34. [hsync] DataNode to deserialize Ratis transaction only once | Resolved | Tsz-wo Sze
        35. Make recoverLease call idempotent | Resolved | Sammi Chen
        36. [hsync] Make Putblock performance acceptable - Skeleton code | Resolved | Wei-Chiu Chuang
        37. [hsync] Cache serialized block token in output stream to reduce heap consumption | Resolved | Wei-Chiu Chuang
        38. Introduce soft limit support for lease recovery | Resolved | Ashish Kumar
        39. Add admin CLI to list open files | Resolved | Siyao Meng
        40. [hsync] Redesign the lease recovery protocol so block length is updated correctly at OM | Resolved | Unassigned
        41. [hsync] Rebase HDDS-7593 branch onto master | Resolved | Wei-Chiu Chuang
        42. [hsync] write after lease recovery does not fail | Resolved | Sammi Chen
        43. [hsync] Make Putblock performance acceptable - DataNode side | Resolved | Wei-Chiu Chuang
        44. [hsync]Support hard limit and auto recovery for hsync file | Resolved | Ashish Kumar
        45. [hsync] Reduce updating block length times at OM during hsync | Resolved | Sammi Chen
        46. two client parallel perform commit with Hsync can cause dataloss | Resolved | Siyao Meng
        47. Deleted file reappears after HSync | Resolved | Siyao Meng
        48. Add hsync metadata to hsync'ed keys in OpenKeyTable as well | Resolved | Siyao Meng
        49. Migrate tests to JUnit5 | Resolved | Ashish Kumar
        50. [hsync] Make Putblock performance acceptable - Client side | Resolved | Wei-Chiu Chuang
        51. Merge recent commits from master to HDDS-7593 | Resolved | Wei-Chiu Chuang
        52. [hsync] Make Putblock performance acceptable | Resolved | Wei-Chiu Chuang
        53. DataNode doesn't set proper DatanodeVersion when registering with SCM | Resolved | Siyao Meng
        54. [hsync] Output stream should support direct byte buffer | Resolved | Wei-Chiu Chuang
        55. [hsync] disk usage thread aborts if ratis log rolls very quickly | Resolved | Unassigned
        56. [hsync] Revisit configuration keys for incremental chunk list after HDDS-9884 | Resolved | Wei-Chiu Chuang
        57. [hsync] MockDatanodeStorage.writeChunk should make a copy of byte string | Resolved | Wei-Chiu Chuang
        58. Merge recent commits from master (7c8160fe) to HDDS-7593 | Resolved | Siyao Meng
        59. [hsync] Refresh block token immediately if block token expires | Resolved | Wei-Chiu Chuang
        60. OzoneFSInputStream to support ByteBufferPositionedReadable | Resolved | Ashish Kumar
        61. [hsync] Add a Freon tool to measure client to DataNode round-trip latency | Resolved | Wei-Chiu Chuang
        62. [hsync] Combine WriteData and PutBlock requests into one | Resolved | Wei-Chiu Chuang
        63. [LeaseRecovery] OM shuts down with "SecretKey client must have been initialized already" | Resolved | Sammi Chen
        64. [hsync] improve block token refresh message | Resolved | Wei-Chiu Chuang
        65. [hsync] Add OpenTracing traces to client side read path | Resolved | Wei-Chiu Chuang
        66. Merge master 97038ef to feature branch HDDS-7593 | Resolved | Siyao Meng
        67. [hsync] Merge recent commits from master #4 | Resolved | Wei-Chiu Chuang
        68. HBase WAL splitting fails due to lease recovery | Resolved | Sammi Chen
        69. [hsync] OMKeyCommitRequest should reject if client id doesn't match | Resolved | Chung En Lee
        70. [hsync] lease recovery contract test class not substantiated | Resolved | Chung En Lee
        71. [hsync] Show deleted hsync keys in ListOpenFile CLI | Resolved | Ashish Kumar
        72. Merge recent commits from master to HDDS-7593 | Resolved | Ashish Kumar
        73. Show overwritten hsync keys in ListOpenFile CLI | Resolved | Sammi Chen
        74. Wrong count is displaying in listOpenFile CLI | Resolved | Unassigned
        75. [hsync] Output stream lastChunkBuffer should use direct buffer | Resolved | Ashish Kumar
        76. [hsync] Flush to only wait for majority of DataNodes | Resolved | Unassigned
        77. Remove unused UserGroupInformation object in DataNode token verifier | Resolved | Wei-Chiu Chuang
        78. Freon tool DN-Echo to support GRPC and Ratis read/write mode | Resolved | Wei-Chiu Chuang
        79. [hsync] Increase default value for hdds.container.ratis.log.appender.queue.num-elements | Resolved | Wei-Chiu Chuang
        80. Intermittent failure in TestLeaseRecovery.testFinalizeBlockFailure | Resolved | Ashish Kumar
        81. recoverLease should close underlying streams | Resolved | Ashish Kumar
        82. [hsync] 6th merge from master | Resolved | Wei-Chiu Chuang
        83. Merge master branch 611066a to HDDS-7593 dev branch | Resolved | Siyao Meng
        84. [hsync] Parameterize TestBlockOutputStream on ozone.client.stream.putblock.piggybacking | Resolved | Wei-Chiu Chuang
        85. [hsync] Remove block token from Ratis log once verified | Resolved | Wei-Chiu Chuang
        86. [hsync] Investigate why DataNode Echo throughput is so low | Resolved | Wei-Chiu Chuang
        87. [hsync] Client side metrics | Resolved | Wei-Chiu Chuang
        88. [File Lease] OM adds request message handler | Resolved | Wei-Chiu Chuang
        89. [File Lease] Client side lease renewer thread and request message | Resolved | Wei-Chiu Chuang
        90. [File Lease] OM adds FileLeaseManager | Resolved | Wei-Chiu Chuang
        91. [hsync] 7th merge from master | Resolved | Wei-Chiu Chuang
        92. [hsync] Block finalization should also merge last chunk to blockDataTable | Resolved | Wei-Chiu Chuang
        93. [hsync] Adopt Ratis 3.1.0 when it's released | Resolved | Chung En Lee
        94. Add a few interesting ContainerStateMachine metrics in CSMMetrics | Resolved | Wei-Chiu Chuang
        95. [hsync] 8th merge from master | Resolved | Siyao Meng
        96. [hsync] Checking disk capacity at every write request is expensive for HBase | Resolved | Attila Doroszlai
        97. [hsync] Add a freon tool to benchmark hsync/write concurrency | Resolved | Duong
        98. [hsync] 9th merge from master | Resolved | Siyao Meng
        99. [hsync] De-synchronize hsync API | Resolved | Duong
        100. [hsync] Improve BlockOutputStream's BufferPool to support variable buffer allocation from concurrent hsync | Resolved | Duong
        101. [hsync] Replace expensive VolumeUsage.getMinVolumeFreeSpace() | Resolved | Wei-Chiu Chuang
        102. [hsync] Instantiates audit parameter lazily in DataNode dispatch handler | Resolved | Wei-Chiu Chuang
        103. Fix ContainerOpsLatencies metrics | Resolved | Duong
        104. KeyOutputStream flakiness when running write and hsync concurrently | Resolved | Duong
        105. [hsync] Support renaming open files | Resolved | Unassigned
        106. [hsync] Make Putblock performance acceptable - Tool cleanup, guardrails | Resolved | Wei-Chiu Chuang
        107. Increase hdds.datanode.handler.count | Resolved | Wei-Chiu Chuang
        108. Increase ipc.server.read.threadpool.size | Resolved | Wei-Chiu Chuang
        109. [hsync] Add new OM layout version | Resolved | Wei-Chiu Chuang
        110. [hsync] DataNode should verify HBASE_SUPPORT layout version for every PutBlock | Resolved | Wei-Chiu Chuang
        111. [hsync] Add Ozone Manager protocol version | Resolved | Wei-Chiu Chuang
        112. TestBlockOutputStream.testWriteMoreThanFlushSize is flaky | Resolved | Ashish Kumar
        113. [hsync] Merge HDDS-7593 feature branch into master | Resolved | Ashish Kumar
        114. Merge recent commits from master to HDDS-7593 | Resolved | Ashish Kumar
        115. [hsync] Add DN layout version (HBASE_SUPPORT/version 8) upgrade test | Resolved | Wei-Chiu Chuang
        116. [hsync] Change XceiverClientRatis.watchForCommit to async | Resolved | Tsz-wo Sze
        117. [hsync] Move HBASE_SUPPORT layout upgrade test into its own test | Resolved | Wei-Chiu Chuang
        118. [hsync] Add a client config to limit write concurrency on the same key | Resolved | Siyao Meng
        119. [hsync] Revert config default ozone.fs.hsync.enabled to false | Resolved | Siyao Meng
        120. [hsync] Block ECKeyOutputStream from calling hsync and hflush | Resolved | Siyao Meng
        121. ContainerStateMachine should not crash because of CHUNK_FILE_INCONSISTENCY | Resolved | Duong
        122. Flakiness in KeyOutputStream exception handling | Resolved | Duong
        123. [hsync] Enable PutBlock piggybacking and incremental chunk list by default | Resolved | Wei-Chiu Chuang
        124. Change RatisBlockOutputStream to use HDDS-11174 | Resolved | Tsz-wo Sze
        125. [hsync] Remove hsync and hflush capability check in ContentGenerator | Resolved | Hemant Kumar
        126. Use OMLayoutFeature.HBASE_SUPPORT for HSYNC | Resolved | Hemant Kumar
        127. [hsync] Add upgrade tests | Resolved | Hemant Kumar
        128. [hsync] Add a config as HBase-related features master switch | Resolved | Siyao Meng
        129. [hsync] Remove KeyOutputStreamSemaphore logs | Resolved | Chung En Lee
        130. [hsync] Compatibility test | Open | Hemant Kumar
        131. Support incremental ChunkBuffer checksum calculation | Open | Siyao Meng
        132. [hsync] Optimize FilePerBlockStrategy.writeChunk() | Open | Siyao Meng
        133. [hsync] Update OpenTelemetry traces in the write path | In Progress | Wei-Chiu Chuang
        134. Make key visible immediately upon create() | Open | Unassigned
        135. Add hsync and lease recovery Documentation | In Progress | Wei-Chiu Chuang
        136. Add test case to cover HDDS-9930 batch key deletion | Open | Unassigned
        137. [hsync] Adopt RATIS-1994 to reduce hsync latency | Patch Available | Siyao Meng
        138. Any open key rename should ideally be blocked | Open | Unassigned
        139. Increase DataNode XceiverServerGrpc event loop group size | Open | Unassigned
        140. [Discuss] Colocate Ozone client and pipeline leader to reduce latency | Open | Unassigned
        141. [hsync] Freon DN Echo to support --wait-for-commit parameter | Open | Wei-Chiu Chuang
        142. Explore client retry optimizations after write() and hsync() are desynced | Open | Unassigned
        143. Freon DN Echo should skip writing payload to ratis log | In Progress | Wei-Chiu Chuang
        144. [hsync] Improve test coverage for LeaseRecoveryClientDNHandler | Open | Unassigned
        145. [hsync] Improve test coverage for XceiverClientRatis.java | Open | Unassigned
        146. Hsync client-side metrics | Open | Duong
        147. Manage Netty native memory consumption | Open | Unassigned
        148. Add metrics for OzoneChecksumException | Open | Unassigned
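
        Several of the sub-tasks above describe lease recovery behavior only by title: recoverLease should be idempotent, a write after lease recovery should fail, and a commit should be rejected when the client id does not match. The following is a toy in-memory model of those three rules, offered as a rough sketch only. The `LeaseSketch` class, its methods, and its exception choices are all hypothetical and do not reflect how Ozone Manager actually implements them.

```java
import java.util.HashMap;
import java.util.Map;

public class LeaseSketch {
    /** Toy record of an open key: its writer, last synced length, and closed flag. */
    static class OpenKey {
        final long clientId;
        long syncedLength;
        boolean closed;

        OpenKey(long clientId) {
            this.clientId = clientId;
        }
    }

    private final Map<String, OpenKey> keys = new HashMap<>();

    /** Opens a key for writing on behalf of the given client. */
    void create(String name, long clientId) {
        keys.put(name, new OpenKey(clientId));
    }

    /** Records a synced length; rejected for closed keys and mismatched client ids. */
    void hsync(String name, long clientId, long length) {
        OpenKey key = lookup(name);
        if (key.closed) {
            throw new IllegalStateException("key is closed: " + name);
        }
        if (key.clientId != clientId) {
            throw new IllegalStateException("client id mismatch for " + name);
        }
        key.syncedLength = length;
    }

    /** Closes the key at its last synced length; repeated calls return the same length. */
    long recoverLease(String name) {
        OpenKey key = lookup(name);
        key.closed = true;
        return key.syncedLength;
    }

    private OpenKey lookup(String name) {
        OpenKey key = keys.get(name);
        if (key == null) {
            throw new IllegalArgumentException("unknown key: " + name);
        }
        return key;
    }

    public static void main(String[] args) {
        LeaseSketch om = new LeaseSketch();
        om.create("wal-1", 7L);
        om.hsync("wal-1", 7L, 100L);
        System.out.println(om.recoverLease("wal-1")); // first recovery
        System.out.println(om.recoverLease("wal-1")); // repeat is a no-op
    }
}
```

        In this model a second `recoverLease` call returns the same length as the first, and any later `hsync` from the original writer throws, which mirrors the intent stated in the sub-task titles.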

          People

            Assignee: Wei-Chiu Chuang (weichiu)
            Reporter: Wei-Chiu Chuang (weichiu)

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated: 72h
                Remaining: 72h
                Logged: Not Specified
