Comdev GSOC / GSOC-105

RocketMQ TieredStore Integration with High Availability Architecture

Details

    Description

      Apache RocketMQ

      Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability.

      Page: https://rocketmq.apache.org

       

      Background

      With the official release of RocketMQ 5.1.0, tiered storage arrived as a new independent module at the Technical Preview milestone. It lets users offload messages from local disks to cheaper storage, extending message retention time at a lower cost.

      Reference RIP-57: https://github.com/apache/rocketmq/wiki/RIP-57-Tiered-storage-for-RocketMQ

      In addition, RocketMQ introduced a new high availability architecture in version 5.0.

      Reference RIP-44: https://github.com/apache/rocketmq/wiki/RIP-44-Support-DLedger-Controller

      However, RocketMQ tiered storage currently supports only a single replica.

       

      Task

      Because tiered storage supports only a single replica, the following issues remain open in its integration with the high availability architecture:

      • Metadata synchronization: how to reliably synchronize tiered storage metadata between master and slave nodes.
      • Disallowing message uploads beyond the confirm offset: to avoid rolling back messages that were already uploaded, the maximum uploaded offset must not exceed the confirm offset.
      • Starting tiered storage upload when a slave is promoted to master, and stopping it when a master is demoted to slave: only the master node has write and delete permissions, so a newly promoted node must quickly resume uploading from the point where the previous master stopped (breakpoint resumption).
      • Design of the slave pull protocol: how a newly launched, empty slave can correctly synchronize data through the tiered storage architecture. (If synchronization is based only on the first or last file, breakpoint resumption may not be possible after another role switch.)
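
      The second and third points above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not RocketMQ code: the class `TieredUploadGate`, its methods, and the `Role` enum are hypothetical names, offsets are treated as half-open ranges `[from, to)`, and the confirm offset is assumed to be an exclusive upper bound. The gate allows uploads only on the master, caps every upload at the confirm offset, and resumes from the last recorded uploaded offset after a promotion.

```java
/**
 * Hypothetical sketch: none of these names exist in RocketMQ.
 * Decides which offset range, if any, may be uploaded to tiered storage.
 */
final class TieredUploadGate {
    enum Role { MASTER, SLAVE }

    private Role role;
    // Exclusive upper bound of what is already in tiered storage.
    // Assumption: on a promoted slave this value comes from synchronized metadata.
    private long uploadedOffset;

    TieredUploadGate(Role role, long uploadedOffset) {
        this.role = role;
        this.uploadedOffset = uploadedOffset;
    }

    /** Called on role change; a slave stops uploading, a master resumes. */
    synchronized void changeRole(Role newRole) {
        this.role = newRole;
    }

    /**
     * Returns the half-open range {from, to} that may be uploaded now,
     * or null when this node is a slave or nothing new is confirmed.
     */
    synchronized long[] nextUploadRange(long localMaxOffset, long confirmOffset) {
        if (role != Role.MASTER) {
            return null; // only the master has write permission
        }
        // Never upload past the confirm offset, so uploaded data cannot be rolled back.
        long to = Math.min(localMaxOffset, confirmOffset);
        if (to <= uploadedOffset) {
            return null;
        }
        return new long[] {uploadedOffset, to};
    }

    /** Records progress after a successful upload; never moves backwards. */
    synchronized void markUploaded(long to) {
        if (to > uploadedOffset) {
            uploadedOffset = to;
        }
    }

    synchronized long uploadedOffset() {
        return uploadedOffset;
    }
}
```

      For example, a node that starts as a slave with an uploaded offset of 100 returns no range; once promoted, with a local maximum of 500 and a confirm offset of 300, it would be allowed to upload `[100, 300)` and nothing further until the confirm offset advances. How the uploaded offset reaches the slave in the first place is exactly the metadata synchronization question raised in the first point and is not addressed by this sketch.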

      You therefore need to provide a complete plan that resolves the issues above and ultimately completes the integration of tiered storage with the high availability architecture, verifying it against the existing file version of tiered storage and through OpenChaos testing.

       

      Relevant Skills

      • Interest in messaging middleware and distributed storage systems
      • Java development skills
      • Good understanding of RocketMQ tiered storage and high availability architecture

          People

            Assignee: Unassigned
            Reporter: Rongtong Jin (jinrongtong)

              Time Tracking

                Original Estimate: 350h
                Remaining Estimate: 350h
                Time Spent: Not Specified