Description
Apache Doris
Apache Doris is a real-time analytical database built on an MPP (massively parallel processing) architecture. As a unified platform for multiple data processing scenarios, it delivers high performance for both low-latency and high-throughput queries, supports federated queries on data lakes, and offers a variety of data ingestion methods.
Page: https://doris.apache.org
GitHub: https://github.com/apache/doris
Background
Apache Doris accelerates high-concurrency queries with a page cache that stores decompressed data pages, so repeated reads of the same page skip both the disk read and the decompression step.
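To make the read path concrete, here is a minimal Python sketch of a cache that holds decompressed pages. It assumes pages are keyed by a hypothetical `(file_id, page_offset)` tuple and compressed with `zlib`. The class and method names are illustrative, not Doris APIs, and Doris's real page cache is implemented in C++. A miss pays the decompression cost once; every later hit returns the already-decompressed bytes.

```python
import zlib
from collections import OrderedDict


class DecompressedPageCache:
    """Hypothetical LRU cache of decompressed pages keyed by (file_id, page_offset)."""

    def __init__(self, capacity_pages):
        self.capacity_pages = capacity_pages
        self._pages = OrderedDict()  # key -> decompressed bytes, oldest first
        self.decompressions = 0      # how many times we paid the decompression cost

    def read_page(self, key, load_compressed):
        # Hit: return the cached decompressed page and mark it most recently used.
        if key in self._pages:
            self._pages.move_to_end(key)
            return self._pages[key]
        # Miss: fetch the compressed page, decompress it once, then cache the result.
        page = zlib.decompress(load_compressed(key))
        self.decompressions += 1
        self._pages[key] = page
        if len(self._pages) > self.capacity_pages:
            self._pages.popitem(last=False)  # evict the least recently used page
        return page


# Simulated on-disk segment: every page is stored compressed.
disk = {("seg_0", off): zlib.compress(bytes([off % 256]) * 4096) for off in range(4)}
cache = DecompressedPageCache(capacity_pages=2)
for _ in range(100):  # many concurrent point queries touching the same page
    cache.read_page(("seg_0", 0), disk.__getitem__)
print(cache.decompressions)  # 1
```

One hundred reads of the same page cost a single decompression, which is the saving the page cache is meant to provide for high-concurrency workloads.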
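Currently, the page cache in Apache Doris uses a simple LRU algorithm, which has two problems:
- Hot data can be evicted by large queries: pages that a large scan reads only once push frequently accessed pages out of the cache.
- The page cache configuration cannot be changed at runtime, and the cache does not support garbage collection (GC).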
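Both problems can be illustrated with a small Python simulation that compares a plain LRU cache with a segmented LRU (SLRU). SLRU is one standard scan-resistant policy, chosen here only for illustration; the text above does not say which policy Doris should adopt. In SLRU, new pages enter a probationary segment and are promoted to a protected segment only on a second access, so a one-pass scan churns the probationary segment without flushing hot pages. The `resize` method is a hypothetical sketch of a runtime-adjustable capacity that frees memory immediately when shrunk. All names are assumptions, not Doris APIs.

```python
from collections import OrderedDict


class LRUCache:
    """Plain LRU: every access, including one-off scan reads, lands at the MRU end."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def access(self, key):
        hit = key in self._data
        if hit:
            self._data.move_to_end(key)
        else:
            self._data[key] = True
            if len(self._data) > self.capacity:
                self._data.popitem(last=False)  # evict least recently used
        return hit


class SegmentedLRUCache:
    """Hypothetical SLRU with a probationary and a protected segment."""

    def __init__(self, capacity, protected_ratio=0.8):
        self._probation = OrderedDict()
        self._protected = OrderedDict()
        self.resize(capacity, protected_ratio)

    def __len__(self):
        return len(self._probation) + len(self._protected)

    def resize(self, capacity, protected_ratio=0.8):
        # Runtime resize: shrinking evicts pages right away instead of needing a restart.
        self.capacity = capacity
        self.protected_cap = int(capacity * protected_ratio)
        self._shrink()

    def _shrink(self):
        # Demote overflow from the protected segment to the probationary MRU end.
        while len(self._protected) > self.protected_cap:
            key, _ = self._protected.popitem(last=False)
            self._probation[key] = True
        # Evict from the probationary LRU end first; touch protected pages only as a last resort.
        while len(self) > self.capacity:
            if self._probation:
                self._probation.popitem(last=False)
            else:
                self._protected.popitem(last=False)

    def access(self, key):
        if key in self._protected:
            self._protected.move_to_end(key)
            return True
        if key in self._probation:
            del self._probation[key]
            self._protected[key] = True  # second access: promote to protected
            self._shrink()
            return True
        self._probation[key] = True      # first access: probationary only
        self._shrink()
        return False


def hot_hit_ratio(cache):
    hot = [f"hot_{i}" for i in range(4)]
    for _ in range(3):            # warm up: hot pages are read repeatedly
        for k in hot:
            cache.access(k)
    for i in range(100):          # a large query scans 100 cold pages once each
        cache.access(f"scan_{i}")
    return sum(cache.access(k) for k in hot) / len(hot)


print(hot_hit_ratio(LRUCache(10)))           # 0.0
print(hot_hit_ratio(SegmentedLRUCache(10)))  # 1.0
```

With the same capacity, the scan evicts every hot page from the plain LRU cache, while SLRU keeps all of them in its protected segment. Calling `resize(2)` on the SLRU cache afterwards shrinks it to two pages immediately, which is the kind of dynamic adjustment the second bullet says the current cache lacks.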
Task
- Phase One: Measure how query performance is affected when decompressed data is stored in memory versus on SSD, and then determine whether a full page cache is required.
- Phase Two: Improve the caching strategy of Apache Doris based on the Phase One results.
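One way to frame the Phase One comparison is a two-tier cache: a small in-memory tier for decompressed pages, backed by a larger SSD tier that receives pages evicted from memory. The Python sketch below assumes a hypothetical spill-on-evict policy and an unbounded SSD tier stored as files in a directory; it is not Doris's design. It only counts which tier served each request, whereas a real Phase One experiment would also measure query latency per tier.

```python
import os
import tempfile
from collections import OrderedDict


class TwoTierPageCache:
    """Hypothetical cache of decompressed pages: in-memory LRU tier plus an SSD tier."""

    def __init__(self, mem_capacity, ssd_dir):
        self.mem_capacity = mem_capacity
        self.ssd_dir = ssd_dir               # assumed to live on an SSD
        self._mem = OrderedDict()            # key -> decompressed bytes
        self.stats = {"mem_hit": 0, "ssd_hit": 0, "miss": 0}

    def _ssd_path(self, key):
        return os.path.join(self.ssd_dir, f"{key}.page")

    def _put_mem(self, key, page):
        self._mem[key] = page
        self._mem.move_to_end(key)
        if len(self._mem) > self.mem_capacity:
            old_key, old_page = self._mem.popitem(last=False)
            # Spill the evicted decompressed page to SSD instead of dropping it.
            with open(self._ssd_path(old_key), "wb") as f:
                f.write(old_page)

    def get(self, key, load_and_decompress):
        if key in self._mem:
            self._mem.move_to_end(key)
            self.stats["mem_hit"] += 1
            return self._mem[key]
        path = self._ssd_path(key)
        if os.path.exists(path):
            with open(path, "rb") as f:      # SSD hit: no decompression needed
                page = f.read()
            self.stats["ssd_hit"] += 1
        else:
            page = load_and_decompress(key)  # full miss: read and decompress
            self.stats["miss"] += 1
        self._put_mem(key, page)             # promote to the memory tier
        return page


with tempfile.TemporaryDirectory() as ssd_dir:
    cache = TwoTierPageCache(mem_capacity=2, ssd_dir=ssd_dir)
    load = lambda key: f"decompressed-{key}".encode()
    for key in [1, 2, 3, 1, 2, 3]:
        cache.get(key, load)
    print(cache.stats)  # {'mem_hit': 0, 'ssd_hit': 3, 'miss': 3}
```

In this cyclic access pattern the memory tier is too small to produce any hits, but the SSD tier serves the second pass without repeating decompression. Comparing the latency of SSD hits with memory hits on real workloads is the kind of evidence Phase One needs to decide whether a full in-memory page cache is required.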
Learning Material
Page: https://doris.apache.org
GitHub: https://github.com/apache/doris
Mentor
- Mentor: Yongqiang Yang, Apache Doris PMC member & Committer, yangyongqiang@apache.org
- Mentor: Haopeng Li, Apache Doris PMC member & Committer, lihaopeng@apache.org
- Mailing List: dev@doris.apache.org