Uploaded image for project: 'Comdev GSOC'
  1. Comdev GSOC
  2. GSOC-120

[GSoC][Doris]Dictionary Encoding Acceleration

    XMLWordPrintableJSON

Details

    Description

      Apache Doris
      Apache Doris is a real-time analytical database based on MPP architecture. As a unified platform that supports multiple data processing scenarios, it ensures high performance for low-latency and high-throughput queries, allows for easy federated queries on data lakes, and supports various data ingestion methods.
      Page: https://doris.apache.org

      Github: https://github.com/apache/doris

      Background

      In Apache Doris, dictionary encoding is performed during data writing and compaction. Dictionary encoding will be implemented on string data types by default. The dictionary size of a column for one segment is 1M at most. The dictionary encoding technology accelerates strings during queries, converting them into INT, for example.
       

      Task

      • Phase One: Get familiar with the implementation of Apache Doris dictionary encoding; learning how Apache Doris dictionary encoding accelerates queries.
      •  Phase Two: Evaluate the effectiveness of full dictionary encoding and figure out how to optimize memory in such a case.

      Learning Material

      Page: https://doris.apache.org
      Github: https://github.com/apache/doris

      Mentor

      Attachments

        Activity

          People

            Unassigned Unassigned
            luzhijing Zhijing Lu
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: