Description
Objectives
Support UPDATE for Doris Duplicate Key Table
Currently, Doris supports three data models: Duplicate Key, Aggregate Key, and Unique Key. Of these, Unique Key has full data update support, including the UPDATE statement. As Doris becomes more widely used, users are asking more of it. For example, some users need to perform ETL processing inside Doris, but they use Duplicate Key tables and would like Duplicate Key to support UPDATE as well.
For Duplicate Key, UPDATE is inefficient because there is no primary key to help locate a specific row. The usual practice is to rewrite the data: even if the user updates only one field of one row, at least the segment file containing that row must be rewritten.
Another, potentially more efficient, solution is to implement Duplicate Key by combining Unique Key's Merge-on-Write (MoW) with an auto_increment column. That is, change the underlying implementation of Duplicate Key to use Unique Key MoW, and add a hidden auto_increment column to the primary key. The keys the user writes to the Unique Key MoW table are then never duplicated, which preserves the semantics of Duplicate Key. And since each row has a unique primary key, the UPDATE capability of Unique Key can be reused to support UPDATE on Duplicate Key.
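The idea of appending a hidden auto-increment value to the primary key can be illustrated with a small in-memory toy model. This is only a sketch in Python, not Doris code: the class DuplicateOnUniqueTable and its methods are hypothetical names invented for illustration, and it does not model segment files, delete bitmaps, or any other part of the real Merge-on-Write storage layer.

```python
from itertools import count


class DuplicateOnUniqueTable:
    """Toy model (hypothetical): Duplicate Key semantics on a unique-key store."""

    def __init__(self, key_columns):
        self.key_columns = tuple(key_columns)
        self._rows = {}            # primary key -> row, like a unique-key table
        self._next_id = count(1)   # stands in for the hidden auto_increment column

    def insert(self, row):
        # Primary key = user key columns + hidden id, so it is always unique,
        # even when the user writes the same key values twice.
        hidden_id = next(self._next_id)
        pk = tuple(row[c] for c in self.key_columns) + (hidden_id,)
        self._rows[pk] = dict(row)
        return pk

    def update(self, predicate, changes):
        # Assumption of this sketch: key columns are not updatable.
        if any(c in self.key_columns for c in changes):
            raise ValueError("key columns cannot be updated in this sketch")
        # Find matching rows, then overwrite each one under its own primary key.
        matched = [pk for pk, row in self._rows.items() if predicate(row)]
        for pk in matched:
            self._rows[pk] = {**self._rows[pk], **changes}
        return len(matched)

    def scan(self):
        # The hidden id lives only in the primary key and is not exposed to the user.
        return [dict(row) for row in self._rows.values()]


table = DuplicateOnUniqueTable(key_columns=["user_id"])
table.insert({"user_id": 1, "status": "new"})
table.insert({"user_id": 1, "status": "new"})   # duplicate user key is kept
table.insert({"user_id": 2, "status": "new"})
updated = table.update(lambda row: row["user_id"] == 1, {"status": "done"})
print(updated, table.scan())
```

In this model both rows with user_id 1 survive as separate rows, and each is rewritten individually through its own unique primary key rather than by rewriting the whole table.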
We would like participants to help design and implement the solution, and to run performance tests for comparison and optimization.
Recommended Skills
Familiar with C++ programming
Familiar with the storage layer of Doris
Mentor
Mentor: Chen Zhang, Apache Doris Committer, chzhang1987@gmail.com
Mentor: Guolei Yi, Apache Doris PMC Member, yiguolei@gmail.com
Mailing List: dev@doris.apache.org
Website: https://doris.apache.org