Description
Motivation
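ClickHouse is an open-source, column-oriented DBMS for online analytical processing (OLAP). It allows analysis of data that is updated in real time. The project was released as open-source software under the Apache 2.0 license in June 2016. It is used in production for large-scale analytics, for example by Yandex.Metrica [1] and by Cloudflare for DNS and HTTP analytics [2][3].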
Design and implementation
1. Support only writes. Reads are not useful, because ClickHouse is designed to run OLAP queries itself rather than to serve as a bulk data source for pipelines.
2. Write in batches, and rely on the atomic, idempotent inserts supported by replicated tables in ClickHouse, so that a retried batch is not written twice.
3. Implement ClickHouseIO.Write as a PTransform<PCollection<Row>, PDone>.
4. Rely on the logic for casting rows between schemas from BEAM-5918 instead of implementing it in ClickHouseIO.Write.
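The batching and retry behaviour behind points 2 and 3 can be illustrated with a plain-Java sketch that uses neither Beam nor a ClickHouse client. `FakeReplicatedTable` and `BatchingWriter` are hypothetical names introduced only for illustration. The fake table models just two properties assumed from point 2: each block is applied as a whole, and a block identical to one already applied is skipped. In an actual ClickHouseIO.Write, the buffering would presumably live in a DoFn and the insert would go through a ClickHouse client; this is a sketch of the idea, not the implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BatchedWriteSketch {

  /**
   * Hypothetical in-memory stand-in for a replicated ClickHouse table. It models only
   * two assumed properties: a block is applied all-or-nothing, and a block identical
   * to one already applied is skipped (block-level deduplication).
   */
  static final class FakeReplicatedTable {
    private final Set<List<String>> appliedBlocks = new HashSet<>();
    private final List<String> rows = new ArrayList<>();

    void insertBlock(List<String> block) {
      List<String> copy = List.copyOf(block);
      if (appliedBlocks.add(copy)) {
        rows.addAll(copy); // the whole block becomes visible at once
      }
    }

    List<String> rows() {
      return List.copyOf(rows);
    }
  }

  /** Hypothetical writer: buffers rows and inserts them in batches of at most maxBatchSize. */
  static final class BatchingWriter {
    private final FakeReplicatedTable table;
    private final int maxBatchSize;
    private final List<String> buffer = new ArrayList<>();

    BatchingWriter(FakeReplicatedTable table, int maxBatchSize) {
      if (maxBatchSize <= 0) {
        throw new IllegalArgumentException("maxBatchSize must be positive");
      }
      this.table = table;
      this.maxBatchSize = maxBatchSize;
    }

    void write(String row) {
      buffer.add(row);
      if (buffer.size() >= maxBatchSize) {
        flush();
      }
    }

    /** Sends buffered rows as one block and returns it, so a caller could resend it on retry. */
    List<String> flush() {
      List<String> block = List.copyOf(buffer);
      buffer.clear();
      if (!block.isEmpty()) {
        table.insertBlock(block);
      }
      return block;
    }
  }

  public static void main(String[] args) {
    FakeReplicatedTable table = new FakeReplicatedTable();
    BatchingWriter writer = new BatchingWriter(table, 2);
    for (String row : List.of("a", "b", "c")) {
      writer.write(row); // ["a", "b"] is flushed automatically when the batch fills up
    }
    List<String> lastBlock = writer.flush(); // flushes the remaining ["c"]
    table.insertBlock(lastBlock);            // simulated retry of the same block: skipped
    System.out.println(table.rows());        // [a, b, c]
  }
}
```

Because the sketch deduplicates by comparing whole blocks, a retry is harmless only if it resends exactly the same rows in the same order, which is why the batch rather than the individual row is the unit of retry here. A side effect of content-based deduplication is that two genuinely separate batches with identical rows are also collapsed into one.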
References
[1] http://highscalability.com/blog/2017/9/18/evolution-of-data-structures-in-yandexmetrica.html
[2] https://blog.cloudflare.com/how-cloudflare-analyzes-1m-dns-queries-per-second/
[3] https://blog.cloudflare.com/http-analytics-for-6m-requests-per-second-using-clickhouse/