  1. Beam
  2. BEAM-5964

Add ClickHouseIO.Write

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0
    • Component/s: io-ideas
    • Labels: None

    Description

      Motivation

      ClickHouse is an open-source columnar DBMS for OLAP. It allows analysis of data that is updated in real time. The project was released as open-source software under the Apache 2 license in June 2016.

      Design and implementation

      1. Support writes only; reads aren't useful because ClickHouse is designed for OLAP queries.
      2. Write in batches, relying on the idempotent and atomic inserts that replicated tables in ClickHouse support.
      3. Implement ClickHouseIO.Write as PTransform<PCollection<Row>, PDone>.
      4. Rely on the logic for casting rows between schemas from BEAM-5918 rather than putting it in ClickHouseIO.Write.
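
To illustrate point 2, here is a minimal sketch of why batched writes can be retried safely when the table deduplicates identical blocks. It is a toy model in plain Java, not the ClickHouse or Beam API: `BatchedWriteSketch`, `FakeReplicatedTable`, `batches` and `writeWithRetry` are hypothetical names introduced only for this example, and rows are simplified to strings. It assumes batching is deterministic, so a retried batch has exactly the same contents as the original attempt.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of batched, retry-safe writes; not the ClickHouse or Beam API. */
public class BatchedWriteSketch {

  /** Hypothetical stand-in for a replicated table that ignores blocks it has already accepted. */
  static final class FakeReplicatedTable {
    private final Set<List<String>> seenBlocks = new HashSet<>();
    private final List<String> rows = new ArrayList<>();

    /** Applies the whole block or nothing; an identical block is accepted only once. */
    boolean insert(List<String> block) {
      List<String> key = List.copyOf(block);
      if (!seenBlocks.add(key)) {
        return false; // duplicate block: ignored, so a retry adds no rows
      }
      rows.addAll(key);
      return true;
    }

    int rowCount() {
      return rows.size();
    }
  }

  /** Splits rows into consecutive batches of at most batchSize elements. */
  static List<List<String>> batches(List<String> rows, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<List<String>> out = new ArrayList<>();
    for (int i = 0; i < rows.size(); i += batchSize) {
      out.add(List.copyOf(rows.subList(i, Math.min(i + batchSize, rows.size()))));
    }
    return out;
  }

  /** Sends every batch twice to mimic a retried write; returns the number of rows stored. */
  static int writeWithRetry(FakeReplicatedTable table, List<String> rows, int batchSize) {
    for (List<String> batch : batches(rows, batchSize)) {
      table.insert(batch);
      table.insert(batch); // simulated retry of the same batch
    }
    return table.rowCount();
  }

  public static void main(String[] args) {
    FakeReplicatedTable table = new FakeReplicatedTable();
    List<String> rows = List.of("a", "b", "c", "d", "e");
    System.out.println(writeWithRetry(table, rows, 2)); // prints 5
  }
}
```

The same property has a cost in this model: two distinct batches with identical contents would also be collapsed into one, which is why the sketch depends on deterministic batching.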

      References

      [1] http://highscalability.com/blog/2017/9/18/evolution-of-data-structures-in-yandexmetrica.html
      [2] https://blog.cloudflare.com/how-cloudflare-analyzes-1m-dns-queries-per-second/
      [3] https://blog.cloudflare.com/http-analytics-for-6m-requests-per-second-using-clickhouse/


          People

            Assignee: Gleb Kanterov (kanterov)
            Reporter: Gleb Kanterov (kanterov)
            Votes: 0
            Watchers: 3


              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 9h 20m