Apache IoTDB is a high-performance time-series database. Its cluster mode is in development.
IoTDB uses a columnar file format called TsFile, which is similar to Parquet. In a columnar file, the order of the columns can have a large impact on query performance. We call the order of the columns in a file its physical layout.
In the distributed version of IoTDB, data is replicated several times for reliability, and a read operation can be routed to any replica so that the query load is spread across the nodes.
If a query runs slowly on one node because of an unsuitable physical layout, rather than because the node is overloaded, routing the query to another node is of no use. This is because the physical layout of the data on disk is the same on all nodes.
The proposal is to:
Accelerate queries by organizing different replicas into different layouts according to the query history.
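As a rough illustration of why differing layouts help, the sketch below routes a query to the replica whose layout suits it best. Everything in it is an assumption made for illustration: the node names, the series names, and especially the toy cost model (the span of columns between the first and last column a query reads). It does not use or describe any IoTDB API.

```python
from typing import Dict, Sequence, Set


def layout_cost(layout: Sequence[str], columns: Set[str]) -> int:
    """Toy cost (an assumption, not IoTDB's real cost model): the number of
    column slots between the first and last column the query reads."""
    positions = [i for i, name in enumerate(layout) if name in columns]
    if not positions:
        return 0
    return max(positions) - min(positions) + 1


def pick_replica(replica_layouts: Dict[str, Sequence[str]], columns: Set[str]) -> str:
    """Return the (hypothetical) node whose layout has the lowest toy cost."""
    return min(
        replica_layouts,
        key=lambda node: layout_cost(replica_layouts[node], columns),
    )


# Two replicas hold the same columns in a different physical order.
replica_layouts = {
    "node1": ["s1", "s2", "s3", "s4"],
    "node2": ["s1", "s4", "s2", "s3"],
}

# A query reading s1 and s4 is cheaper on node2, where the two columns are adjacent.
print(pick_replica(replica_layouts, {"s1", "s4"}))
```

With identical layouts every replica would have the same cost for every query, which is exactly the situation described above.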
Then we need to:
- collect the query history and find out which queries are frequent;
- find an algorithm to get the best physical layout for the queries.
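The first step, finding frequent queries in the history, could for example be approached with a streaming frequent-items algorithm. The following is a minimal sketch using the Misra-Gries summary; the proposal does not prescribe this algorithm, and representing a query as the set of columns it reads is an assumption made here for illustration.

```python
from typing import Dict, FrozenSet, Iterable

# Assumption: a query is identified by the set of columns it reads.
Query = FrozenSet[str]


def misra_gries(stream: Iterable[Query], k: int) -> Dict[Query, int]:
    """Summarize a stream with at most k - 1 counters.

    Any query that occurs more than n / k times in a stream of n queries
    is guaranteed to be among the returned keys. The stored counts are
    lower bounds on the true frequencies.
    """
    counters: Dict[Query, int] = {}
    for query in stream:
        if query in counters:
            counters[query] += 1
        elif len(counters) < k - 1:
            counters[query] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical query history: the query on {s1, s2} dominates.
a = frozenset({"s1", "s2"})
history = [a, frozenset({"s3"}), a, frozenset({"s4"}), a, frozenset({"s5"}), a]
candidates = misra_gries(history, k=3)
print(a in candidates)
```

The summary uses memory proportional to k rather than to the length of the history, which is why a streaming approach fits a long-running database. The returned keys are only candidates; a second pass over the history, or an exact count of just those keys, would be needed to confirm their frequencies.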
This feature is expected to improve the query performance of IoTDB and to set it apart from other distributed systems.
You need to know:
- Quorum-based replica control
- Some streaming algorithms