Details
Type: Wish
Status: Reopened
Priority: Minor
Resolution: Unresolved
Description
IoTDB is a highly efficient time series database that supports high-speed query processing, including aggregation queries.
Currently, IoTDB pre-calculates aggregation info, also called summary info (sum, count, max_time, min_time, max_value, min_value), for each page and each chunk. This info helps with aggregation operations and some query filters. For example, if the query filter is value > 10 and the max value of a page is 9, we can skip the page. As another example, if the query is select max(value) and the max values of 3 chunks are 5, 10, and 20, then max(value) is 20.
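The two examples above can be sketched in a few lines of Java. This is only an illustration: PageSummary and its methods are hypothetical names chosen here, not IoTDB's actual statistics classes, and the sketch treats pages and chunks alike.

```java
// Hypothetical summary record; field names follow the list in the text,
// not IoTDB's real classes.
final class PageSummary {
    final long count;
    final double sum;
    final long minTime, maxTime;
    final double minValue, maxValue;

    PageSummary(long count, double sum, long minTime, long maxTime,
                double minValue, double maxValue) {
        this.count = count;
        this.sum = sum;
        this.minTime = minTime;
        this.maxTime = maxTime;
        this.minValue = minValue;
        this.maxValue = maxValue;
    }

    // Pre-calculates the summary from the raw points of one page or chunk.
    static PageSummary of(long[] times, double[] values) {
        if (times.length == 0 || times.length != values.length) {
            throw new IllegalArgumentException("need equally many times and values");
        }
        double sum = 0, minV = Double.POSITIVE_INFINITY, maxV = Double.NEGATIVE_INFINITY;
        long minT = Long.MAX_VALUE, maxT = Long.MIN_VALUE;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
            minV = Math.min(minV, values[i]);
            maxV = Math.max(maxV, values[i]);
            minT = Math.min(minT, times[i]);
            maxT = Math.max(maxT, times[i]);
        }
        return new PageSummary(values.length, sum, minT, maxT, minV, maxV);
    }

    // True if no point can satisfy "value > threshold", so the page can be skipped.
    boolean canSkipForGreaterThan(double threshold) {
        return maxValue <= threshold;
    }

    // Answers select max(value) from the summaries alone, without reading points.
    static double maxOf(PageSummary... summaries) {
        double max = Double.NEGATIVE_INFINITY;
        for (PageSummary s : summaries) {
            max = Math.max(max, s.maxValue);
        }
        return max;
    }

    public static void main(String[] args) {
        PageSummary page = PageSummary.of(new long[] {1, 2, 3}, new double[] {3, 9, 7});
        System.out.println(page.canSkipForGreaterThan(10)); // max is 9, so the page is skipped
    }
}
```

Note that maxOf still visits every summary once, which is the O(N) cost discussed in the first drawback.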
However, there are two drawbacks:
1. The summary info reduces the data that needs to be scanned to 1/k of the original (supposing each page has k data points), but the time complexity is still O(N). If we store long historical data, e.g., 2 years of data at 500 kHz, an aggregation operation may still be time-consuming. A tree-based index that reduces the time complexity from O(N) to O(log N) is therefore a good choice. Some basic ideas have been published in [1], but that approach can only handle data with a fixed frequency. Improving it and implementing it in IoTDB would be worthwhile.
2. The summary info does not help evaluate a query like where value > 8 when the max value is 10. If we enrich the summary info, e.g., by storing a data histogram, we can use the histogram to estimate how many points will be returned.
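To make the histogram idea in item 2 concrete, here is a minimal sketch of an equi-width histogram that estimates how many points satisfy value > threshold. ValueHistogram, its bucket layout, and the assumption that points are spread uniformly inside a bucket are illustrative choices made here, not an existing IoTDB structure.

```java
// Hypothetical equi-width histogram kept alongside a page's summary info.
final class ValueHistogram {
    private final double min;
    private final double max;
    private final long[] buckets;
    private long total = 0;

    ValueHistogram(double min, double max, int bucketCount) {
        if (!(max > min) || bucketCount <= 0) {
            throw new IllegalArgumentException("need max > min and bucketCount > 0");
        }
        this.min = min;
        this.max = max;
        this.buckets = new long[bucketCount];
    }

    private double width() {
        return (max - min) / buckets.length;
    }

    // Records one value; values outside [min, max] are clamped to the edge buckets.
    void add(double value) {
        int i = (int) ((value - min) / width());
        if (i < 0) i = 0;
        if (i >= buckets.length) i = buckets.length - 1;
        buckets[i]++;
        total++;
    }

    // Estimated number of points with value > threshold, assuming points
    // are uniformly distributed inside each bucket.
    double estimateGreaterThan(double threshold) {
        if (threshold < min) return total;
        if (threshold >= max) return 0;
        double pos = (threshold - min) / width();
        int i = (int) pos;                              // bucket containing the threshold
        double estimate = buckets[i] * (1.0 - (pos - i)); // part of that bucket above it
        for (int j = i + 1; j < buckets.length; j++) {
            estimate += buckets[j];
        }
        return estimate;
    }

    public static void main(String[] args) {
        ValueHistogram h = new ValueHistogram(0, 10, 5);
        for (double v : new double[] {1, 3, 5, 7, 9, 9.5}) {
            h.add(v);
        }
        System.out.println(h.estimateGreaterThan(8)); // estimate for "where value > 8"
    }
}
```

The estimate is exact only when the threshold falls on a bucket boundary; more buckets give better estimates at the cost of more space per page and more work per insert.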
This proposal is mainly about adding an index to speed up aggregation queries. In addition, making the summary info more useful would be even better.
Note that the premise is that insertion speed must not be slowed down too much!
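As a rough illustration of the O(log N) goal and the insertion-cost constraint, the sketch below uses a generic array-based segment tree over per-chunk max values. It is not the method from [1] and not an IoTDB design: ChunkMaxIndex, the fixed capacity, and the in-memory layout are assumptions made for the example. Both appending a chunk and querying a range of chunks touch O(log N) nodes.

```java
// Hypothetical in-memory index over the max_value of consecutive chunks.
final class ChunkMaxIndex {
    private final int capacity;   // maximum number of chunks (assumed fixed here)
    private final double[] tree;  // leaves live at [capacity, 2 * capacity)
    private int size = 0;         // number of chunks appended so far

    ChunkMaxIndex(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.tree = new double[2 * capacity];
        java.util.Arrays.fill(tree, Double.NEGATIVE_INFINITY);
    }

    // Appends the max value of a newly written chunk: O(log N) node updates.
    void append(double chunkMax) {
        if (size == capacity) {
            throw new IllegalStateException("index is full");
        }
        int i = capacity + size++;
        tree[i] = chunkMax;
        for (i >>= 1; i >= 1; i >>= 1) {
            tree[i] = Math.max(tree[2 * i], tree[2 * i + 1]);
        }
    }

    // Max over chunks with index in [from, to): O(log N) nodes visited.
    double max(int from, int to) {
        if (from < 0 || to > size || from > to) {
            throw new IndexOutOfBoundsException("bad chunk range");
        }
        double result = Double.NEGATIVE_INFINITY;
        for (int l = from + capacity, r = to + capacity; l < r; l >>= 1, r >>= 1) {
            if ((l & 1) == 1) result = Math.max(result, tree[l++]);
            if ((r & 1) == 1) result = Math.max(result, tree[--r]);
        }
        return result;
    }

    public static void main(String[] args) {
        ChunkMaxIndex index = new ChunkMaxIndex(8);
        for (double m : new double[] {5, 10, 20, 7}) {
            index.append(m);
        }
        System.out.println(index.max(0, 3)); // max over the first three chunks
    }
}
```

A real index would also need to map a query's time range to chunk positions, cover the other aggregates (sum, count, min), and be persisted with the data, all of which this sketch leaves out.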
You should know:
• IoTDB query process
• TsFile structure and organization
• Basic index knowledge
• Java
difficulty: Major
mentors:
hxd@apache.org
Reference:
[1] https://www.sciencedirect.com/science/article/pii/S0306437918305489