[IOTDB-5792] Parallel encoding in MemTable flush - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: master branch
Component/s: Core/Engine
Labels:

Description

Currently, each MemTable flush task has only one encoding task. In other words, encoding during a MemTable flush is fully serialized. As a result, when the MemTable is large, encoding becomes considerably time-consuming. This is especially true when the computing power of a single core is low, which is common for commercial servers with many cores.

In one of my experiments, a MemTable held 1M time series (datatype = double) with around 300 points per series on average, making the total size of the MemTable about 5GB. Encoding such a MemTable took, incredibly, over 100s. The system easily falls into a reject status because flushing is so slow.

Since the encoding process is naturally parallelizable (it is a purely in-memory operation with perfect locality), I would like to propose replacing the single-threaded encoding process with a multi-threaded one.
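To illustrate the idea, here is a minimal sketch of encoding series in parallel and collecting the results in submission order, so that a single writer could still append them sequentially. It is not the IoTDB implementation: `ParallelEncodeSketch`, `encodeSeries`, and `encodeAll` are hypothetical names, and the "encoding" is just a plain big-endian dump of the doubles standing in for a real encoder.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelEncodeSketch {

  // Hypothetical stand-in for a real encoder: writes each double as 8 big-endian bytes.
  static byte[] encodeSeries(double[] values) {
    ByteBuffer buf = ByteBuffer.allocate(values.length * Double.BYTES);
    for (double v : values) {
      buf.putDouble(v);
    }
    return buf.array();
  }

  // Encodes every series on a fixed-size thread pool.
  // Results are returned in the same order as the input list.
  static List<byte[]> encodeAll(List<double[]> seriesList, int threads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<byte[]>> futures = new ArrayList<>(seriesList.size());
      for (double[] series : seriesList) {
        // Each series is independent, so the tasks share no mutable state.
        futures.add(pool.submit(() -> encodeSeries(series)));
      }
      List<byte[]> encoded = new ArrayList<>(futures.size());
      for (Future<byte[]> f : futures) {
        encoded.add(f.get()); // blocks until that series has been encoded
      }
      return encoded;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<double[]> memTable = Arrays.asList(
        new double[] {1.0, 2.0, 3.0},
        new double[] {4.0, 5.0});
    int threads = Runtime.getRuntime().availableProcessors();
    List<byte[]> chunks = encodeAll(memTable, threads);
    System.out.println("encoded chunks: " + chunks.size());
  }
}
```

With around 1M series, submitting one task per series would likely add noticeable scheduling overhead; grouping series into batches per task is one possible refinement, but the right granularity would need to be measured.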

Issue Links

links to GitHub Pull Request #9667

People

Assignee: Tian Jiang
Reporter: Tian Jiang
Votes: 0
Watchers: 1

Dates

Created: 19/Apr/23 02:31
Updated: 01/Jun/23 01:07