Apache IoTDB / IOTDB-5792

Parallel encoding in MemTable flush


Details

    Description

      Currently, each MemTable flush task has exactly one encoding task. In other words, the encoding performed while flushing a MemTable is fully serial, so encoding a large MemTable takes considerable time. This is especially true when single-core performance is low, which is common on commercial servers with many cores.

      In one of my experiments, a MemTable held 1M time series (datatype = double) with around 300 points per series on average, making the MemTable about 5GB in total. Encoding such a MemTable took, incredibly, over 100s. Because flushing is so slow, the system easily enters a reject status.
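Here is a quick back-of-the-envelope check of the figures above. It assumes each point stores an 8-byte timestamp and an 8-byte double, and it ignores per-series overhead. Under those assumptions the reported size is consistent, and the serial encoder processes at most roughly 48 MB/s.

```latex
\begin{aligned}
\text{size} &\approx 10^{6}\ \text{series} \times 300\ \tfrac{\text{points}}{\text{series}} \times (8 + 8)\ \tfrac{\text{B}}{\text{point}} = 4.8 \times 10^{9}\ \text{B} \approx 5\ \text{GB},\\
\text{throughput} &\lesssim \frac{4.8 \times 10^{9}\ \text{B}}{100\ \text{s}} = 48\ \text{MB/s}.
\end{aligned}
```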

      Since the encoding process is naturally parallelizable (it is a purely in-memory operation with perfect locality), I would like to propose replacing the single-threaded encoding process with a multi-threaded one.
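The proposal can be sketched as follows. This is a minimal, hypothetical illustration, not IoTDB's actual flush code. The names `Series`, `encode`, `encodeSerial`, `encodeParallel` and `sampleMemTable` are invented for this sketch. `encode` is a toy encoder (timestamp deltas plus raw doubles) standing in for IoTDB's real encoders. The sketch assumes each series can be encoded independently. Under that assumption, encoding fans out to a fixed thread pool, and the results are gathered back in their original order so a single writer can still append chunks sequentially.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelFlushSketch {

    // Hypothetical stand-in for one in-memory time series (not an IoTDB class).
    static final class Series {
        final long[] times;
        final double[] values;

        Series(long[] times, double[] values) {
            this.times = times;
            this.values = values;
        }
    }

    // Hypothetical toy encoder: timestamp deltas followed by raw doubles.
    // IoTDB's real encoders (e.g. TS_2DIFF, GORILLA) are more elaborate.
    static byte[] encode(Series s) {
        ByteBuffer buf = ByteBuffer.allocate(s.times.length * (Long.BYTES + Double.BYTES));
        long prev = 0;
        for (int i = 0; i < s.times.length; i++) {
            buf.putLong(s.times[i] - prev); // delta from the previous timestamp
            prev = s.times[i];
            buf.putDouble(s.values[i]);
        }
        return buf.array();
    }

    // Behaviour described in the issue: one task encodes every series in turn.
    static List<byte[]> encodeSerial(List<Series> memTable) {
        List<byte[]> out = new ArrayList<>(memTable.size());
        for (Series s : memTable) {
            out.add(encode(s));
        }
        return out;
    }

    // Proposed behaviour: series are encoded concurrently on a thread pool.
    // Futures are read in submission order, so output order matches input order.
    static List<byte[]> encodeParallel(List<Series> memTable, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        List<Future<byte[]>> futures = new ArrayList<>(memTable.size());
        for (Series s : memTable) {
            futures.add(pool.submit(() -> encode(s)));
        }
        List<byte[]> out = new ArrayList<>(memTable.size());
        for (Future<byte[]> f : futures) {
            out.add(f.get()); // blocks until this series has been encoded
        }
        return out;
    }

    // Builds a small synthetic MemTable: seriesCount series of `points` points each.
    static List<Series> sampleMemTable(int seriesCount, int points) {
        List<Series> memTable = new ArrayList<>(seriesCount);
        for (int k = 0; k < seriesCount; k++) {
            long[] t = new long[points];
            double[] v = new double[points];
            for (int i = 0; i < points; i++) {
                t[i] = 1_000L * i;
                v[i] = Math.sin(k + i);
            }
            memTable.add(new Series(t, v));
        }
        return memTable;
    }

    public static void main(String[] args) throws Exception {
        List<Series> memTable = sampleMemTable(1_000, 300);
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            List<byte[]> serial = encodeSerial(memTable);
            List<byte[]> parallel = encodeParallel(memTable, pool);
            boolean same = serial.size() == parallel.size();
            for (int i = 0; same && i < serial.size(); i++) {
                same = Arrays.equals(serial.get(i), parallel.get(i));
            }
            System.out.println("identical output: " + same);
        } finally {
            pool.shutdown();
        }
    }
}
```

Submitting one task per series is the simplest form of the idea. With 1M series, grouping several series into each task would likely reduce scheduling overhead. Sizing the pool to the number of cores targets the many-core servers with low single-core speed mentioned above.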


            People

              jt2594838 Tian Jiang