  1. Apache IoTDB
  2. IOTDB-1131

dictionary encoding of deviceID and measurementID in WAL

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Core/WAL

    Description

      This is an interesting idea proposed by Tian Jiang.

      Copied from Tian Jiang:

      Write-ahead logs (WALs) ensure that data not yet persisted can still be recovered after a system failure, which increases the durability of a DBMS. However, WALs generally require more frequent flushes to limit the possibility of losing data, which increases disk utilization significantly, as each flush requires one disk I/O. Moreover, logs are hardly compressed or encoded the way raw data in TsFiles are, so logs containing the same data consume much more space than the data chunks. The disadvantages are twofold: first, large logs compete for more disk bandwidth, slowing down the persistence of raw data; second, even if WALs are placed on another disk (possibly an SSD for high throughput), WALs are removed frequently once their corresponding data are persisted, and such frequent write-and-erase cycles shorten disk life, especially for SSDs.

      So it is beneficial to reduce the size of WALs. In IoTDB (and in other DBMSs), the majority of WAL entries are logs of insertions, as other operations like deletions and updates are often rare compared with insertions. This observation suggests that we can focus on reducing the size of insertion logs, which should be enough to attain the desired improvement for the whole system. Currently, we serialize complete physical plans into the WAL, but we notice that although values and timestamps generally vary from plan to plan, header information like deviceIds, measurementIds and data types is highly redundant, and deviceIds and measurementIds are sometimes long strings, which may consume a significant amount of space. So in this design, we concentrate on reducing duplicated deviceIds, measurementIds and data types in WALs.

      Method
      To reduce duplicated deviceIds, measurementIds and data types in WALs, we use a windowed differencing technique (or referencing) to replace redundant fields with an index pointing to a base log, if such a log can be found within a given window. The detailed procedure is described below:
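
      The detailed procedure is not reproduced in this issue text, so the following is only a minimal sketch of how windowed referencing could work, not the proposed implementation and not IoTDB code. All names (`FullEntry`, `RefEntry`, `WindowedEncoder`, `WindowedDecoder`, `base_offset`, `window_size`) are hypothetical. It assumes a "header" is the triple (deviceId, measurementIds, data types), that a reference is stored as the distance back to the base log, and that recovery replays the log from the start so the reader can rebuild the same window as the writer.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Tuple, Union

# Assumed header layout: (deviceId, measurementIds, dataTypes).
Header = Tuple[str, Tuple[str, ...], Tuple[str, ...]]


@dataclass(frozen=True)
class FullEntry:
    """Log entry that carries its complete header."""
    header: Header
    timestamp: int
    values: tuple


@dataclass(frozen=True)
class RefEntry:
    """Log entry whose header is replaced by an index to a base log."""
    base_offset: int  # 1 = previous entry, 2 = the one before it, ...
    timestamp: int
    values: tuple


Entry = Union[FullEntry, RefEntry]


class WindowedEncoder:
    def __init__(self, window_size: int = 4):
        # Headers of the most recent entries; older ones fall out of the window.
        self.window: Deque[Header] = deque(maxlen=window_size)

    def encode(self, header: Header, timestamp: int, values: tuple) -> Entry:
        entry: Entry = FullEntry(header, timestamp, values)
        # Search from the newest entry to the oldest for an identical header.
        for distance, seen in enumerate(reversed(self.window), start=1):
            if seen == header:
                entry = RefEntry(distance, timestamp, values)
                break
        self.window.append(header)
        return entry


class WindowedDecoder:
    def __init__(self, window_size: int = 4):
        # Must use the same window size as the encoder.
        self.window: Deque[Header] = deque(maxlen=window_size)

    def decode(self, entry: Entry):
        if isinstance(entry, RefEntry):
            # Resolve the reference against the headers already replayed.
            header = self.window[-entry.base_offset]
        else:
            header = entry.header
        self.window.append(header)
        return header, entry.timestamp, entry.values
```

      In this sketch, a header that last appeared more than `window_size` entries ago is written in full again, which bounds the state the reader has to keep during recovery. A real design would also have to decide what happens when the base log is removed or the WAL is truncated, which this sketch does not address.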

          People

            Assignee: Unassigned
            Reporter: Xiangdong Huang (hxd)
            Votes: 0
            Watchers: 2