  1. Apache IoTDB
  2. IOTDB-853

Log compaction by omitting identical log fields



    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • None
    • None
    • Component: Core/WAL


      [1] describes an interesting log compaction technique: record the page ID and transaction ID of the previous log, and omit them from the next log if they are the same.

      I think such a technique could well be applied to IoTDB's WAL. While persisting logs, we can keep a window of the previous N logs. When we are about to persist a log, we search the window for the nearest log of the same type and check whether it has the same field values as the current one. For example, neighboring insertions are very likely to share the same deviceIds and measurementIds, so we can fill the field with a back reference instead of the full value (e.g., "3" means this field has the same value as in the log whose index is smaller than the current one by 3). This way, a very long path can be replaced by a single byte (0~255), which may save a great deal of disk space and I/O.
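
      To make the proposal concrete, here is a minimal sketch of the window lookup for a single string field. It is only an illustration under assumptions: the class `FieldWindow` and its methods are hypothetical and not part of IoTDB, all logs are treated as having the same type, and the distance 0 is reserved to mean "no match, write the value in full".

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Hypothetical sketch, not IoTDB code: back-reference encoding of one log field.
final class FieldWindow {
    // Largest distance that fits in one unsigned byte; 0 is reserved for "literal".
    static final int MAX_WINDOW = 255;

    private final int windowSize;
    // Most recent value first, so position i (1-based) is the back-reference distance.
    private final Deque<String> window = new ArrayDeque<>();

    FieldWindow(int windowSize) {
        if (windowSize < 1 || windowSize > MAX_WINDOW) {
            throw new IllegalArgumentException("windowSize must be in 1..255");
        }
        this.windowSize = windowSize;
    }

    // Writer side: returns the distance to the nearest earlier log holding the
    // same value, or 0 if the value must be written in full.
    int encode(String value) {
        int distance = 0;
        int i = 1;
        for (String previous : window) {
            if (previous.equals(value)) {
                distance = i;
                break;
            }
            i++;
        }
        remember(value);
        return distance;
    }

    // Reader side: resolves a distance back to the value. `literal` is only
    // used when distance is 0.
    String decode(int distance, String literal) {
        String value = literal;
        if (distance > 0) {
            Iterator<String> it = window.iterator();
            for (int i = 0; i < distance; i++) {
                value = it.next();
            }
        }
        remember(value);
        return value;
    }

    // Both sides apply the same update, so their windows stay in sync.
    private void remember(String value) {
        window.addFirst(value);
        if (window.size() > windowSize) {
            window.removeLast();
        }
    }
}
```

      The writer and the reader each keep their own `FieldWindow` with the same size. Because both push every value in the same order, a distance produced by `encode` resolves to the same value in `decode` during recovery.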

      The idea itself is easy to implement. The challenge lies in choosing a proper window length and comparing logs efficiently, so that the additional computation does not become another bottleneck.
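
      One possible way to keep the comparison cheap, offered only as an assumption and not as the method of [1] or of IoTDB, is to remember the index of the last log that carried each value in a hash map. The lookup then costs one hash probe instead of a scan over the window. The class `HashedFieldWindow` below is hypothetical, and again 0 means "write the value in full".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not IoTDB code: constant-time lookup of the nearest
// earlier log with the same field value.
final class HashedFieldWindow {
    private final int windowSize;
    // Maps a field value to the index of the most recent log that carried it.
    // Note: this sketch never evicts entries; a real version would need to.
    private final Map<String, Long> lastIndexOf = new HashMap<>();
    private long nextIndex = 0;

    HashedFieldWindow(int windowSize) {
        if (windowSize < 1 || windowSize > 255) {
            throw new IllegalArgumentException("windowSize must be in 1..255");
        }
        this.windowSize = windowSize;
    }

    // Returns the back-reference distance, or 0 if the value was never seen
    // or was last seen outside the window.
    int encode(String value) {
        long index = nextIndex++;
        Long last = lastIndexOf.put(value, index);
        if (last == null) {
            return 0;
        }
        long distance = index - last;
        return distance <= windowSize ? (int) distance : 0;
    }
}
```

      With this layout the window length only bounds how far back a reference may point, so a longer window does not by itself make the lookup slower. The trade-off is the memory held by the map.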

      [1] Michael Haubenschild, Caetano Sauer, Thomas Neumann, and Viktor Leis. 2020. Rethinking Logging, Checkpoints, and Recovery for High-Performance Storage Engines. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (SIGMOD '20). Association for Computing Machinery, New York, NY, USA, 877–892. DOI:https://doi.org/10.1145/3318464.3389716




            Assignee: Unassigned
            Reporter: Tian Jiang (jt2594838)