AsterixDB currently uses a classical two-phase log replay approach during recovery by first identifying committed writes and then applying these commit writes to LSM-trees. This is a stardard approach for general-purpose transaction processing systems, but for AsterixDB, we can design something better.
AsterixDB uses a record-level transaction model where each write is committed as soon as possible by "entity commit". To exploit this property, we can design a one-phase log replay approach as follows:
- Start from the log head based on the low watermark LSN
- Whenever we see an update log record, store that log record in memory (for each job)
- Whenever we see an entity commit or abort record, redo the corresponding update log record immediately and remove it from memory
The key property here is that the window between an update log record and a commit log record is very short - we commit on a frame basis. Thus, this will speed up the recovery process by only using one log read pass and avoiding store all entity commits in memory. We only need a small amount of memory, based on the window between updates and commits, during the recovery proces.