Problem
When NBCC (non-blocking concurrency control) or async compaction is in use, the record positions written into log blocks may be inaccurate for a later snapshot read or compaction, because the file slicing may have changed in the meantime.
The root cause is that positions are generated against the base file that is current at write time, while a later snapshot read or compaction may, under completion-time-based file slicing, rebase the log file onto a new base file. If that new base file was produced from the old one by applying deletes from log files, the row positions shift: the stored positions are wrong, and so are the merged results.
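The position shift can be reproduced with a toy model. The sketch below is illustrative only and assumes that a base file is an ordered list of record keys and that a record's position is its 0-based row index; the class and method names are hypothetical and are not Hudi APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical names): a base file is an ordered list of record keys,
// and a record's "position" is its 0-based row index in that file.
class PositionShiftDemo {

    // Compaction sketch: copy the old base file, dropping keys deleted in log files.
    static List<String> compact(List<String> oldBase, List<String> deletedKeys) {
        List<String> newBase = new ArrayList<>();
        for (String key : oldBase) {
            if (!deletedKeys.contains(key)) {
                newBase.add(key);
            }
        }
        return newBase;
    }

    public static void main(String[] args) {
        List<String> oldBase = List.of("k1", "k2", "k3", "k4", "k5");
        // A log block written against oldBase updates k4 and stores its position.
        int positionOfK4 = oldBase.indexOf("k4");        // 3
        // An earlier log file deleted k2; compaction drops it from the new base file.
        List<String> newBase = compact(oldBase, List.of("k2"));
        // Applying the stored position to the new base file hits a different record.
        System.out.println(newBase.get(positionOfK4));   // prints k5, not k4
    }
}
```

A single delete before the updated row is enough to make every later position point one row too far.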
Take the following example. When .fg1_ts6.log is written, compaction ts7 has been requested but has not completed, so the base file used to look up positions for updates and deletes is fg1_ts1.parquet. Once the compaction completes and produces fg1_ts7.parquet, and .fg1_ts6.log completes at ts8, completion-time-based file slicing places fg1_ts7.parquet and .fg1_ts6.log in the same (latest) file slice. However, the positions in .fg1_ts6.log cannot be used to merge against fg1_ts7.parquet. There is no such issue for .fg1_ts2.log and .fg1_ts4.log: the base file of their file slice never changes, so their positions remain valid for merging.
File layout of file group fg1 in this example:
  base files: fg1_ts1.parquet, later fg1_ts7.parquet (from compaction ts7)
  log files:  .fg1_ts2.log (completion time ts3), .fg1_ts4.log (completion time ts5), .fg1_ts6.log (completion time ts8; written before fg1_ts7.parquet is generated)
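The slice assignment in this example can be sketched as follows. This is a toy model, not Hudi code: instant times are plain integers, and the slicing rule (a log file belongs to the slice whose base instant time is the largest one strictly smaller than the log file's completion time) is an assumption inferred from the example above.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of completion-time-based file slicing for file group fg1.
// Assumed rule (inferred from the example, not taken from Hudi code): a log file
// belongs to the slice whose base instant time is the largest one strictly
// smaller than the log file's completion time. Instant times are plain ints.
class FileSlicingDemo {

    static int sliceBaseInstant(List<Integer> baseInstants, int logCompletionTime) {
        int best = -1;
        for (int base : baseInstants) {
            if (base < logCompletionTime && base > best) {
                best = base;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Integer> baseInstants = List.of(1, 7);   // fg1_ts1.parquet, fg1_ts7.parquet
        // log instant -> {completion time, base instant the positions were written against}
        Map<Integer, int[]> logs = new TreeMap<>();
        logs.put(2, new int[] {3, 1});
        logs.put(4, new int[] {5, 1});
        logs.put(6, new int[] {8, 1});   // written while compaction ts7 was still pending
        for (Map.Entry<Integer, int[]> e : logs.entrySet()) {
            int slice = sliceBaseInstant(baseInstants, e.getValue()[0]);
            boolean positionsUsable = slice == e.getValue()[1];
            System.out.printf(".fg1_ts%d.log -> slice ts%d, positions usable: %b%n",
                e.getKey(), slice, positionsUsable);
        }
    }
}
```

Running it reports that .fg1_ts2.log and .fg1_ts4.log stay in the ts1 slice with usable positions, while .fg1_ts6.log moves to the ts7 slice and its positions become unusable.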
Proposal
We should always write positions and let the merger decide whether positional merging can be used correctly.
Design Option 1
Add to the log block header the instant time of the base file against which the positions were generated.
During merging, if this base instant time does not match the instant time of the base file being merged, do not use positions to merge the records in this log block. This is simple and straightforward, and it avoids any ambiguity when the file slicing, in particular the base file, has changed for a log file and its blocks.
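A minimal sketch of Option 1, covering both the writer and the merger side. The header keys and method names are hypothetical and do not correspond to Hudi's actual log block header enum; the fallback to key-based merging when positions are rejected is likewise an assumption.

```java
import java.util.EnumMap;
import java.util.Map;

// Minimal sketch of Design Option 1. HeaderKey and its constants are
// hypothetical names for illustration, not Hudi's actual header enum.
class Option1Demo {

    enum HeaderKey { INSTANT_TIME, RECORD_POSITIONS, POSITIONS_BASE_INSTANT_TIME }

    // Writer side: always record the positions and the base instant they refer to.
    static Map<HeaderKey, String> writeHeader(String logInstant, String positions,
                                              String baseInstantAtWrite) {
        Map<HeaderKey, String> header = new EnumMap<>(HeaderKey.class);
        header.put(HeaderKey.INSTANT_TIME, logInstant);
        header.put(HeaderKey.RECORD_POSITIONS, positions);
        header.put(HeaderKey.POSITIONS_BASE_INSTANT_TIME, baseInstantAtWrite);
        return header;
    }

    // Merger side: use positions only if they were generated against the base file
    // of the slice being merged. A block without the new key returns false
    // (equals(null) is false), so the caller is assumed to fall back to key-based merging.
    static boolean usePositionalMerging(Map<HeaderKey, String> header, String baseFileInstant) {
        return header.containsKey(HeaderKey.RECORD_POSITIONS)
            && baseFileInstant.equals(header.get(HeaderKey.POSITIONS_BASE_INSTANT_TIME));
    }

    public static void main(String[] args) {
        // .fg1_ts6.log was written while fg1_ts1.parquet was the latest base file.
        Map<HeaderKey, String> header = writeHeader("ts6", "[0, 3]", "ts1");
        System.out.println(usePositionalMerging(header, "ts1")); // true
        System.out.println(usePositionalMerging(header, "ts7")); // false
    }
}
```

The check is a single string comparison per block, so it needs no timeline lookup at read time.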
Design Option 2
No new metadata. Instead, determine on the fly, from the relationship between the base instant time, the log file instant time, and the completion time, which base instant the positions were generated against and whether positional merging can be used.
In the example above, the base instant time for the positions written in .fg1_ts6.log must be determined on the fly. This has two drawbacks:
- The instant times of the new base file (fg1_ts7.parquet) and the log file (.fg1_ts6.log) do not necessarily reflect the order in which the files were written; e.g., fg1_ts7.parquet may be written before .fg1_ts6.log. Determining the base instant time for the positions in a log block therefore requires a more complex condition, which is error-prone.
- Completion times must be looked up, potentially by reading the LSM timeline, which adds overhead.
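The first drawback can be illustrated with the example's own instant times. In this sketch, scenario B assumes that when compaction ts7 finishes before .fg1_ts6.log is written, the positions refer to fg1_ts7.parquet. The instant-time-only rule is a hypothetical naive check, not a proposed implementation.

```java
// Sketch illustrating drawback 1 of Design Option 2. The naive rule and the
// scenario setup are hypothetical; the per-scenario base instant follows the
// stated rule that positions refer to the base file available at write time.
class Option2AmbiguityDemo {

    // Naive on-the-fly rule using instant times only: assume the positions refer
    // to the slice's base file iff the log instant is later than the base instant.
    static boolean naivePositionsMatchBase(int logInstant, int baseInstant) {
        return logInstant > baseInstant;
    }

    public static void main(String[] args) {
        int logInstant = 6;   // .fg1_ts6.log
        int baseInstant = 7;  // fg1_ts7.parquet, base file of the latest slice
        // Scenario A: .fg1_ts6.log written while compaction ts7 is pending -> positions vs ts1.
        // Scenario B (assumed): compaction ts7 done before .fg1_ts6.log is written -> vs ts7.
        String[] names = {"A", "B"};
        int[] actualPositionsBase = {1, 7};
        boolean naive = naivePositionsMatchBase(logInstant, baseInstant);
        for (int i = 0; i < names.length; i++) {
            boolean correct = actualPositionsBase[i] == baseInstant;
            System.out.printf("Scenario %s: naive=%b, correct=%b%n", names[i], naive, correct);
        }
        // Identical inputs, different correct answers: instant times alone are not enough.
    }
}
```

Any rule that looks only at the pair (ts6, ts7) returns the same answer for both scenarios, so it is wrong for one of them; telling them apart needs extra timeline information such as completion times.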