In HBASE-28004, we added a "cached time" long at the end of each block in the bucket cache. We also record the cached time in the backing map, which we periodically persist to disk so that the cache can be recovered after crashes or restarts. The persisted backing map also includes the last modification time of the cache itself.
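To make the layout concrete, here is a minimal sketch. It assumes a single heap `ByteBuffer` stands in for the bucket cache. `Entry`, `PersistedBackingMap`, `cacheBlock` and `readCachedTime` are hypothetical names used only for illustration and are not the BucketCache API:

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class CachedTimeTrailerSketch {
  // Hypothetical backing map entry: block offset, block length and cached time.
  record Entry(int offset, int length, long cachedTime) {}

  // Hypothetical persisted snapshot: the entries plus the cache's last modification time.
  record PersistedBackingMap(long cacheLastModified, Map<String, Entry> entries) {}

  // Writes the block bytes at `offset`, immediately followed by the 8-byte cached time.
  static Entry cacheBlock(ByteBuffer cache, int offset, byte[] block, long cachedTime) {
    for (int i = 0; i < block.length; i++) {
      cache.put(offset + i, block[i]);
    }
    cache.putLong(offset + block.length, cachedTime);
    return new Entry(offset, block.length, cachedTime);
  }

  // Reads the cached time stored right after the block's bytes.
  static long readCachedTime(ByteBuffer cache, Entry e) {
    return cache.getLong(e.offset() + e.length());
  }

  public static void main(String[] args) {
    ByteBuffer cache = ByteBuffer.allocate(64);
    Map<String, Entry> backingMap = new HashMap<>();
    backingMap.put("B1", cacheBlock(cache, 0, new byte[16], 1000L));
    // Persist a copy of the map together with the cache's last modification time.
    PersistedBackingMap snapshot = new PersistedBackingMap(42L, new HashMap<>(backingMap));
    System.out.println(readCachedTime(cache, snapshot.entries().get("B1"))); // prints 1000
  }
}
```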
On restart, once we read the backing map from the persisted file, we compare the cache's last modification time recorded there against the cache's current last modification time. If they differ, the cache was updated after the backing map was persisted, so the backing map might not be accurate. In that case we iterate through the backing map entries and compare each entry's cached time against the cached time stored with the related block in the cache; if they differ, we remove the entry from the map.
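The validation step can be sketched as follows, again assuming a heap `ByteBuffer` as the cache and the trailing cached time layout. `Entry` and `validate` are hypothetical names, and the modification times are passed in as plain longs rather than read from a real file:

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class BackingMapValidationSketch {
  // Hypothetical backing map entry: block offset, block length and cached time.
  record Entry(int offset, int length, long cachedTime) {}

  // Drops entries whose cached time no longer matches the cache; returns how many were removed.
  static int validate(ByteBuffer cache, long cacheLastModifiedNow,
      long persistedCacheLastModified, Map<String, Entry> backingMap) {
    if (cacheLastModifiedNow == persistedCacheLastModified) {
      return 0; // cache untouched since the map was persisted, so trust the map
    }
    int before = backingMap.size();
    // The cached time trailer sits right after the block bytes in this layout.
    backingMap.values().removeIf(e -> cache.getLong(e.offset() + e.length()) != e.cachedTime());
    return before - backingMap.size();
  }

  public static void main(String[] args) {
    ByteBuffer cache = ByteBuffer.allocate(64);
    cache.putLong(16, 1000L); // B1 trailer still matches the map
    cache.putLong(40, 3000L); // trailer at B2's location was overwritten
    Map<String, Entry> map = new HashMap<>();
    map.put("B1", new Entry(0, 16, 1000L));
    map.put("B2", new Entry(24, 16, 2000L));
    int removed = validate(cache, 7L, 5L, map);
    System.out.println(removed + " " + map.keySet()); // prints "1 [B1]"
  }
}
```

The full scan is proportional to the number of entries, which is why it becomes expensive for very large caches.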
Currently this validation is performed at RS initialisation time, but with caches as large as 1.6TB (30M+ blocks) it can take up to an hour, during which the RS is unusable. This PR changes the validation to run in the background, while direct accesses to a block in the cache also perform the "cached time" comparison.
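A rough sketch of the background approach, assuming a single-thread executor runs the full scan while reads check the accessed entry themselves. The class and method names are hypothetical, and the sketch assumes the cache bytes are not written concurrently while it runs:

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BackgroundValidationSketch {
  // Hypothetical backing map entry: block offset, block length and cached time.
  record Entry(int offset, int length, long cachedTime) {}

  private final ByteBuffer cache;
  private final Map<String, Entry> backingMap = new ConcurrentHashMap<>();
  // Daemon thread so a pending scan never keeps the JVM alive in this sketch.
  private final ExecutorService validator = Executors.newSingleThreadExecutor(r -> {
    Thread t = new Thread(r, "backing-map-validator");
    t.setDaemon(true);
    return t;
  });

  BackgroundValidationSketch(ByteBuffer cache, Map<String, Entry> recovered) {
    this.cache = cache;
    this.backingMap.putAll(recovered);
  }

  // Compares the trailing cached time in the cache with the one in the entry.
  private boolean isConsistent(Entry e) {
    return cache.getLong(e.offset() + e.length()) == e.cachedTime();
  }

  // Runs the full scan off the startup path, so the RS can serve requests meanwhile.
  Future<?> startBackgroundValidation() {
    return validator.submit(() -> backingMap.entrySet().removeIf(me -> !isConsistent(me.getValue())));
  }

  // Read path: validate the accessed entry regardless of how far the scan has got.
  Entry get(String key) {
    Entry e = backingMap.get(key);
    if (e == null) {
      return null;
    }
    if (!isConsistent(e)) {
      backingMap.remove(key, e); // conditional remove, safe alongside the scan
      return null; // treat as a cache miss
    }
    return e;
  }

  void shutdown() {
    validator.shutdown();
  }

  public static void main(String[] args) throws Exception {
    ByteBuffer cache = ByteBuffer.allocate(64);
    cache.putLong(16, 1000L); // B1 trailer matches
    cache.putLong(40, 3000L); // B2 trailer was overwritten
    BackgroundValidationSketch bc = new BackgroundValidationSketch(cache,
        Map.of("B1", new Entry(0, 16, 1000L), "B2", new Entry(24, 16, 2000L)));
    Future<?> scan = bc.startBackgroundValidation();
    System.out.println(bc.get("B2")); // stale entry is rejected on access
    scan.get();
    bc.shutdown();
  }
}
```

Checking on access means a stale block is never served, even before the background scan reaches it.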
This PR also moves the "cached time" to the beginning of the block in the cache, instead of the end. We noticed that with the "cached time" at the end, we can fail to ensure consistency under some conditions. Consider the following:
1) A block B1 of size S gets allocated at offset 0 with cached time T1;
2) The backing map is persisted, containing B1 at offset 0 and cached time T1;
3) B1 is evicted. Its offset in the cache is now free, but its contents are still there, including the cached time T1 at its end;
4) A new block B2 of size S/2 gets allocated at offset 0 with cached time T2;
5) RS crashes before the backing map gets saved again, so the persisted backing map still only references B1, not B2;
6) At restart, we run the validation. Because B2 is half the size of B1, B2 has not overwritten B1's cached time in the cache, so we successfully validate B1, even though its content is now half overwritten by B2.
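The scenario above can be replayed in a small simulation that compares both placements of the cached time. It assumes a heap `ByteBuffer` as the cache and an illustrative S of 32 bytes; `Layout`, `write`, `validates` and `staleB1Validates` are hypothetical helpers, not BucketCache code:

```java
import java.nio.ByteBuffer;

public class CachedTimePlacementSketch {
  static final int S = 32; // size of B1 in bytes (illustrative value)

  enum Layout { TRAILER, HEADER }

  // Position of the cached time long for a block of `length` bytes starting at `offset`.
  static int timeOffset(Layout layout, int offset, int length) {
    return layout == Layout.TRAILER ? offset + length : offset;
  }

  // Writes `length` data bytes plus the cached time, using the given layout.
  static void write(ByteBuffer cache, Layout layout, int offset, int length, long cachedTime, byte fill) {
    int dataStart = layout == Layout.HEADER ? offset + Long.BYTES : offset;
    for (int i = 0; i < length; i++) {
      cache.put(dataStart + i, fill);
    }
    cache.putLong(timeOffset(layout, offset, length), cachedTime);
  }

  // Restart-time check: does the cache still hold the cached time recorded in the map?
  static boolean validates(ByteBuffer cache, Layout layout, int offset, int length, long expected) {
    return cache.getLong(timeOffset(layout, offset, length)) == expected;
  }

  static boolean staleB1Validates(Layout layout) {
    ByteBuffer cache = ByteBuffer.allocate(2 * S + Long.BYTES);
    long t1 = 1000L;
    long t2 = 2000L;
    write(cache, layout, 0, S, t1, (byte) 1);     // 1) B1 of size S at offset 0, time T1
    // 2) persisted map records B1 -> (offset 0, size S, T1)
    // 3) B1 evicted: the space is free but its bytes stay in place
    write(cache, layout, 0, S / 2, t2, (byte) 2); // 4) B2 of size S/2 reuses offset 0
    // 5) crash: the persisted map still only knows about B1
    return validates(cache, layout, 0, S, t1);    // 6) validation at restart
  }

  public static void main(String[] args) {
    System.out.println("trailer: " + staleB1Validates(Layout.TRAILER)); // trailer: true
    System.out.println("header: " + staleB1Validates(Layout.HEADER));   // header: false
  }
}
```

With the trailer layout, B2's shorter write leaves T1 untouched at B1's end, so the stale entry passes validation. With the header layout, B2 overwrites the cached time at the shared starting offset, so B1 is correctly rejected.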