During region close, up to two flushes are performed to ensure that no data is left unpersisted in memory. When there is data only in the current memstore, one flush is sufficient. When there is also data in a memstore's snapshot, two flushes are essential; otherwise we lose data. Recently, however, we found two bugs that cause at least one of these flushes to be skipped, resulting in data loss.
Bug 1: Wrong calculation of HRegion.memstoreSize
When a flush fails, the data to be flushed is kept in each MemStore's snapshot and waits for the next flush attempt to pick it up. But when that next flush succeeds, the counter of total memstore size in HRegion is deducted by the sum of the current memstore sizes instead of the sizes of the snapshots left by the previous failed flush. This calculation is wrong: almost every time a flush fails, HRegion.memstoreSize is reduced by an incorrect amount. If the region cannot flush for a few cycles, the data in the current memstore can grow much larger than the snapshot, so memstoreSize is likely to drift far below its true value. In the extreme case, if the error accumulates beyond HRegion's memstore size limit, any further flush is skipped, because flush does nothing when memstoreSize is not greater than 0.
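The drift can be seen in a hypothetical, minimal model of the accounting; the field and method names below are illustrative and do not match HBase's actual code:

```java
// Minimal sketch of the accounting bug; names are illustrative only.
public class BuggyAccounting {
    long memstoreSize = 0;   // HRegion-level counter of unflushed bytes
    long snapshotSize = 0;   // bytes parked in a snapshot by a failed flush
    long currentSize = 0;    // bytes in the active memstore

    void write(long bytes) {
        currentSize += bytes;
        memstoreSize += bytes;
    }

    // A failed flush leaves its snapshot behind; the counter is untouched.
    void failedFlush() {
        snapshotSize += currentSize;
        currentSize = 0;
    }

    // The next, successful flush persists the leftover snapshot, but the
    // buggy code deducts the *current* memstore size from the counter.
    void buggySuccessfulFlush() {
        memstoreSize -= currentSize;  // BUG: should deduct snapshotSize
        snapshotSize = 0;             // snapshot is now on disk
    }
}
```

With 10 bytes stranded in the snapshot and 50 bytes written afterwards, the flush persists 10 bytes but deducts 50: the counter ends at 10 while 50 unflushed bytes remain, an undercount of 40 from a single failed cycle.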
When the region is closing, if both flushes are skipped and data is left in the current memstore and/or the snapshot, we can lose data up to the memstore size limit of the region.
The fix is to deduct from memstoreSize the size of the data that is actually going to be flushed.
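In the same hypothetical model as above (names are illustrative, not HBase's real fields), the fix amounts to reducing the counter by the bytes the flush actually persists:

```java
// Sketch of the corrected accounting; names are illustrative only.
public class FixedAccounting {
    long memstoreSize = 0;   // HRegion-level counter of unflushed bytes
    long snapshotSize = 0;   // leftover from a failed flush
    long currentSize = 0;    // data in the active memstore

    void write(long bytes) {
        currentSize += bytes;
        memstoreSize += bytes;
    }

    void failedFlush() {
        snapshotSize += currentSize;
        currentSize = 0;
    }

    void fixedSuccessfulFlush() {
        long flushed = snapshotSize;  // the retry persists the old snapshot
        memstoreSize -= flushed;      // FIX: deduct exactly what was flushed
        snapshotSize = 0;
    }
}
```

After the same sequence as before (10 bytes stranded by a failed flush, then 50 bytes written), the counter now agrees with the unflushed data remaining in the current memstore.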
Bug 2: Conditions for the first flush of region close (so-called pre-flush)
If memstoreSize is smaller than a certain threshold, or if a flush is already in progress when the region close starts, the first flush is skipped and only the second flush takes place. However, two flushes are required in case a previous flush failed and left data in the snapshot. This bug can cause the loss of the data in the current memstore.
The fix is to remove all conditions except the abort check, so that two flushes are guaranteed during region close.
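The fixed close sequence can be sketched as follows, assuming the simplified flush semantics described above: a flush first completes any leftover snapshot, and only a subsequent flush drains the current memstore. All names here are illustrative, not HBase's actual code.

```java
// Sketch of region close after the fix: the pre-flush is gated only on
// the abort check, so both flushes happen in the normal case.
public class RegionCloseSketch {
    boolean abortRequested = false;
    long snapshotSize;   // leftover from an earlier failed flush
    long currentSize;    // data in the active memstore

    RegionCloseSketch(long snapshotSize, long currentSize) {
        this.snapshotSize = snapshotSize;
        this.currentSize = currentSize;
    }

    // One flush persists the old snapshot if present; otherwise it
    // snapshots and persists the current memstore.
    void flush() {
        if (snapshotSize > 0) {
            snapshotSize = 0;
        } else {
            currentSize = 0;
        }
    }

    void close() {
        if (!abortRequested) {
            flush();  // pre-flush: no size or ongoing-flush conditions
        }
        // in the real code the region write lock is taken here,
        // blocking further writes before the final flush
        flush();      // final flush: drains whatever remains
    }
}
```

With data in both the snapshot and the current memstore, the pre-flush drains the snapshot and the final flush drains the current memstore, so nothing is left behind.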