Over in PHOENIX-2883, I've been trying to track down the root cause of an issue we were seeing where a negative memstoreSize was ultimately causing an RS to abort. The tl;dr version:
- Something causes memstoreSize to be negative (not sure what is doing this yet)
- All subsequent flushes short-circuit and don't run because they think there is no data to flush
- The region is eventually closed (commonly, for a move).
- A final flush is attempted on each store before closing (which also short-circuits for the same reason), leaving unflushed data in each store.
- The sanity check that each store's size is zero fails and the RS aborts (sketched below).
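To make the cascade concrete, here is a minimal, self-contained sketch of the failure mode, not the actual HRegion/HStore code; the class and method names (MemstoreModel, closeRegion()) are hypothetical and only model the accounting behavior described above:

{code:java}
import java.util.concurrent.atomic.AtomicLong;

public class MemstoreModel {
  // Running accounting of bytes held in the memstore; a buggy decrement
  // elsewhere can drive this below zero.
  private final AtomicLong memstoreSize = new AtomicLong(0);
  // Bytes actually sitting in the store, which the accounting is meant to track.
  private long actualStoreBytes = 0;

  void add(long bytes) {
    memstoreSize.addAndGet(bytes);
    actualStoreBytes += bytes;
  }

  void flush() {
    // The short-circuit: if the accounting says there is nothing to flush,
    // the flush is skipped entirely, even though actualStoreBytes may be non-zero.
    if (memstoreSize.get() <= 0) {
      return;
    }
    actualStoreBytes = 0;
    memstoreSize.set(0);
  }

  void closeRegion() {
    // The final flush before close short-circuits for the same reason...
    flush();
    // ...so the close-time sanity check sees unflushed data and aborts the RS.
    if (actualStoreBytes != 0) {
      throw new IllegalStateException("memstoreSize is " + memstoreSize.get()
          + " but the store still holds " + actualStoreBytes + " bytes; aborting");
    }
  }
}
{code}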
I have a small patch which I think improves how we handle this failure case: it safely prevents the RS abort (by forcing a flush when memstoreSize is negative) and logs a call trace whenever an update to memstoreSize makes it negative (to find culprits in the future).
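For illustration only, a hedged sketch of those two ideas follows; this is not the actual diff, and the LOG field and addAndGetMemstoreSize()/shouldFlush() helpers are hypothetical names used just to show the shape of the change:

{code:java}
import java.util.concurrent.atomic.AtomicLong;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MemstoreAccountingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(MemstoreAccountingSketch.class);
  private final AtomicLong memstoreSize = new AtomicLong(0);

  /** (2) Log a call trace the moment an update drives the size negative. */
  long addAndGetMemstoreSize(long delta) {
    long newSize = memstoreSize.addAndGet(delta);
    if (newSize < 0) {
      LOG.warn("memstoreSize went negative (" + newSize + ") after delta " + delta,
          new Exception("Stack trace for negative memstoreSize update"));
    }
    return newSize;
  }

  /** (1) Do not short-circuit the flush when the accounting has gone negative. */
  boolean shouldFlush() {
    long size = memstoreSize.get();
    // A negative size means the accounting is broken, not that the store is
    // empty, so flush anyway instead of skipping.
    return size != 0;
  }
}
{code}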