Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Not A Problem
-
None
-
None
-
None
-
None
Description
Memstore flushes are triggered today if a memstore exceeds a certain size or there is memory pressure. However, there is no timer based flush for a memstore. This means a single column family or table getting a very slow rate of writes could hold up old HLogs from getting reclaimed for long periods of time-- which in turn increases recovery time for a failed region server since there are a lot more logs to process.
META is an example of a table which is likely to get very few writes. But even if we special cased META somehow, it wouldn't be good enough, since an application could genuinely have a mix of slow and fast changing tables or column families.
What about also triggering flushes on a timer (in addition to the current mechanism) to bound recovery times?